Decoder for CAPTCHA


Viewed 3,110 times

6

I need to turn a captcha into text, specifically to download NF-e documents from the Receita Federal site. I have read about OCR, but it does not seem to be 100% reliable. Has anyone already had this need, and can you share some pointers on how to decode this captcha?

  • 4

    Most likely this should be done directly through the NF-e API itself. If the invoice involves a vendor, the right approach is to import it from the XML. If it is yours or your client's, you already have the data. If it is a third party's, the API lets you check whether it is valid. Otherwise, if you have no involvement with either party, the captcha is there precisely to keep bots from scraping third-party data.

  • 1

    Hello Bacco, I fall into the bots case: I am neither the sender nor the recipient, so I cannot use the SEFAZ service. I handle the freight, and I want to download the NF-e so that my team does not have to enter all of its information manually, because the volume is huge.

  • Hey, take a look at this active question: decoder-to-captcha-code-open

  • An alternative would be to display the captcha in your own application, so your team would only have to type the captcha. That said, I would expect the client to readily send you the XML, no?

  • Many clients do not even know what XML is. Some already send it, but they are the minority. Thanks, Bacco.

1 answer

9

I’m sure a lot of good answers will come, but I’ll leave my two cents on the question.

In my research I read a lot about Tesseract, from Google, which is one of the most effective OCR engines available. I did not develop anything with it, though; it was just research.

I did some tests, but without customization for the specific purpose, which is solving the Receita captcha, you will get many errors and very few hits.
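To give an idea of what "customization" usually means for OCR in general, here is a minimal sketch of two common preprocessing steps: binarizing a grayscale image and clearing isolated specks. It is only an illustration, not something I tested against the Receita captcha; the function names, the threshold of 128 and the nested-list image format are assumptions made for the example.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to black (0) or white (255).

    The threshold value is an assumption; a real image needs tuning.
    """
    return [[0 if px < threshold else 255 for px in row] for row in gray]


def remove_isolated_pixels(bw):
    """Turn white any black pixel that has no black neighbour above,
    below, left or right (a very simple speckle filter)."""
    height, width = len(bw), len(bw[0])
    out = [row[:] for row in bw]
    for y in range(height):
        for x in range(width):
            if bw[y][x] != 0:
                continue
            neighbours = [
                bw[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < height and 0 <= nx < width
            ]
            if all(n == 255 for n in neighbours):
                out[y][x] = 255
    return out
```

In practice the cleaned image would then be handed to the OCR engine, and the threshold and filters adjusted until the hit rate improves.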

The latest version of Tesseract so far is 3.02.02. Until version 2.32, if I’m not mistaken, it was possible to wrap the Tesseract library, which is written in C++, in C and so use it more easily from other languages. Today, given wrappers such as Tesseractenginewrapper for .NET, I think it is easier to look for something that already wraps it for Java, which is your case.

I also came across online services that offer to solve the captcha for you, such as captchabot.com and deathbycaptcha.com. But I did not test them either.

There is some debate about whether this is acceptable or not.

One person implemented something with these services: he sends the captcha to the API of one of these sites, the site solves the captcha for him and returns the text, and then he accesses the page and parses the HTML; see here.

His blog is: http://fsist.blogspot.com.br. Anyway, this is just so you know what has already been done.
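As for the "parses the HTML" part, a rough sketch of the idea follows. I do not know the actual markup of the result page, so the `<label>`/`<span>` pairing below is purely an assumption, and `LabelValueParser` and `extract_fields` are names invented for this example. It uses only Python's standard `html.parser`.

```python
from html.parser import HTMLParser


class LabelValueParser(HTMLParser):
    """Collect {label text: value text} pairs from markup shaped like
    <label>Name</label><span>Value</span> (a hypothetical layout)."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._tag = None    # tag we are currently inside, if relevant
        self._label = None  # last label seen that is still waiting for a value

    def handle_starttag(self, tag, attrs):
        if tag in ("label", "span"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._tag is None:
            return
        if self._tag == "label":
            self._label = text
        elif self._tag == "span" and self._label is not None:
            self.fields[self._label] = text
            self._label = None


def extract_fields(html):
    parser = LabelValueParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

For example, `extract_fields("<label>Number</label><span>123</span>")` gives a dictionary with the key `Number`. The real page would need its own selectors once you inspect its HTML.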

But I would encourage you to look into downloading the NF-e, the XML itself, directly through the Receita Federal web service; you just need to have the certificate.
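Once you have the XML, by whatever route, reading the fields your team types today is straightforward. Here is a minimal sketch with Python's standard `xml.etree.ElementTree`. The element names (`infNFe`, `ide/nNF`, `emit/xNome`, `dest/xNome`, `total/ICMSTot/vNF`) and the namespace are what I understand the NF-e layout to use, so treat them as assumptions to check against a real file. The sample document is made up, and `read_nfe_fields` is a name invented for this example.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for NF-e documents; verify against a real file.
NFE_NS = {"nfe": "http://www.portalfiscal.inf.br/nfe"}

# Made-up sample with placeholder values, only to exercise the function.
SAMPLE = """<nfeProc xmlns="http://www.portalfiscal.inf.br/nfe">
  <NFe>
    <infNFe Id="NFe{key}">
      <ide><nNF>123</nNF></ide>
      <emit><xNome>Example Sender Ltda</xNome></emit>
      <dest><xNome>Example Recipient SA</xNome></dest>
      <total><ICMSTot><vNF>1500.00</vNF></ICMSTot></total>
    </infNFe>
  </NFe>
</nfeProc>""".format(key="0" * 44)


def read_nfe_fields(xml_text):
    """Return a dict with a few fields read from an NF-e XML string."""
    root = ET.fromstring(xml_text)
    inf = root.find(".//nfe:infNFe", NFE_NS)
    if inf is None:
        raise ValueError("infNFe element not found")

    def text(path):
        node = inf.find(path, NFE_NS)
        return node.text if node is not None else None

    key = inf.get("Id", "")
    if key.startswith("NFe"):
        key = key[3:]  # drop the literal prefix, keep the digits

    return {
        "access_key": key,
        "number": text("nfe:ide/nfe:nNF"),
        "sender": text("nfe:emit/nfe:xNome"),
        "recipient": text("nfe:dest/nfe:xNome"),
        "total": text("nfe:total/nfe:ICMSTot/nfe:vNF"),
    }
```

Calling `read_nfe_fields(SAMPLE)` returns the placeholder values from the sample; fields that are missing come back as `None` rather than raising.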

  • 1

    Thank you, Tiago. Downloading from the Receita web service has a restriction: I can only download if I am the recipient of the NF, which I am not. We have a volume of more than 1,000, with a team typing in the data; if I can perform the download, we will no longer need so much work... Thank you. Fernando

  • This Google API is very interesting; I will evaluate it, thank you very much.

  • 2

    I think the most suitable solution for you right now is the one @Bacco suggested: your team types in the captcha and then you parse the HTML. Otherwise, who knows how much time you will invest in looking for a captcha solution, unless you decide to spend money on it and pay for a service or buy something ready-made. As I said in my answer, without customization you will not be able to run OCR on the image easily.
