Open source CAPTCHA decoder

Viewed 12,377 times

4

I am looking for a free, open source API for decoding CAPTCHAs.

I understand that this is a complex process involving OCR and advanced image analysis and processing techniques. Even so, I think the mechanisms behind the decoding are interesting to study.

So I searched the web and found the following references:

libautocaptcha

I found libautocaptcha, but I could not get it to work. After I downloaded the source code and the required libraries, it failed with errors about missing classes.

Jdownloader

An application often cited in international forums is JDownloader, which internally has an implementation (JAntiCaptcha) that decodes CAPTCHAs from the major file-sharing sites.

Tesseract OCR

This is a powerful OCR engine that is also cited in forums as a good option for scanning a CAPTCHA and deciphering it.

Given all this, does anyone have experience with one of the options above, or with any other decoding API? Could you offer a working example that came out of that experience?

2 answers

4

CAPTCHA-breaking routines are usually written for one specific CAPTCHA. You should normally preprocess the image before trying to read it with an OCR engine, aiming for black letters on a white background. One engine I recommend is Tesseract, which you mentioned yourself.

I don’t think there’s a generic algorithm for that.
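To illustrate the preprocessing step (black letters on a white background), here is a minimal sketch in pure Python. It assumes the image is already a grayscale grid of integers from 0 to 255, picks a threshold with Otsu's method (my choice for the example, not something a particular CAPTCHA requires), and inverts the result when most pixels come out black, on the assumption that the text covers less area than the background. The function names `otsu_threshold` and `binarize` are made up for this example.

```python
def otsu_threshold(pixels):
    """Return the gray level (0-255) that best separates dark from light pixels."""
    hist = [0] * 256
    for row in pixels:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_threshold, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for level in range(256):
        weight_dark += hist[level]
        if weight_dark == 0:
            continue  # no pixels at or below this level yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already in the dark class
        sum_dark += level * hist[level]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance (up to a constant factor).
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(pixels):
    """Return a 0/255 image with dark text on a white background."""
    threshold = otsu_threshold(pixels)
    binary = [[0 if v <= threshold else 255 for v in row] for row in pixels]
    flat = [v for row in binary for v in row]
    # Assumption: text covers less area than background, so if black
    # dominates, the original was light-on-dark and must be inverted.
    if flat.count(0) > len(flat) / 2:
        binary = [[255 - v for v in row] for row in binary]
    return binary


if __name__ == "__main__":
    # Light "text" (210, 220) on a dark background (30).
    sample = [
        [30, 30, 30, 30],
        [30, 220, 210, 30],
        [30, 30, 30, 30],
    ]
    for row in binarize(sample):
        print(row)
```

The cleaned image could then be saved and handed to Tesseract, for example through the `pytesseract` wrapper's `image_to_string` function. Real CAPTCHAs usually need more work than this (removing noise and lines, separating touching characters), which is why the routines end up specific to each CAPTCHA.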

  • 4

    How great, an unexplained downvote!

  • They say there is a single system that can decode several CAPTCHAs: the house-number recognition system that Google uses in Google Maps. I imagine it must be at least as good as they say.

  • @Fláviogranato I have never heard of it. Do you have a link or reference for this system?

  • It was just a comment. Surely you have heard of it: it is the Google Maps that everyone knows, right? It has an internal system that recognizes the house numbers that the Google cars' cameras pass by. http://info.abril.com.br/noticias/internet/2014/04/algoritmo-do-google-desvenda-captchas-com-quase-100-de-acerto.shtml

  • @Fláviogranato No worries! I know Google Maps. What I wanted was information about their OCR system: is it public or open source, and is there any way to use it?

  • Only they could answer that accurately.

1

Yes, it is theoretically possible to build generic software that is trained to solve any CAPTCHA, and I believe we will have this available in a few years. At my company, Infosimples, we have achieved incredible results on similar problems using deep learning, a technology in which we specialize.

The article Google published at ICLR 2014 on how they automated the recognition of digits in house numbers can be found at this link:

http://arxiv.org/pdf/1312.6082.pdf

The presentation of the article at ICLR 2014 can be seen in this video:

https://www.youtube.com/watch?v=vGPI_JvLoN0

They applied the same solution to reCAPTCHA using a dataset of a few million examples and managed to solve about 100,000 new CAPTCHAs with a 99.8% hit rate (better than a human at the same task).

The essence of the solution is to train a very deep neural network (with many layers, convolutions, and billions of connections between neurons) on datasets with millions of examples.
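To give an idea of what a "convolution" layer does, here is a toy sketch in pure Python. It is not Google's model: it is only the basic building block, with a hand-written filter where a real network would learn millions of weights from the training data. The function names `conv2d_valid` and `relu` are made up for this example.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image and return the feature map.

    This is the operation deep learning libraries call "convolution"
    (strictly a cross-correlation: the kernel is not flipped), with no
    padding, so the output is smaller than the input.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses and zero out the rest."""
    return [[max(0, v) for v in row] for row in feature_map]


if __name__ == "__main__":
    # A tiny image with a dark-to-light vertical edge in the middle.
    image = [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
    ]
    # Hand-picked filter that responds to dark-to-light vertical edges.
    kernel = [
        [-1, 1],
        [-1, 1],
    ]
    for row in relu(conv2d_valid(image, kernel)):
        print(row)
```

The filter responds only where the edge is. A deep network stacks many layers like this, so that later layers combine edges into strokes and strokes into characters.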

Unfortunately, the technology to achieve results like the above is still relatively restricted, difficult to use and very expensive.
