How does a CAPTCHA work?

I understand that a CAPTCHA is a way for me to ensure that the user who interacts with my system is a human being and not a script.

But that is the simple explanation we give to laypeople. How do CAPTCHAs really work, and what strategies do they use? Would it be possible to show a simple code example that demonstrates the concept?

1 answer

CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart.

In general, CAPTCHAs are designed to be easy for a human to solve and hard for a computer. The program that displays the CAPTCHA usually already knows the correct answer and only checks whether the user's response matches it. There are several types of CAPTCHA:

  • Text: usually a few random letters with some noise added (such as lines or dots), presented to the user as an image.

  • Audio: usually offered alongside text CAPTCHAs, mainly as an accessibility option for visually impaired users. They are sounds with some noise mixed in.

  • Image: more recent. Several images are shown to the user, and the program asks them to select the ones that belong to a given category.

Anyone can create a CAPTCHA, as long as it is a fully automated test.
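The idea that the program already knows the answer and only compares it with the response can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not how any particular CAPTCHA service works: `store`, `issue_challenge()` and `check_answer()` are hypothetical names, the in-memory environment stands in for a real server-side session store, and `sample()` is not a cryptographically secure source of randomness.

```r
# Hypothetical in-memory store: challenge id -> expected answer.
# A real system would keep this on the server (session, database, cache).
store <- new.env()

# Create a challenge and remember its answer; only the id leaves the server.
issue_challenge <- function(n = 6) {
  answer <- paste0(sample(letters, n, replace = TRUE), collapse = "")
  id <- paste0(sample(c(0:9, letters[1:6]), 16, replace = TRUE), collapse = "")
  assign(id, answer, envir = store)
  id
}

# Compare the user's response with the stored answer.
# Each challenge is single use: it is deleted after the first attempt.
check_answer <- function(id, response) {
  if (!exists(id, envir = store, inherits = FALSE)) {
    return(FALSE)
  }
  expected <- get(id, envir = store, inherits = FALSE)
  rm(list = id, envir = store)
  identical(tolower(trimws(response)), expected)
}
```

Deleting the challenge after the first attempt is a design choice in this sketch: it prevents a script from replaying a solved challenge or guessing repeatedly against the same one.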

Here is a generator written in R:

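library(magick)
library(magrittr)

# Draws six random lowercase letters on `base_img` (a magick image) and
# returns both the resulting image and the expected answer.
gerar_captcha <- function(base_img){

  # The answer: six random letters, repetition allowed
  letras <- sample(letters, 6, replace = TRUE) %>%
    paste0(collapse = "")

  # Render the letters with random size, rotation, color and position
  cap <- base_img %>%
    image_annotate(
      letras,
      size = sample(30:70, 1),
      degrees = sample(1:60, 1),
      color = sample(c("green", "blue", "red"), 1),
      location = paste0("+", sample(20:100, 1), "+", sample(20:100, 1))
    )

  # Keep the answer next to the image so the response can be checked later
  list(
    letras = letras,
    cap = cap
  )
}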

This code generates CAPTCHAs like the one below, with random letters, position, rotation, and color.

[Image: example of a generated CAPTCHA]
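The generator draws only the letters. The noise mentioned for text CAPTCHAs (lines and dots) could be layered on top with magick's `image_draw()`, which opens an R graphics device over an image. The sketch below is an assumption about how one might do it, not part of the generator itself: `add_noise()` and its parameters `n_lines` and `n_dots` are hypothetical.

```r
library(magick)

# Hypothetical helper: draws random lines and dots over a magick image.
add_noise <- function(img, n_lines = 5, n_dots = 200) {
  info <- image_info(img)
  w <- info$width[1]
  h <- info$height[1]

  # Open a graphics device on top of the image; coordinates are in pixels.
  out <- image_draw(img)

  # Random straight lines across the image
  segments(
    x0 = runif(n_lines, 0, w), y0 = runif(n_lines, 0, h),
    x1 = runif(n_lines, 0, w), y1 = runif(n_lines, 0, h),
    col = "black", lwd = 2
  )

  # Random small dots
  points(runif(n_dots, 0, w), runif(n_dots, 0, h),
         pch = 16, cex = 0.4, col = "black")

  # Closing the device writes the drawing into `out`.
  invisible(dev.off())
  out
}

# Usage: a blank white canvas with noise, ready to be passed to the generator
base_img <- add_noise(image_blank(300, 120, color = "white"))
```

Drawing the noise before or after the letters is a matter of taste. Noise drawn afterwards crosses the glyphs, which tends to make segmentation harder for simple scripts.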

However, text CAPTCHAs are doomed to fail. Breaking them takes little work today, mainly thanks to machine learning techniques. I have a project for breaking the CAPTCHAs of public services (which do not offer an API), and with convolutional neural networks we are reaching more than 99% accuracy on several types of CAPTCHA: https://github.com/decryptr/decryptr

For that reason, companies have recently been developing other ways to verify that the user is human. The most widely used solution today is Google's reCAPTCHA, which, surprisingly enough, only asks you to click a button. It analyzes various pieces of information about your browsing and the way you click the button to decide whether you are a human or a computer, and it is far harder to break than a text CAPTCHA.

An interesting story about CAPTCHAs is their use in building correctly labeled image datasets and in transcribing books. The first versions of reCAPTCHA looked like this:

[Image: an early reCAPTCHA challenge]

Two words were presented: one scanned from a book (for which the CAPTCHA provider itself did not know the answer) and one generated word (for which the program did know the answer). From users' responses, the program was able to identify and transcribe words from books that had been digitized. Some more recent versions also help Google identify house numbers in Street View images:

[Image: a reCAPTCHA challenge showing a house number from Street View]

So when you are answering CAPTCHAs, you might be helping Google improve Google Maps.
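The two-word scheme described above (one word with a known answer, one without) can be sketched in base R. This is only an illustration of the idea, not reCAPTCHA's actual implementation: `votes`, `submit_answer()`, `transcription()` and the `min_votes = 3` agreement threshold are all assumptions made for the example.

```r
# Hypothetical tally of guesses for each unknown scanned word.
votes <- new.env()

# The user is judged only on the control word, whose answer is known.
# If they pass, their reading of the unknown word counts as one vote.
submit_answer <- function(control_answer, control_guess,
                          unknown_id, unknown_guess) {
  passed <- identical(tolower(control_guess), tolower(control_answer))
  if (passed) {
    previous <- if (exists(unknown_id, envir = votes, inherits = FALSE)) {
      get(unknown_id, envir = votes, inherits = FALSE)
    } else {
      character(0)
    }
    assign(unknown_id, c(previous, tolower(unknown_guess)), envir = votes)
  }
  passed
}

# Accept a transcription once enough users agree on the same reading.
transcription <- function(unknown_id, min_votes = 3) {
  if (!exists(unknown_id, envir = votes, inherits = FALSE)) {
    return(NA_character_)
  }
  tally <- table(get(unknown_id, envir = votes, inherits = FALSE))
  best <- names(tally)[which.max(tally)]
  if (max(tally) >= min_votes) best else NA_character_
}
```

Guesses from users who fail the control word are discarded, so a script typing random text does not pollute the transcription.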

  • A note on Google using reCAPTCHA to improve Maps: it keeps innovating with the tool. Now, most of the time, you are asked to pick the parts of an image that contain a given item, for example cars or signs. There is also a second variant, similar to an image editing tool, in which you outline the requested part of the image. It really took me by surprise this Monday.
