What is needed for OCR on Android?

6

I’ve read several articles and forum questions on this, and I know what a basic OCR needs; I have even managed to build one. What I want to ask here is more specific. An Android OCR app requires:

- Camera
- OCR API (Tesseract, for example)

But here’s what I’d like to know:

1 - How to capture the image as soon as the camera has focused on some text and analyze it with the OCR API to locate the text at that moment, without having to take a photo, save it, and analyze the JPEG.

2 - How to search the captured image for specific words.

3 - How to draw some artwork on the screen, such as dots around the letters, like I’ve seen in other OCR apps.

I know this can be complicated (for me it certainly is), but I’d appreciate it if someone could shed some light or point me in a direction. I obviously don’t need a ready-made solution, just which Android classes I might use so that I can study them.

  • 2

    Can you provide more details about your intended use? I ask because if you need a system that actually recognizes the text (that is, extracts the string from the image for some later manipulation), OCR really is the only path. But if you just want to recognize a visual element (to mark it in the image, put art around it, and so on), you don’t necessarily need OCR. An alternative is to use a cascade detector. (continued...)

  • 2

    OpenCV, for example, has an Android port and already includes a great implementation of this detector. This tutorial shows how to train it to detect anything: http://coding-robin.de/2013/07/22/train-your-own-opencv-haar-classifier.html

1 answer

5

You’ll be completely boxed in if you use off-the-shelf OCR APIs: you won’t be able to highlight a specific word, much less put dots around it. Nothing of the sort will be possible, because such an OCR only tries to extract the letters from an image and return them as text.

As @Luizvieira commented, OpenCV will be your right hand for this type of project. You can train it on each letter of the alphabet and each digit to make real-time comparisons. This training has to be scale invariant: no matter the font size or the scale, it still has to know which letter it is.

I can give you the basic steps of how this can be done using OpenCV to extract the pixels, but without using OpenCV for training:

  • Create vectors with the patterns of all letters and numbers. You will need to crop each letter and number and extract its pixels (OpenCV has ready-made functions for pixel extraction), then store them in whatever way you find convenient.

  • Now you have the basis for comparison. You will want to compare each letter captured in real time against the extracted patterns, so use OpenCV to cut each letter out of the text in real time. How do you know where each letter starts and ends as you point your phone’s camera? By scanning the pixels horizontally until you find the beginning and end of each letter. We are talking about something basic here: 99% of text is black on a white background. It is very important to determine the predominant background color, which you can do by computing an RGB histogram, or you can simply force everything to black and white, which is a really good idea. Taking a white background as the example: walk until the white pixels end and mark the position, which is where the letter (black in this case) starts; then walk until the black pixels end and mark that position too. This tells you where to crop each letter or number (beginning and end), and you have just segmented (separated) the letters in real time.

  • With the letter cut out of the text, extract its pixels, just as you did in the first step to build your pattern bank.
  • Now compare what was extracted from the text with your database. Linear algebra has the concept of a linear (vector) space: treat the pixels of each letter as a vector, so that the comparison reflects which pixels the two letters have in common. It is a simple way to measure which stored letter is most similar.
  • Assemble each word based on this ranking (the higher the cosine returned by the linear-space comparison, the better). If the word turns out to be the specific one you are looking for, you will have its entire position (start, end), and you can use OpenCV again to draw whatever artwork you want, since you now know its exact position within the text.
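
The steps above can be sketched in a few lines. This is only a minimal illustration in plain Python, not OpenCV or Android code: it assumes the frame has already been converted to a grayscale grid of 0-255 values, that letters are separated by at least one empty column, and that each cropped letter has the same size as its stored pattern. All function names (`binarize`, `segment_columns`, `glyph_vector`, `cosine`, `recognize`, `find_word`) and the tiny "T"/"L" test image are made up for the example.

```python
import math

def binarize(gray, threshold=128):
    """Force a grayscale image (rows of 0-255 ints) to black and white: 1 = ink, 0 = background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def segment_columns(binary):
    """Walk the columns left to right; return (start, end) spans containing ink (end exclusive)."""
    width = len(binary[0])
    spans, start = [], None
    for x in range(width):
        ink = any(row[x] for row in binary)
        if ink and start is None:
            start = x                      # background ended, a letter begins
        elif not ink and start is not None:
            spans.append((start, x))       # letter ended, background begins
            start = None
    if start is not None:                  # last letter touches the right edge
        spans.append((start, width))
    return spans

def glyph_vector(binary, span):
    """Crop the columns in `span` and flatten the pixels row by row into one vector."""
    start, end = span
    return [px for row in binary for px in row[start:end]]

def cosine(a, b):
    """Cosine similarity of two pixel vectors; 0.0 if sizes differ or a vector is empty."""
    if len(a) != len(b):
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recognize(binary, templates):
    """Return [(best matching character, (start, end))] for every segmented letter."""
    letters = []
    for span in segment_columns(binary):
        vec = glyph_vector(binary, span)
        best = max(templates, key=lambda ch: cosine(vec, templates[ch]))
        letters.append((best, span))
    return letters

def find_word(letters, word):
    """Return the (start, end) columns of `word` in the recognized text, or None."""
    text = "".join(ch for ch, _ in letters)
    i = text.find(word)
    if not word or i < 0:
        return None
    return (letters[i][1][0], letters[i + len(word) - 1][1][1])

def from_rows(rows):
    """Helper for the demo: '#' is a black pixel (0), anything else is white (255)."""
    return [[0 if c == "#" else 255 for c in row] for row in rows]

# Pattern bank (first step): one flattened vector per known character.
TEMPLATES = {
    "T": glyph_vector(binarize(from_rows(["###", ".#.", ".#."])), (0, 3)),
    "L": glyph_vector(binarize(from_rows(["#..", "#..", "###"])), (0, 3)),
}

# A "captured frame" containing T, one empty column, then L.
IMAGE = binarize(from_rows(["###.#..", ".#..#..", ".#..###"]))
LETTERS = recognize(IMAGE, TEMPLATES)
print(LETTERS)
print(find_word(LETTERS, "TL"))
```

The span returned by `find_word` is the horizontal position where the artwork would be drawn. A real app would also need to separate lines of text and detect the spaces between words, which this sketch leaves out.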

I have just described a simple way to create an OCR, without scale invariance. Instead of using the linear space, you can train each letter and number using OpenCV. OpenCV has the SURF function, which is scale invariant and faster than its predecessor SIFT. That is the gist of how it all works.
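
To illustrate why scale matters for the simple approach: pixel vectors can only be compared when they have the same size. One crude workaround, which is not SURF and not what OpenCV training does, is to resize every cropped letter to a fixed grid before comparing it. The sketch below assumes tightly cropped black-and-white glyphs stored as lists of rows; `resize_nearest` is a hypothetical helper using nearest-neighbor sampling.

```python
def resize_nearest(glyph, out_h, out_w):
    """Resize a glyph (list of pixel rows) to out_h x out_w by nearest-neighbor sampling."""
    in_h, in_w = len(glyph), len(glyph[0])
    return [
        [glyph[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

# The same shape captured at two sizes maps to the same 2x2 grid.
SMALL = [[1, 0],
         [0, 1]]
LARGE = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
print(resize_nearest(LARGE, 2, 2) == SMALL)
```

This only compensates for uniform size changes of a cleanly cropped letter. It does nothing for rotation, perspective, or different fonts, which is where trained feature detectors come in.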

  • Ederwander, could you send me your email so we can talk?
