You will be completely boxed in if you use off-the-shelf OCR APIs: you won't be able to highlight a specific word, much less draw a dotted outline around it. Nothing of that kind will be possible, because an OCR's only job is to try to extract the letters from an image and return them as text.
As @Luizvieira commented, OpenCV will be your right hand for this type of project. You can train on every letter and digit and then make real-time comparisons. That training has to be scale-invariant: no matter the font size or the scale, it still has to know which letter it is looking at.
I can give you the basic steps of how this can be done using OpenCV to extract the pixels, and without using OpenCV to train:
- Create vectors with the patterns of all letters and digits. You will
need to crop each letter and digit and extract its pixels.
OpenCV has ready-made functions for pixel extraction. Store the
vectors in whatever way you find convenient.
- Now you have the basis for comparison. You will want to compare each
letter captured in real time with the extracted patterns, so use
OpenCV to cut out each letter of the text in real time. How do you
know where each letter starts and ends as you point your phone's
camera? This can be done by scanning the pixels horizontally until
you find the beginning and end of each letter. We are talking about
something basic here: 99% of texts are black on a white background.
It is very important to determine the predominant background color,
which you can do by computing an RGB histogram, or you can simply
force everything to black and white, which is a really good idea.
Taking a white background as the example: walk until the white pixels
end and mark the position, because this is where the next run of
pixels (black in this case) starts. Then walk until the black pixels
end and mark the position. These marks tell you where to crop each
letter or digit (start and end). You have just segmented (separated)
the letters in real time.
- With the letter cut out of the text, now extract its pixels,
just as you did in the first step to build your database.
- Now compare what was extracted from the text with your database.
Linear algebra has the concept of a vector space (linear space). In
this case each vector records which pixels appear, and comparing
vectors is a simple way to measure which letter is the most
similar.
- Assemble each word based on this ranking (the higher the cosine
returned by the vector-space comparison, the better) and check whether
that word is the specific one you are looking for. If it is, you have
its whole position (start, end) and you can use OpenCV again to draw
whatever art you want, since you now know its exact position within the text.
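
The steps above can be sketched roughly as follows. This is a minimal, pure-Python illustration under several assumptions, not a definitive implementation: the image is a list of grayscale rows (in a real project these pixels would come from OpenCV, for example via `cv2.cvtColor` and `cv2.threshold`), the text is dark on a light background, letters are separated by at least one empty column, and every glyph is padded to a fixed width so the vectors have equal length. All names here (`GLYPHS`, `binarize`, `segment_columns`, `glyph_vector`, `recognize`, `render`) are hypothetical helpers made up for the example.

```python
import math

# Tiny hypothetical "font": each glyph is 5 rows high, '1' marks an ink pixel.
GLYPHS = {
    "I": ["1", "1", "1", "1", "1"],
    "L": ["100", "100", "100", "100", "111"],
    "O": ["111", "101", "101", "101", "111"],
    "T": ["111", "010", "010", "010", "010"],
}
WIDTH = 3  # assumed fixed width that every glyph vector is padded to


def binarize(gray, threshold=128):
    """Force everything to black and white: 1 = ink (dark), 0 = background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment_columns(binary):
    """Walk left to right and return (start, end) column spans containing ink."""
    width = len(binary[0])
    spans, start = [], None
    for x in range(width):
        has_ink = any(row[x] for row in binary)
        if has_ink and start is None:
            start = x                      # background ended, a letter begins
        elif not has_ink and start is not None:
            spans.append((start, x))       # letter ended (end is exclusive)
            start = None
    if start is not None:
        spans.append((start, width))
    return spans


def glyph_vector(binary, span, width):
    """Crop the span and flatten it into a fixed-length pixel vector."""
    start, end = span
    vec = []
    for row in binary:
        crop = row[start:end][:width]
        vec.extend(crop + [0] * (width - len(crop)))  # pad narrow glyphs
    return vec


def cosine(a, b):
    """Cosine of the angle between two pixel vectors (higher = more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_templates(glyphs, width):
    """Step 1: build the database of pattern vectors, one per character."""
    templates = {}
    for ch, rows in glyphs.items():
        bits = [[int(c) for c in row] for row in rows]
        templates[ch] = glyph_vector(bits, (0, len(bits[0])), width)
    return templates


def recognize(gray, templates, width):
    """Steps 2-4: segment, vectorize, and pick the best match per letter."""
    binary = binarize(gray)
    result = []
    for span in segment_columns(binary):
        vec = glyph_vector(binary, span, width)
        best = max(templates, key=lambda ch: cosine(vec, templates[ch]))
        result.append((best, span))        # keep the position for drawing later
    return result


def render(text, glyphs):
    """Test helper: draw text as a grayscale image, black ink on white."""
    height = len(next(iter(glyphs.values())))
    image = [[255] for _ in range(height)]
    for ch in text:
        for y, row in enumerate(glyphs[ch]):
            image[y].extend(0 if c == "1" else 255 for c in row)
            image[y].append(255)           # one blank column between letters
    return image


templates = build_templates(GLYPHS, WIDTH)
print(recognize(render("LOT", GLYPHS), templates, WIDTH))
# prints [('L', (1, 4)), ('O', (5, 8)), ('T', (9, 12))]
```

Each result keeps the `(start, end)` columns of the letter, so a matched word's position is just the first start and the last end of its letters. Note that padding to a fixed width is exactly what makes this sketch not scale-invariant: a larger font would produce vectors that no longer line up with the templates.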
I have just described a simple way to create an OCR without scale invariance. Instead of using a vector space, you can train on each letter and digit using OpenCV. OpenCV has the SURF function, which is scale-invariant and faster than its predecessor SIFT. That is the basic outline of how everything works.
Can you provide more details about your intended use? I ask because if you need a system that actually recognizes the text (that is, extracts the string from the image for some later manipulation), then OCR really is your only path. But if you just want to recognize a visual element (to mark it in the image, put art around it, and so on), you don't necessarily need OCR. An alternative is to use a cascade detector. (continues...)
– Luiz Vieira
OpenCV, for example, has an Android port and already has a great implementation of this detector. You can learn from this tutorial how to train it to detect anything: http://coding-robin.de/2013/07/22/train-your-own-opencv-haar-classifier.html
– Luiz Vieira