Is there an audio transcription API that can be used from PHP, a C library, or Java?

Viewed 6,101 times

6

I’m looking for an API that receives audio and tries to recognize the text in it. Does anyone know of an open-source API for this? There are several for the opposite direction (receiving text and generating audio). The intention is to install it on a local Linux server and use it together with PHP. Does such a thing exist?

The initial goal is to build an anti-captcha tool. Many sites today offer an audio option alongside the captcha image; once I have the audio, I can send it to such an API and submit the captcha. That would make it easier to consume services such as CPF and CNPJ lookup and validation.

Today I can use cURL to fetch the HTML without trouble. Breaking image captchas, however, runs into many problems: the algorithm is not always effective, and different image styles require different algorithms.

In my searches I found plenty of tools that generate audio from text, but for the reverse, generating text from audio, I only found closed applications that I would have to operate through a hotkey. Since this will run on a web server, I did not find a good solution.

I found Googlespeech through the page http://www.edivaldobrito.com.br/reconhecimento-de-voz-no-linux-utilizando-google-speech-api-google2ubuntu/, but I did not implement it, because as soon as I started reading I saw that it works through a hotkey and the microphone. That would be a possible implementation in the worst case, if there were no easier way. "When you want to trigger the Google2ubuntu voice recognition system, press the keyboard shortcut you have set up. By pressing the keyboard shortcut..."

  • 2

    Have you checked whether Googlespeech has an extension that does that? It works like this: http://www.edivaldobrito.com.br/reconhecimento-de-voz-no-linux-utilizando-google-speech-api-google2ubuntu/

  • Could the person who voted to close explain why? I believe this is an objective question with a right or wrong answer, within the scope of SO-pt.

  • People take what the Help Center says far too literally. Try editing your question to add more information, such as the purpose of the question, what you have already tried...

  • @Brunoaugusto, in my searches I found plenty of tools that generate audio from text, but for the reverse, generating text from audio, I only found closed applications that I would have to operate through a hotkey. Since this will run on a web server, I did not find a good solution. So I decided to post the question here to see whether people could help by pointing me in a direction, as erderwander did in his answer.

  • @Dante I did look, and I landed on exactly this page. However, I did not implement it, because as soon as I started reading I saw that it works through a hotkey and the microphone. That would be a possible implementation in the worst case, if there were no easier way. "When you want to trigger the Google2ubuntu voice recognition system, press the keyboard shortcut you have set up. When pressing the keyboard shortcut..." There is, however, a way to use it through the Google API, an alternative I did not know about.

  • @Rodrigoborth, closing questions is already routine here. Some people expect a programming question to include a code snippet, but that is not always the case; sometimes the question is conceptual. It comes down to how each close voter interprets it. Sometimes more time and effort is lost debating whether, from a given point of view, a question is out of scope than would be spent trying to offer a solution and leave something useful, not only for the person who asked but for other Internet users. Besides, closing only prevents new answers, so don’t stress about these things.

  • 2

    This question is being discussed on Meta. PS: Sileno, instead of writing so much here in the comments, it would be better to edit the question...

  • @brasofilo The question has been edited. I usually answer the comment first and then edit the question to make it more complete or easier to understand; you may notice that this question already has some edits. There are comments, however (not the case here), that I do not think warrant an edit, so I do not edit. Still, I believe that anyone who took the time to leave a comment deserves at least a reply, which is why I leave so many comments.

2 answers

5


You are looking for ASR (Automatic Speech Recognition).

Open-source options are hard to find, because these algorithms have strong commercial appeal. There are some very old projects, and I think they only support English transcription:

Sphinx

freespeech

I have already tested Verbio. It is not open source, but you can install a demo and leave it running in evaluation mode; it supports Brazilian Portuguese (pt-BR).

I wouldn’t think twice about using Googlespeech. With the help of cURL, you assemble the appropriate header, send the audio file to Google, and pick up the transcript. First you must convert the audio file to FLAC format and resample it to 8000 Hz; after that, just send the file to Google. In PHP you would do something like this:

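// Read the FLAC audio (already resampled to 8000 Hz) as raw bytes.
$audio = file_get_contents($filename . '.flac');
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://www.google.com/speech-api/v2/recognize?output=json&lang=pt-BR&key=___my_api_key___");
curl_setopt($ch, CURLOPT_POST, 1);
// The header must match the audio format and sample rate.
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-Type: audio/x-flac; rate=8000"));
// Send the bytes as the raw request body, not as a multipart form upload,
// so the body really is what the Content-Type header announces.
curl_setopt($ch, CURLOPT_POSTFIELDS, $audio);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);
curl_close($ch);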
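
To pick the transcript out of $result, something like the sketch below might work. It assumes the endpoint answers with one JSON object per line, the first of which is often an empty {"result":[]}, and that the text sits under result[0].alternative[0].transcript. That layout is an assumption about this API, not something confirmed above, and extractTranscript is a hypothetical helper name.

```php
<?php
/**
 * Hypothetical helper: returns the first transcript found in the raw
 * response body, or null if none is present.
 * Assumes one JSON object per line (layout not confirmed by the answer).
 */
function extractTranscript($rawResponse)
{
    foreach (preg_split('/\r?\n/', trim($rawResponse)) as $line) {
        $data = json_decode($line, true);
        // Skip blank lines, invalid JSON, and empty {"result":[]} objects.
        if (!is_array($data) || empty($data['result'])) {
            continue;
        }
        foreach ($data['result'] as $result) {
            if (isset($result['alternative'][0]['transcript'])) {
                return $result['alternative'][0]['transcript'];
            }
        }
    }
    return null;
}
```

Calling extractTranscript($result) after curl_exec would then give either the recognized text or null, which makes the "no transcript" case easy to detect before submitting anything.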

I do this with Python and it works beautifully :-)
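
The conversion step described above (FLAC, resampled to 8000 Hz) can be scripted from PHP as well. The sketch below assumes SoX is installed on the server with FLAC support; SoX is only one possible tool, buildSoxCommand and convertToFlac are hypothetical helper names, and forcing mono with -c 1 is an assumption rather than a requirement stated above.

```php
<?php
/**
 * Hypothetical helper: builds a SoX command that resamples the input
 * to 8000 Hz mono and writes FLAC (SoX infers the output format from
 * the .flac extension).
 */
function buildSoxCommand($inputPath, $outputPath)
{
    return sprintf(
        'sox %s -r 8000 -c 1 %s',
        escapeshellarg($inputPath),
        escapeshellarg($outputPath)
    );
}

/**
 * Hypothetical helper: runs the conversion and reports success.
 * Assumes the sox binary is on the PATH of the web server user.
 */
function convertToFlac($inputPath, $outputPath)
{
    exec(buildSoxCommand($inputPath, $outputPath), $output, $exitCode);
    return $exitCode === 0 && is_file($outputPath);
}
```

Keeping the command construction separate from the exec call makes it possible to log or inspect the exact command before running it; escapeshellarg guards against odd characters in file names.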

  • Your answer is excellent, I’ll try this. I had made some attempts, but the applications I found all required their graphical interface. I had not looked at Google for this; I was only using it to create audio from text.

  • Dude, just access the Googlespeech API using cURL. I didn’t search, but you will find ready-made PHP code that does just that. Oh, and just to add: creating audio from text is called TTS (Text To Speech).

  • This link also has some interesting information: http://stt.getflourish.com/

  • Yes, it explains how to get the key for the second (current) version of the Googlespeech API. Unfortunately, it seems that only 50 requests per day are allowed per generated key. As far as I know, it is now possible to upload 16-bit .wav files, i.e. there is no longer any need to convert to .flac.

  • Take a look here (https://github.com/gillesdemey/google-speech-v2); it has examples of how to send audio from the Linux command line and get the result back using only cURL.

  • Even the conversion using SoX is not too much of a problem, but skipping it is good, since it means less code and one less dependency.

  • Cool. I used Audacity for a while. At the time I was trying to build a converter to compress audio into G.729 format; it ran into some problems that I no longer remember, and in the end a higher-priority project took its place and the G.729 project was never resumed.

0

  • 1

    Ivo, can you explain a little more how this API works and maybe [Edit] the answer with an example or two?
