Voice recognition in PHP

This question comes purely from curiosity. I am not working on anything of the kind right now, but who knows in the future.

I would like to know whether there is any system, or whether it is possible to create one, that can do voice recognition in PHP (either for login or for arbitrary commands).

If it is not possible, I would like to understand why something like this cannot be done in PHP.

I find voice features fascinating and useful, and once I have more knowledge and maturity on the subject, I would like to work on a system or API like that.

  • http://www.speechapi.com/ and http://voicephp.com/ and http://cmusphinx.sourceforge.net/

  • Related: http://answall.com/questions/101980/compatibilidade-do-google-speech-api

  • Gabriel, the link you posted has nothing to do with PHP. I had already seen it, but it did not answer my question.

  • It is possible to do it in PHP, but not practical. At most, PHP would handle the login part after receiving the audio data. To receive the audio you first need to capture and decode it. PHP can do the decoding and the comparisons, but for that it will need some extension written in C, Java, VB or another more suitable language.

  • So PHP alone could not do it?

  • @Dichrist you will need at least JavaScript for the microphone and Ajax with the File API to send the audio to PHP, so it will be more than just PHP. But the recognition itself can probably be done in PHP once the audio has been sent.

3 answers

The state of the art in voice recognition is very advanced, so implementing such a system from scratch would be a huge job, requiring a lot of research and several people working for months or years. It makes more sense to call a third-party API that has already solved the problem, for example https://wit.ai

That said, setting performance and efficiency aside, there is in theory nothing preventing someone from implementing a voice recognition system "by hand" in PHP (or any other language). You would need to use (or write) a library that reads the audio file and returns a stream of samples, and then pass that stream through some kind of processing to recognize the speech.
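
As a rough illustration of the "read the audio file and return a stream" step, here is a short Python sketch using the standard-library `wave` module. It assumes uncompressed 16-bit PCM WAV input (compressed formats would need a decoder first), and the function name `read_wav_samples` is hypothetical. In PHP the equivalent would be reading the file's bytes and decoding them with `unpack()`.

```python
import struct
import wave


def read_wav_samples(source):
    """Return (sample_rate, samples) for a 16-bit PCM WAV file.

    `source` may be a path or a binary file object. Samples are floats in
    [-1.0, 1.0); for multi-channel files only the first channel is kept.
    """
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # WAV stores PCM as little-endian signed 16-bit integers ("<h")
    values = struct.unpack("<%dh" % (len(raw) // 2), raw)
    # Frames are interleaved (L, R, L, R, ...); take channel 0 and scale
    return rate, [v / 32768.0 for v in values[::channels]]
```

The returned list of floats is the "stream" that the later processing steps (spectral analysis, feature extraction, comparison) would consume.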

There are several techniques for voice recognition (hidden Markov models, DTW, neural networks, etc.). This Wikipedia article has some information on the subject:

https://en.wikipedia.org/wiki/Speech_recognition#Models.2C_methods.2C_and_algorithms

Here is Wit.ai’s HTTP API documentation, which I mentioned above:

https://wit.ai/docs/http/20141022

The examples are in bash (using curl), so you would have to rewrite them in PHP.
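
For illustration, here is a short Python sketch of what one of those curl calls turns into. It only builds the POST request (it does not touch the network), and it assumes the `/speech` endpoint, the `v` version parameter and the `Authorization: Bearer` header described in Wit.ai's docs. The URL constant and the function name are assumptions; check the current documentation, since the API has changed since the 20141022 version linked above. In PHP the same request would typically be sent with the cURL extension (`curl_init`, `curl_setopt`, `curl_exec`).

```python
import urllib.parse
import urllib.request

# Assumption: Wit.ai accepts raw audio POSTed to /speech with a Bearer token.
# Verify against the current Wit.ai HTTP API docs before relying on this.
WIT_SPEECH_URL = "https://api.wit.ai/speech"


def build_wit_speech_request(audio_bytes, token, version="20141022",
                             content_type="audio/wav"):
    """Build (but do not send) an HTTP POST carrying raw audio to Wit.ai."""
    url = WIT_SPEECH_URL + "?" + urllib.parse.urlencode({"v": version})
    return urllib.request.Request(
        url,
        data=audio_bytes,
        method="POST",
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": content_type,
        },
    )

# Sending it would need a real token and network access, e.g.:
# with urllib.request.urlopen(build_wit_speech_request(data, TOKEN)) as resp:
#     print(resp.read().decode("utf-8"))
```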

  • I will dig into the subject and get to know this tool. I did suspect that trying to build something like this would be a huge job. I will learn more about this tool, or others that can help in this regard.

This question already has an accepted answer. I had marked it to answer but ended up not having time; still, it is never too late to add some new considerations.

I wanted to know if there is some kind of system or if it is possible to create a system that can do voice recognition (whether for login, or any commands), using php.

Your question is open-ended: do you want a system that converts what is spoken into text (transcription), or a system in which the user first records a word with their voice, and the system later compares what the user says against that recording to validate it? These are two completely different systems.

The first type of system is of course complex, but I venture to say the second one is "easy": with a few lines of MATLAB code I can score how similar a pre-recorded word is to a new one.

I don't know the exact dates, but MFCCs (Mel-Frequency Cepstral Coefficients) have been used to find speech patterns since the 1980s. That is over 30 years, and the technique is still considered the state of the art for this type of recognition (finding pre-recorded words from a given speaker).

To clarify a little, the MFCC is derived from the cepstrum:

cepstrum = IFFT(log(|FFT(s)|))

What does this equation mean?

It gives the envelope (contour) of the signal's spectrum, i.e. its formants, and this envelope consistently reflects the shape of the vocal tract. The low-order cepstral coefficients are the ones that describe it.
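
A short Python sketch of the formula, computing the real cepstrum of one short frame. It uses a naive O(n^2) DFT instead of an FFT so it has no dependencies, and it takes the log of the magnitude spectrum (the usual "real cepstrum"); `eps` is an assumed small constant to avoid log(0). The function names are illustrative.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (fine for a short frame)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def real_cepstrum(frame, eps=1e-10):
    """Real cepstrum: IDFT(log|DFT(frame)|).

    eps avoids log(0) for spectral bins that are exactly zero. For a real
    frame the result is real up to rounding error, so only .real is kept.
    """
    log_mag = [math.log(abs(v) + eps) for v in dft(frame)]
    return [c.real for c in idft(log_mag)]
```

Keeping only the first few output coefficients (low quefrencies) and transforming back gives the smoothed spectral envelope described above; the higher coefficients mostly carry the pitch harmonics.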

The difference between MFCC and the plain cepstrum is that the frequency bands are equally spaced on the mel scale, which approximates the response of the human auditory system more closely than the linearly spaced frequency bands used in the ordinary cepstrum.
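
A short Python sketch of where the mel scale enters the computation of MFCCs for a single frame: Hamming window, power spectrum, triangular filters equally spaced on the mel scale, log, and a DCT. The mel formula 2595 * log10(1 + f/700) is one common convention, and the choices of 20 filters and 13 coefficients are typical defaults picked for illustration, not values from this answer. Real implementations also split the signal into overlapping frames, often apply pre-emphasis, and use an FFT instead of the naive DFT here.

```python
import cmath
import math


def hz_to_mel(f):
    # One common mel formula; other variants exist
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """|DFT|^2 for bins 0..n/2 of a real frame (naive DFT, O(n^2))."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [low + (high - low) * i / (num_filters + 1)
            for i in range(num_filters + 2)]
    # Map each mel point to an FFT bin index
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mels]
    filters = []
    for j in range(1, num_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):    # rising edge
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):   # falling edge
            row[k] = (right - k) / (right - centre)
        filters.append(row)
    return filters


def dct_ii(x, num_coeffs):
    """Unnormalised DCT-II, keeping the first num_coeffs outputs."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(num_coeffs)]


def mfcc_frame(frame, sample_rate, num_filters=20, num_coeffs=13):
    """MFCCs for one frame: window -> power spectrum -> mel energies -> log -> DCT."""
    n = len(frame)
    # Hamming window reduces spectral leakage at the frame edges
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = power_spectrum(windowed)
    energies = [sum(w * p for w, p in zip(row, spectrum))
                for row in mel_filterbank(num_filters, n, sample_rate)]
    log_energies = [math.log(e + 1e-10) for e in energies]
    return dct_ii(log_energies, num_coeffs)
```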

OK, we now have a way to capture the spectral shape of any word, but how do we compare two of them?

We will resort to a deterministic method (one that gives no special treatment to noise in the data, even when the data are expected to be contaminated in some way). This means you need to compare "something" pre-recorded with "something" new, recorded in good (noiseless) or bad (noisy) conditions, and still be able to determine how similar they are. That seems complex, but it is not so bad: we can use DTW (Dynamic Time Warping) to compare the two sequences of coefficient vectors returned by the MFCC and then take an action.
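
A short Python sketch of classic DTW as described above. Each sequence is a list of per-frame feature vectors (for example MFCCs), the local distance is Euclidean (an assumption; other distances are also used), and the result is the cost of the cheapest monotonic alignment between the two sequences.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b, dist=euclidean):
    """Classic O(n*m) Dynamic Time Warping between two sequences of vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # seq_a advances alone
                                 cost[i][j - 1],      # seq_b advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

Because a frame may be matched to several frames of the other sequence, the same word spoken faster or slower still produces a low cost; the O(n*m) table is small for short words like the ones in a voice-dialing scenario.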

The method described here was widely used in 1990s cell phones for voice dialing: you associated a contact with a pre-recorded voice tag (e.g. "Fernando"), and when you later said "Fernando" into the microphone, the phone called that contact.
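
The voice-dialing flow just described could be sketched in Python like this: every enrolled contact has a stored feature sequence, a new utterance is compared to each one with DTW, and the closest match wins unless it is too far away. The `templates` dictionary, the length normalisation and the `max_distance` threshold are all hypothetical choices for illustration; a real system would tune the threshold on real recordings.

```python
import math


def dtw_distance(a, b):
    """Compact DTW with Euclidean frame distance."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


def recognize(utterance, templates, max_distance):
    """Return the label of the closest template, or None if nothing is close.

    templates maps a label (e.g. a contact name) to the feature sequence
    recorded when the user enrolled it.
    """
    best_name, best_cost = None, float("inf")
    for name, template in templates.items():
        # Rough normalisation so longer words are not penalised just for
        # having more frames
        c = dtw_distance(utterance, template) / (len(utterance) + len(template))
        if c < best_cost:
            best_name, best_cost = name, c
    return best_name if best_cost <= max_distance else None
```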

As for building this system in PHP: technically, yes, it is possible, though it may be more complicated because PHP has no native functions for the Fourier transform, nor for recording, encoding and decoding audio.
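
To show that the missing native Fourier transform is an inconvenience rather than a blocker, here is a minimal recursive radix-2 Cooley-Tukey FFT in Python, assuming the input length is a power of two. The same structure can be ported to PHP, with the caveat that PHP has no built-in complex number type, so real and imaginary parts would have to be stored separately.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])   # DFT of even-indexed samples
    odd = fft(x[1::2])    # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out
```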

Assuming you already understand the enormous complexity of doing this in any language, let's focus on what makes it possible or impossible in PHP.

You can divide any program into two parts: logic and I/O. I/O (input and output) is how the program communicates with something outside itself. Common types of I/O are reading from and writing to disk, receiving data entered by the user, displaying information on the screen, etc. As for the logic, PHP, even if it is not the fastest or most elegant language, is perfectly capable of doing all the calculations voice recognition requires, and as systems become more capable this will be less and less of a problem. For the I/O part, however, you need a way to get the user's voice data. It is possible to run PHP on a computer without it acting as a website, but I assume that in normal use you would not do that. So you will have to get the data you need, the sound of the voice, over the internet.

Previously this was impossible without plugins: there was no voice input in the browser that you could use, except perhaps through Flash and Java applets. But now there is WebRTC. You'll need a little JavaScript, but it is possible: you use JavaScript to request access to the microphone, then pass the captured audio to the backend, where PHP code performs the voice recognition and returns the result to the user. If there is a voice recognition library installed on the system, you can even call it from PHP and save yourself a lot of work.

So yes, it is possible, but because some kind of I/O is unavoidable, it cannot be pure PHP. But to be honest, who has ever written pure PHP, given that you have to output HTML for the user to be able to do anything useful?

  • "who has ever written pure PHP, given that you have to output HTML": I write "pure PHP without HTML" scripts every week to do various things like importing an external database, exporting my database, crawling some web page (ugh, crawlers in PHP suck), etc. There is also a multitude of PHP projects that are run locally: https://github.com/squizlabs/PHP_CodeSniffer, https://github.com/phpmd/phpmd, https://github.com/rlerdorf/phan, https://getcomposer.org/.

  • I know these things, we have them here at the company too, but all of this usually still serves to maintain a website.
