This question already has an accepted answer. I had bookmarked it to answer and never found the time, but it is never too late to add some new considerations.
I wanted to know if there is some kind of system or if it is possible to create a
system that can do voice recognition (whether for login, or
any commands), using php.
Your question is open-ended. Are you asking about a system that converts speech into text (transcription), or a system in which the user records a word in advance and the system later compares a newly spoken word against that recording to validate it? These are two completely different systems.
The first type of system is certainly complex, but I would venture to say the second is "easy": with a few lines of Matlab code I can score how similar a pre-recorded word is to a new one.
I don't know the exact dates, but MFCC (Mel Frequency Cepstral Coefficients) has been used since the 1980s to find speech patterns. That is over 30 years, and the technique is still considered the state of the art for this type of recognition (finding pre-recorded words from a given speaker).
To clarify a little, the MFCC is derived from the cepstrum:
cepstrum = IFFT(log(|FFT(s)|))
What does this equation mean?
Its low-order coefficients describe the spectral envelope (the formant contour) of the signal, and that envelope consistently reflects the shape of the vocal tract.
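As a rough illustration of the equation above, here is a minimal NumPy sketch. It is not the code I used in Matlab; the function name `real_cepstrum`, the small epsilon inside the log, and the synthetic test signal (an impulse plus a half-amplitude echo) are all assumptions made for the example. It takes the magnitude of the FFT before the log, which is the usual "real cepstrum" form.

```python
import numpy as np

def real_cepstrum(signal):
    """Real cepstrum of a 1-D signal: IFFT(log(|FFT(s)|))."""
    spectrum = np.fft.fft(signal)
    # The small epsilon (an assumption) avoids log(0) on empty frequency bins.
    log_magnitude = np.log(np.abs(spectrum) + 1e-12)
    # The log-magnitude spectrum is real and even, so the inverse FFT is
    # real up to rounding error; keep only the real part.
    return np.fft.ifft(log_magnitude).real

# Hypothetical test signal: an impulse followed by a weaker echo 32 samples later.
n_samples = 256
echo_delay = 32
signal = np.zeros(n_samples)
signal[0] = 1.0
signal[echo_delay] = 0.5

cepstrum = real_cepstrum(signal)
# Strongest cepstral peak in the first half (skipping index 0);
# for this signal it should land at the echo delay.
peak = int(np.argmax(cepstrum[1:n_samples // 2])) + 1
print("strongest cepstral peak at sample:", peak)
```

For real speech you would apply this to short windowed frames (a few tens of milliseconds each) rather than to the whole recording at once.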
Therefore, the difference between MFCC and the plain cepstrum is that MFCC uses frequency bands equally spaced on the mel scale, which approximates the response of the human auditory system more closely than the linearly spaced frequency bands used in the ordinary cepstrum.
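To make "equally spaced on the mel scale" concrete, the sketch below converts between Hz and mel and computes band edges. It assumes the commonly used formula mel = 2595 * log10(1 + f / 700); other variants of the mel scale exist. The function names, the 0 to 8000 Hz range, and the choice of 10 bands are hypothetical values picked for the example.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Convert Hz to mel (assumes the common 2595 * log10(1 + f/700) variant)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_band_edges(f_min, f_max, n_bands):
    """Band edges in Hz that are equally spaced on the mel scale.

    Returns n_bands + 2 points, the layout typically used for
    overlapping triangular filters (each band spans three points).
    """
    mel_points = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_bands + 2)
    return mel_to_hz(mel_points)

edges = mel_band_edges(0.0, 8000.0, 10)
print(np.round(edges, 1))
# Band widths in Hz grow with frequency even though they are equal in mel.
print(np.round(np.diff(edges), 1))
```

The bands come out narrow at low frequencies and wide at high frequencies, which is the behaviour that mimics human hearing.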
OK, we now have a way to capture the spectral shape of any word. How do we compare two of them?
We will resort to a deterministic method (one that gives no special treatment to the noise in the data, even when the data is expected to be contaminated in some way). This means you need to compare "something" pre-recorded with "something" new, whether recorded in good (noiseless) or bad (noisy) conditions, and still be able to determine how similar they are. It sounds complex, but it is not that bad: we can use DTW (Dynamic Time Warping) to compare two vectors of MFCC coefficients and take an action based on the result.
The method described here was widely used in cell phones in the 90s, in the feature that let you associate a contact with a pre-recorded voice tag: you said "Fernando" into the microphone and the phone called that contact.
As for building this system in PHP: yes, it is technically possible, but it can be more complicated because PHP has no native functions for the Fourier transform or for recording, encoding, and decoding audio.
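The DTW comparison can be sketched in a few lines. This is the textbook dynamic-programming form with a Euclidean frame distance and no path constraints, which is an assumption on my part rather than what any particular phone used. The function name `dtw_distance` and the toy sequences are hypothetical; in practice each row would be the MFCC vector of one audio frame.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Each sequence is either 1-D (one scalar per frame) or 2-D with shape
    (frames, coefficients). Lower values mean more similar sequences.
    """
    a = np.asarray(seq_a, dtype=float)
    b = np.asarray(seq_b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    # cost[i, j] = best accumulated cost aligning a[:i] with b[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            frame_dist = np.linalg.norm(a[i - 1] - b[j - 1])
            # Allowed steps: advance in a, advance in b, or advance in both.
            cost[i, j] = frame_dist + min(cost[i - 1, j],
                                          cost[i, j - 1],
                                          cost[i - 1, j - 1])
    return float(cost[n, m])

# Toy data standing in for MFCC frames (hypothetical values).
template = [1.0, 2.0, 3.0]
same_word_slower = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
other_word = [3.0, 2.0, 1.0]

print(dtw_distance(template, same_word_slower))
print(dtw_distance(template, other_word))
```

To "take an action", you would compare the returned cost against a threshold tuned on real recordings; the same word spoken more slowly should score much lower than a different word.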
http://www.speechapi.com/ and http://voicephp.com/ and http://cmusphinx.sourceforge.net/
– Diego Souza
Related: http://answall.com/questions/101980/compatibilidade-do-google-speech-api
– Gabriel Rodrigues
Gabriel, the link you posted has nothing about PHP, bro. I had even seen it before, but it did not answer my question.
– DiChrist
It is possible to do it in PHP, but not practical. At most, PHP would handle the login part after receiving the audio data. To receive the audio you need to capture and decode it. PHP can decode it and make the comparisons, but to do so it will need an extension written in C, Java, VB or another more suitable language.
– Daniel Omine
So PHP alone could not do it?
– DiChrist
@Dichrist you will need at least JavaScript for the microphone and Ajax with the File API to send the audio to PHP, that is, it will take much more than just PHP. But it is likely that the recognition itself can be done in PHP once the audio has been sent.
– Guilherme Nascimento