Hello. I recently decided to learn about voice recognition, and after some brief research I came across MFCCs (Mel-frequency cepstral coefficients), so I decided to study them and found this material through a Google search:
- http://aquarius.ime.eb.br/~apolin/papers/Carlos_uff_2007.pdf
- http://www.practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/#eqn1
From this material I understood the following:
1 - First we need to compute the FFT (Fast Fourier Transform) of the signal to obtain its frequencies (the signal in this case being sound).
2 - Apply the pre-emphasis filter to eliminate frequency instability.
3 - Window the signal, splitting the voice signal into small parts of 20 to 40 ms each.
4 - Compute the MFCC of each signal window.
I don’t know whether the steps above are correct.
As for the calculations, my questions are about the values used and the meaning of the variables:
- Pre-emphasis filter
$$H(z) = 1 - a z^{-1}, \quad 0.9 \le a \le 1.0$$
What is z, and what is a? And why does a have to be between 0.9 and 1.0?
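For concreteness, here is a minimal sketch of how this filter is usually applied to the samples themselves. In the time domain, H(z) = 1 − a z^-1 corresponds to y[n] = x[n] − a·x[n−1]. The function name `pre_emphasis` and the value a = 0.97 are assumptions chosen for illustration; they do not come from the material linked above.

```python
import numpy as np

def pre_emphasis(signal, a=0.97):
    """Apply y[n] = x[n] - a * x[n-1], the time-domain form of H(z) = 1 - a z^-1.

    a = 0.97 is an illustrative value inside the 0.9 to 1.0 range.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size == 0:
        return signal
    # The first sample has no predecessor, so it is passed through unchanged.
    return np.concatenate(([signal[0]], signal[1:] - a * signal[:-1]))

# A constant (zero-frequency) signal is almost cancelled after the first sample.
constant = np.array([1.0, 1.0, 1.0, 1.0])
filtered = pre_emphasis(constant)
```

A constant input is strongly attenuated while fast changes between neighbouring samples pass through, which is why the filter is described as boosting the high frequencies relative to the low ones.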
- Windowing
$$h(n) = 0.54 - 0.46 \cos\left(\frac{2\pi n}{N - 1}\right)$$
What is n? My understanding is that N is the total number of samples; again, correct me if I am wrong.
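The window formula above can be sketched in code as follows, reading n as the sample index inside one frame (0 to N − 1) and N as the number of samples in that frame. The 16 kHz sample rate, the 25 ms frame length, the 10 ms hop, and the helper names are assumptions for illustration only.

```python
import numpy as np

def hamming_window(N):
    """h(n) = 0.54 - 0.46 * cos(2*pi*n / (N - 1)) for n = 0 .. N-1 (needs N > 1)."""
    n = np.arange(N)
    return 0.54 - 0.46 * np.cos(2.0 * np.pi * n / (N - 1))

def split_into_frames(signal, frame_len, hop):
    """Cut a 1-D signal into overlapping frames; trailing samples that do not
    fill a whole frame are dropped. Assumes len(signal) >= frame_len."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    starts = hop * np.arange(n_frames)
    return np.stack([signal[s:s + frame_len] for s in starts])

sample_rate = 16000                      # assumed sample rate in Hz
frame_len = round(0.025 * sample_rate)   # 25 ms -> 400 samples
hop = round(0.010 * sample_rate)         # 10 ms -> 160 samples

# One second of noise standing in for a recorded voice signal.
signal = np.random.default_rng(0).standard_normal(sample_rate)
frames = split_into_frames(signal, frame_len, hop)
windowed = frames * hamming_window(frame_len)  # window applied to every frame
```

Each row of `windowed` is one frame tapered towards its edges, which is the form normally handed to the FFT.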
- MFCC
$$P(i) = \sum_{k=0}^{N/2} |S(k,m)|^2 \, H_i\left(k \cdot \frac{2\pi}{N}\right)$$
Here I admit I understood absolutely nothing; I would be grateful for a good explanation.
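One way to read this sum: |S(k,m)|² is the power spectrum of frame m at FFT bin k, H_i is the i-th filter of a filterbank, and P(i) is the energy that filter collects. The sketch below assumes triangular filters spaced on the mel scale, which is the usual choice for H_i, and drops the frame index m because it processes a single frame. The number of filters (26), the FFT size (512), the sample rate (16 kHz), and all function names are assumptions for illustration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters H_i(k), one row per filter, over bins k = 0 .. n_fft/2."""
    # Filter edges equally spaced on the mel scale between 0 Hz and Nyquist.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    H = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            H[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            H[i - 1, k] = (right - k) / (right - center)
    return H

def filterbank_energies(frame, H, n_fft):
    """P(i) = sum over k of |S(k)|^2 * H_i(k) for one windowed frame."""
    spectrum = np.fft.rfft(frame, n=n_fft)  # S(k) for k = 0 .. n_fft/2
    power = np.abs(spectrum) ** 2           # |S(k)|^2
    return H @ power                        # one energy per filter

sample_rate, n_fft = 16000, 512
t = np.arange(400) / sample_rate
frame = np.sin(2.0 * np.pi * 1000.0 * t)    # a 1 kHz tone standing in for speech
H = mel_filterbank(26, n_fft, sample_rate)
P = filterbank_energies(frame, H, n_fft)
```

This covers only the P(i) step. In the usual MFCC recipe the logarithm of each P(i) is then taken and a discrete cosine transform is applied to obtain the coefficients.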
And one last question (there must be many more, but I can't remember them now):
If I understand correctly, the result of the MFCC calculation is a vector of values, so for recognition do I just have to calculate the MFCCs of two signals and compare these vectors?
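Taken literally, the comparison described above would be a distance between two vectors. A minimal sketch, assuming each signal has already been reduced to a single vector of the same length; the function name and the numbers are made up for illustration, and whether such a direct comparison is enough for recognition is part of what is being asked.

```python
import numpy as np

def euclidean_distance(u, v):
    """Straight-line distance between two equal-length feature vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    if u.shape != v.shape:
        raise ValueError("feature vectors must have the same shape")
    return float(np.linalg.norm(u - v))

# Hypothetical feature vectors, not real MFCC values.
reference = [12.1, -3.4, 0.8]
candidate = [11.9, -3.0, 1.1]
distance = euclidean_distance(reference, candidate)  # smaller means more similar
```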
I am very much a layman in physics and not very good with calculations, so forgive me if I’m wrong or if my questions are too basic.
More questions: 1 - Do I have to take the FFT of each window and run it through the equation? 2 - Could you explain a little more about the second part of the equation?
– AlexsanderSS
@Alexsanderss yes, you have to apply the FFT to each frame and use the equations. I have expanded my answer to address your question; check it again.
– ederwander