What is the data obtained by scipy.io.wavfile read?


Hello, I have been using scipy.io.wavfile.read to open the audio files made available at: https://en.wikipedia.org/wiki/Special_information_tones. I converted the files to wav using VLC. In theory the tones should lie between the frequencies 913 and 1776, but when I read the file, the returned data goes above 6000 Hz. So my questions are:

  1. Am I interpreting the returned data correctly? Are they 16-bit integers (2 bytes)?
  2. Is there another way to get those frequencies?

I’ve posted another question showing how I wanted to use this, but I think I’m misinterpreting the data. (SIT tones using Python or C#)

[EDIT] This is the draft code I’m using to test: http://pastebin.com/yvng2VE8

[EDIT] I want to check whether the tones occur in files like this: Recording.wav. To do that, I am first using the two files below as a proof of concept, to validate the idea and the detector, and then I will try to run it on the file above. IC_SIT.wav and RO'_SIT.wav

Thanks for the help.

  • Can you edit the question and include the code you are using to read the file and that finds the 6000 Hz frequency? It might make it easier to help.

  • What algorithm are you using to find the frequencies?

  • Put the .wav audio file online somewhere so I can listen to it and analyze it as well.

  • Hello, @ederwander, I want to check whether the tones occur in files like this one: https://dl.dropboxusercontent.com/u/106738286/recording.wav To do that, I am first using the two files below as a proof of concept, to validate the idea and the detector, and then I will try to run it on the file above. https://dl.dropboxusercontent.com/u/106738286/IC_SIT.wav https://dl.dropboxusercontent.com/u/106738286/RO'_SIT.wav

  • I did a time-domain and frequency-domain analysis of the RO'_SIT.wav file; take a look at my answer.

2 answers



Well, I’ll try to explain some concepts without getting into the deep math involved.

First, scipy.io.wavfile.read returns the amplitude of the signal as short ints (16-bit). You can manipulate these values to change the volume of your audio: mute it, increase it, decrease it. These amplitude values are what you should use to find your frequencies.

@jsbueno gave you a good approach; I have some considerations about the definition of frequency.

"The number of comings and goings of this speaker membrane per second is what represents the frequency: that is, each time these numbers go from a rising series to a descending one and back to rising again, a "cycle" is counted. If that descent and rise of the numbers takes 441 samples to happen in a wave sampled at 44100 hertz, the frequency at that point is 100 hertz (in 1 second you will have 100 wave peaks, or cycles)."

Well, I understood what you meant, but it got a little confused: defining frequency in terms of peaks, climbs and descents is too simplistic. It may hold for pure signals without interference (sinusoids), which are rare in the real world. Sinusoids sent as tones during a call suffer noise and attenuation; operators love to add white noise on purpose to give the impression that the line is still alive, etc. Frequency is synonymous with periodicity: think of periodicity as the interval of time after which a certain pattern repeats.

With time, just by looking at a plot of a signal's amplitude you will be able to determine the frequency of that signal in the time domain almost instantly, something that can be computationally complex in some cases. Here is an example of a 100 Hz sinusoid sampled at 44100 Hz:

[plot: 100 Hz sinusoid sampled at 44100 Hz]

This is a pure tone with no interference, so it is easy to see where a period repeats starting from the point x,y = 0. I marked the x axis exactly at position 441, which is where one cycle (one period) repeats; this gives a frequency of 100 Hz for this signal: 44100/441 = 100 Hz.
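The cycle-counting idea above can be sketched in a few lines of numpy. This is my own illustration, not the answer's code: it generates the same synthetic 100 Hz sine at 44100 Hz and estimates the frequency from the spacing between upward zero crossings.

```python
import numpy as np

# Synthetic stand-in for the plotted signal: 1 second of a pure
# 100 Hz sine sampled at 44100 Hz
fs = 44100
t = np.arange(fs) / fs
sinal = np.sin(2 * np.pi * 100 * t)

# Upward zero crossings (negative -> non-negative) mark the start of each cycle
cruzamentos = np.where((sinal[:-1] < 0) & (sinal[1:] >= 0))[0]

# Average spacing between crossings = period in samples (~441 here)
periodo = (cruzamentos[-1] - cruzamentos[0]) / (len(cruzamentos) - 1)
print(fs / periodo)  # ≈ 100 Hz, i.e. 44100 / 441
```

On a clean sinusoid this recovers the 44100/441 = 100 Hz arithmetic exactly; on a noisy signal it degrades quickly, which is the point the answer makes next.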

Well, let's complicate things a little: how about 100 Hz that is not a sinusoid? I went to this site and got this audio file here.

Take a look at how it is already harder to find where a period repeats:

[plot: non-sinusoidal 100 Hz waveform sampled at 48000 Hz]

The audio was sampled at 48000 Hz. By eye, I marked x = 477 as where a period repeats: 48000/477 ≈ 100 Hz (to be exact, 100.6289 Hz).

This gives you an idea of how complicated things get when a signal arrives full of noise and attenuation. Note that when a period repeats it does not need to be exactly the same; in the real world it never will be, because of interference. OK, @jsbueno told you about crossings of 0 on the X axis (zero crossing); that is indeed a method for finding frequencies, but it is quite rudimentary. The smarter ways to find frequencies in the time domain involve autocorrelation; read up on techniques like AMDF (Average Magnitude Difference Function).
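The answer names AMDF without showing it; here is a minimal sketch of the idea, again over an assumed synthetic 100 Hz sine rather than the answer's audio files. AMDF compares the signal with a delayed copy of itself; the lag where the average difference dips to a minimum is the period.

```python
import numpy as np

def amdf(x, max_lag):
    """Average Magnitude Difference Function: for each lag tau, the mean
    absolute difference between the signal and a copy shifted by tau
    samples. Minima occur at multiples of the period."""
    n = len(x)
    return np.array([np.mean(np.abs(x[:n - tau] - x[tau:]))
                     for tau in range(1, max_lag)])

# 100 Hz sine at 44100 Hz: expected period = 441 samples
fs = 44100
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 100 * t)

d = amdf(x, 600)
# Skip very small lags (trivial minima near 0), then find the dip;
# +1 and +300 convert the array index back into a lag value
periodo = np.argmin(d[300:]) + 300 + 1
print(fs / periodo)  # ≈ 100 Hz
```

Unlike raw zero crossing, the AMDF dip survives a moderate amount of noise, which is why the answer calls it one of the "smarter" time-domain options.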

Just out of curiosity, I took the "RO' SIT" example; let's analyze the audio in the time domain. According to the wiki, these frequencies should be (low) 913.8 Hz, (high) 1428.5 Hz, (low) 1776.7 Hz. Here is the audio plot:

[plot: full waveform of the RO'_SIT.wav file]

The first thing I noticed is that there is a fade in/fade out joining each frequency; this is done so there are no audible clicks at the frequency transitions. Let's zoom into the first 1024 samples and try to find the frequency:

[plot: samples 1 to 1024, showing the fade-in]

Now you can clearly see the fade-in (the gradual-onset effect of the sound): it starts from zero and increases the amplitude of the signal. Once again, by eye, I can see more or less where the period is; I marked x = 48. This file is sampled at 44100 Hz, therefore 44100/48 = 918.75 Hz. That covers the first 1024 samples of the file (vector positions 1 to 1024); let's move on to the next 1024 (positions 1024 to 2048):

[plot: samples 1024 to 2048]

This time, by eye, I marked x = 49, hence 44100/49 = 900 Hz. This is where I wanted to get: the difference between a whole 48 and a whole 49 samples is about 18 Hz, which is why a time-domain algorithm has to be smart enough to find the period with fractional precision. But again, this is an eyeball example to help you understand how to find frequencies. Let's skip ahead to the next block and try to find something close to the next expected frequency (1428.5 Hz):

[plot: a 1024-sample block within the 1428.5 Hz tone]

This time I marked x = 32, which gives 44100/32 = 1378.125 Hz.

I take 1024 samples at some point in the last stretch; this time we expect a frequency close to 1776.7 Hz:

[plot: a 1024-sample block within the 1776.7 Hz tone]

Done: x = 25, which gives 44100/25 = 1764 Hz.

That is all I will say about finding frequencies in the time domain; you probably won't want to go that way. For what you need, Goertzel remains the best choice.

So you ask me how to find the frequencies in the frequency domain. Well, again without going into the mathematical models, we can simplify and say that Fourier proved that every periodic waveform can be decomposed into sine waves; each spectral component represents one sinusoid. The more components you use, the more precise the decomposition of the waveform will be (the order of resolution), but it will cost you much more mathematical processing.

With that in hand you will know in which spectral component to expect your frequencies. Imagine you generate 4096 spectral components; this gives you a resolution of 44100/4096 = 10.7666015625 Hz. That is, each component is accurate to about 11 Hz; roughly, you may be off by up to 11 Hz in each of the 4096 components. Before that, you need to know about the Nyquist theorem: to reconstruct a signal with minimal loss of information, the sampling frequency must be equal to or greater than twice the highest frequency in the signal's spectrum. If our sampling rate is 44100 Hz, then the highest possible frequency in the signal is 44100/2 = 22050 Hz. But then, how do you know which frequency each spectral component of the Fourier series corresponds to?


  component 1 -> 10.7666015625 Hz

  component 2 -> 21.5332031250 Hz

  component 3 -> 32.2998046875 Hz

  component 4 -> 43.0664062500 Hz

  component 5 -> 53.8330078125 Hz

    …

  component 2047 -> 22039.2333984375 Hz

  component 2048 -> 22050 Hz

There we go: we arrive at component 2048, which holds the maximum frequency allowed by the Nyquist theorem.
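The component table above is just `index * (sample_rate / N)`; a few lines of numpy reproduce it (my own sketch, assuming the same 4096 components at 44100 Hz):

```python
import numpy as np

fs = 44100
n = 4096
resolucao = fs / n                        # Hz per spectral component
componentes = np.arange(1, n // 2 + 1) * resolucao

print(resolucao)         # 10.7666015625
print(componentes[0])    # component 1    -> 10.7666015625 Hz
print(componentes[-2])   # component 2047 -> 22039.2333984375 Hz
print(componentes[-1])   # component 2048 -> 22050.0 Hz (Nyquist limit)
```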

In your case, needing the frequency 913.8 Hz, it would fall in component 85; component 84 would be below that frequency and component 86 above it.

>> 10.7666 * 84

ans =

  904.3944

>> 10.7666 * 85

ans =

  915.1610


>> 10.7666 * 86

ans =

  925.9276

Goertzel looks directly at the one component you need instead of at all the other 2048 components; look at the absurd computational gain it gives you.
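The answer recommends Goertzel without showing it, so here is a minimal sketch of the standard algorithm. The tone and the function are my own illustration, not the answer's code; it evaluates the power of a single DFT bin with one pass over the samples.

```python
import numpy as np

def goertzel(amostras, fs, freq_alvo):
    """Power of a single frequency component (Goertzel algorithm).
    Evaluates one DFT bin without computing the full FFT."""
    n = len(amostras)
    k = int(round(n * freq_alvo / fs))   # nearest bin to the target frequency
    w = 2 * np.pi * k / n
    coef = 2 * np.cos(w)
    s_prev = s_prev2 = 0.0
    for x in amostras:                   # single recursive pass
        s = x + coef * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev**2 + s_prev2**2 - coef * s_prev * s_prev2

# Synthetic 913.8 Hz tone, 4096 samples at 44100 Hz
fs = 44100
t = np.arange(4096) / fs
tom = np.sin(2 * np.pi * 913.8 * t)

# Energy at the SIT frequency dwarfs energy at an unrelated frequency
print(goertzel(tom, fs, 913.8) > goertzel(tom, fs, 1428.5))  # True
```

For an N-point block this costs O(N) per frequency of interest, versus O(N log N) for a full FFT, which is the "absurd computational gain" when you only care about the three SIT frequencies.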

Here is a quick Python snippet that shows the frequencies of the entire audio file, still using the "RO' SIT" example. One detail: I used wave.open, which is native to Python, but it returns the same data as scipy:

import wave
import numpy as np
from matplotlib import pyplot as plt

wf = wave.open(r'C:\Users\ederwander\Desktop\RO_SIT.wav', 'rb')
sinal = wf.readframes(-1)

# 16-bit samples -> array of int16 amplitudes
Amplitude = np.frombuffer(sinal, dtype=np.int16)
AmplitudeJanelada = Amplitude * np.hamming(len(Amplitude))
Fourier = abs(np.fft.rfft(AmplitudeJanelada))

NyquistTeorema = wf.getframerate() / 2

MinFrequencia = NyquistTeorema / (len(Amplitude) // 2)

Frequencias = np.linspace(MinFrequencia, NyquistTeorema,
                          num=len(Amplitude) // 2)

plt.figure(1)
plt.title('Fourier')
plt.plot(Frequencias, Fourier[0:len(Frequencias)])

plt.figure(2)
plt.title('Fourier Zoom')
plt.plot(Frequencias[800:2000], Fourier[800:2000])

plt.show()

I used the size of the entire audio file, which gives a resolution of 44100/45506 = 0.9691 Hz, and I used only the real (positive-frequency) half of the Fourier transform.

The first plot shows all the spectral components returned by the Fourier transform:

[plot: full magnitude spectrum of RO'_SIT.wav]

Since it was a little hard to see the frequencies, I zoomed in for the second plot in the code; you will see that it goes from 800 Hz to 2000 Hz, which is more or less the frequency zone of your interest.

[plot: zoom of the spectrum between roughly 800 Hz and 2000 Hz]

If you run a peak-finding algorithm over this plot, you will notice that the first peak falls at 913.89 Hz, the second at 1428.29 Hz, and the third at 1776.36 Hz; it really does match the wiki frequencies.
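A peak-finding pass like the one described can be sketched with plain numpy. Since the RO'_SIT.wav file is not available here, this illustration (my own, not the answer's code) builds a synthetic mix of the three SIT tones and picks local maxima above a threshold in its spectrum:

```python
import numpy as np

# Synthetic stand-in for the plotted spectrum: the three SIT tones mixed,
# 1 second at 44100 Hz (so each FFT bin is 1 Hz wide)
fs = 44100
t = np.arange(fs) / fs
sinal = sum(np.sin(2 * np.pi * f * t) for f in (913.8, 1428.5, 1776.7))

espectro = np.abs(np.fft.rfft(sinal * np.hamming(len(sinal))))
freqs = np.fft.rfftfreq(len(sinal), 1 / fs)

# Local maxima above 10% of the strongest peak
limiar = 0.1 * espectro.max()
picos = np.where((espectro[1:-1] > espectro[:-2]) &
                 (espectro[1:-1] > espectro[2:]) &
                 (espectro[1:-1] > limiar))[0] + 1
print(freqs[picos])  # three peaks, near 913.8, 1428.5 and 1776.7 Hz
```

The Hamming window keeps the sidelobes well below the 10% threshold, so only the three tone peaks survive; on real recordings you would also want to gate out silence and noise.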

  • Fantastic, thank you very much. Together with the answer to the other question, this gives me plenty of support to continue. One last question: the duration of each frequency is not important for this case study, is it? As long as the tones exist and occur in exactly the order given in the wiki link, correct? Or do I need to try to figure out their duration as well?

  • @Diogopaschoal, in your case the duration will obviously matter: the number of samples (duration) you send to Fourier or Goertzel, together with the response at each frequency, is what will define your SIT. As I told you, Goertzel is the wisest way to solve your problem; the analyses here were meant to demonstrate some time-domain and frequency-domain concepts you will run into. My answer has already gone far beyond the scope of your question; if you have more doubts, open a new question about Fourier/Goertzel and duration and I can try to formulate an answer.


The data of an uncompressed sound wave, as represented in a ".wav" file (and available in an array after reading), do not represent the frequency at each point, but rather the amplitude, that is, the "position" of the sound wave at each moment in time.

In physical terms, it is this number that can, for example, be used to position a membrane that pushes air (as happens in a speaker). In a 16-bit .wav, each of these values, called a sample, directly represents the position of the speaker membrane (or the "air pressure") at each instant, not the frequency. This number can range from -2**15 (-32768) to 2**15 - 1 (32767); a displacement of -32768 means, in physical terms, the maximum negative voltage in the audio signal at that moment, which implies maximum displacement of the speaker membrane.

The sound wave is represented numerically according to the number of samples per second. In a file with 44100 samples per second (the function scipy.io.wavfile.read returns this number as the first element of the returned tuple), 44100 of the numbers composing the array are consumed in each second, each of them indicating a position of the speaker membrane. The number of comings and goings of this membrane per second is what represents the frequency: that is, each time these numbers go from a rising series to a descending one and back to rising again, one "cycle" is counted. If that descent and rise of the numbers takes 441 numbers to happen in a wave sampled at 44100 hertz, it means that the frequency at that point is 100 hertz (in 1 second, you will have 100 wave peaks, or cycles).
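As a sanity check of what scipy.io.wavfile.read actually returns, here is a small round trip (my own example, with a made-up file name): write a generated 16-bit tone to disk and read it back, confirming that the first element is the sample rate and the second is an int16 array of amplitudes.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile

# Write a 1-second 440 Hz test tone as 16-bit PCM, then read it back
fs = 44100
t = np.arange(fs) / fs
tom = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)

caminho = os.path.join(tempfile.gettempdir(), 'tom_teste.wav')
wavfile.write(caminho, fs, tom)

taxa, dados = wavfile.read(caminho)
print(taxa)                       # 44100 -> first element is the sample rate
print(dados.dtype)                # int16 -> samples are amplitudes, not Hz
print(dados.min(), dados.max())   # roughly -16383 .. 16383 at half volume
```

So the values "above 6000" mentioned in the question are amplitudes on the ±32767 scale, not frequencies in hertz.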

I took one of the files you indicated and created a sampling at 44100 Hz (apparently overkill; it seems the original sampling rate was only 8000 Hz). The data I have at positions 0 to 100 are:

array([-4371, -5314, -6153, -6870, -7452, -7888, -8169, -8289, -8246,
       -8040, -7676, -7161, -6506, -5722, -4825, -3834, -2767, -1646,
        -493,   670,  1820,  2934,  3989,  4967,  5846,  6610,  7244,
        7735,  8074,  8255,  8273,  8128,  7824,  7366,  6763,  6027,
        5173,  4218,  3179,  2079,   937,  -222, -1378, -2506, -3585,
       -4594, -5512, -6323, -7009, -7557, -7957, -8201, -8283, -8203,
       -7961, -7563, -7016, -6331, -5521, -4602, -3593, -2513, -1383,
        -226,   935,  2079,  3181,  4221,  5178,  6033,  6769,  7372,
        7829,  8132,  8276,  8256,  8073,  7732,  7238,  6602,  5837,
        4956,  3978,  2922,  1809,   660,  -501, -1653, -2771, -3835,
       -4823, -5716, -6497, -7150, -7662, -8024, -8228, -8270, -8150, -7869], dtype=int16)

We can see that in this range the numbers hit approximately 2 peaks near +8000, i.e. roughly 1 cycle every 50 samples at 44100 samples per second, which is equivalent to a frequency of about 880 Hz. This rough calculation shows the order of magnitude; if we take the exact peaks, the values 8273 at position 30 and 8276 at position 74, that is 44 samples apart: 44100 samples/s ÷ 44 samples/cycle gives about 1002 cycles/s, well within your expected range.

I don't know much about signal processing and frequency detection; scipy certainly has functions that can do frequency analysis on this data and return a series of frequencies over the course of the sound, which seems to be your goal.

But if that is more complex than you need, it is possible to write a coarse analysis function that returns, for example, the intervals between each time the amplitude crosses zero, and from there gives the predominant frequency at various points in the audio.

One function can measure the duration of each wave cycle in the audio, and another can note each position (in seconds) where the cycle duration changed more or less consistently, along with the frequency there. It is very crude compared with proper mathematical frequency analysis, but maybe it can be refined enough to recognize the kind of file you want to handle:

from __future__ import division

import numpy as np


def count_peak_distances(data):
    # Record, for each downward zero crossing, its position and the
    # number of samples since the previous crossing (one cycle length).
    res = []
    previous = 0
    count = 0
    for i, sample in enumerate(data):
        if previous > 0 and sample < 0:
            res.append((i, count))
            count = 0
        previous = sample
        count += 1
    return res[1:]


def describe_wave(data, frequency):
    # allow a cycle-length tolerance of 0.01% of the sample rate
    # before counting a tune-change point
    delta = 0.0001 * frequency
    res = []
    last_cycle_size = 0
    last_changed_position = -1
    for sample_position, cycle_size in count_peak_distances(data):
        if not (last_cycle_size - delta < cycle_size < last_cycle_size + delta):
            values = sample_position / frequency, frequency / cycle_size
            # avoid annotating short frequency blips at tone boundaries
            if values[0] - last_changed_position > 0.01:
                res.append(values)
                last_changed_position = values[0]
            else:
                res[-1] = values
            last_cycle_size = cycle_size
    return res

Throwing the data from my .wav file (mono, 44100 hertz) at it, I get the times in seconds where frequency changes happen inside the file (and the respective frequencies). Keep in mind that these functions do not handle the "silence" in the file (stretches where the amplitude is low), so the silent bands show up with spurious frequencies:

In [121]: describe_wave(d1, 44100)
Out[121]: 
[(0.0019501133786848073, 980.0),
 (0.2625170068027211, 1378.125),
 (0.6432426303854876, 1764.0),
 (1.0228798185941044, 2004.5454545454545),
 (3.0029931972789115, 980.0),
 (3.2787755102040816, 1378.125),
 (3.659501133786848, 1764.0),
 (3.7337868480725622, 1764.0),
 (4.03984126984127, 1470.0)]
  • Thanks, @jsbueno. This helped me understand the data returned by the function. A colleague had also already told me to look at the distances between the peaks.
