Before I start anything, I need to give you some information...
There are two ways to perform autocorrelation, in your example I could observe that you tried to autocorrelation in the frequency domain, the correct equation to perform Autocorrelation in the frequency domain is:
It is important to know that it is also possible to perform autocorrelation in the time domain:
The latter works sort of that in Brute Force, a pseudo code for autocorrelation of the above equation would be:
AC=zeros( 4096 , 1 );
for k=1:4096/2,
sum = 0;
for n=1:4096/2,
sum = sum + (x(n)*x(n + k));
end
AC(k) = AC(k) + sum;
end
The autocorrelation will return the most similar points within the signal, the return of this function (both in the time domain and frequency domain) will demonstrate peaks indicating the positions where the signal has greater similarity, these peaks can be translated as periodicity/frequency, usually I find the difference/subtraction the position of a peak by the other to find the frequency of the analyzed signal.
From the start I can know the sample rate of your data, by the values presented:
this sample presents about 1600 days, however the archive has in the
column X about 64,000 (sixty-four thousand) points!
This means that you have a sample rate of 40
samples per day = 64000/1600=40
, ie in 1 day is collected 40 samples that make up your data, if you want to know the periodicity 30 days, will need to 30*40=1200
samples, when you use Fourier you need to worry about the resolution order, gives a read on this mine reply.
I can demonstrate how to find the periodicity of a data set using Fourier, as I do not have your data in hand, to demonstrate I will create a senoid frequently in 500Hz
and with a sampling rate at 44100hz
, this tells me that we are looking for a periodicity of 44100/500=88.2000
, save this number is what we are looking for as a result for these my data entries, to demonstrate follow the Plot of a senoid to 500hz with sampling rate at 44100hz(this sampling rate says that every 1 second I am collecting 44100 samples), see Plot:
It is a simple sign/datum, with the naked eye finding repetitions, so I marked the period where the eye seems to occur the first repetition, look there x=89
, Remember the value we’re chasing up there ? 88.2000
, or naked eye I can define the periodicity of these data, but and how to do this using the first equation ?
Follows my code:
import numpy as np
#inicio gerando o sinal mostrado no plot
Fs = 44100
freq = 500
nsamples = 4096
sinal = np.arange(nsamples)
sinal = np.sin(2 * np.pi * freq * sinal / Fs)
#fim, sinal criado
#aplicando primeira equação
Mag = np.abs(np.fft.rfft(sinal,nsamples*2))**2
AC=np.fft.irfft(Mag[0:nsamples])
#fim da equação
#encontrar onde estão os picos dentro da AutoCorrelação
peaks = []
k=3
while(k < nsamples - 1):
y1 = (AC[k - 1])
y2 = (AC[k])
y3 = (AC[k + 1])
if (y2 > y1 and y2 >= y3):
peaks.append(k)
k=k+1
periodo = np.mean(np.diff(peaks))
print("Periodo usando a diferenca de picos da Autocorrelacao")
print(periodo)
The result of this algorithm:
C:\Python33>python.exe AC.py
Periodo usando a diferenca de picos da Autocorrelacao
88.1555555556
Bingo 88.1555555556
practically the expected result...
There’s a trick to this part of the code np.fft.rfft(sinal,nsamples*2))
, to effect Autocorrelação
and not Convolution(are similar, but not equal), you have to compute twice as many samples nsamples*2
that applies zeropad
in the second half of the signal, in other words half is your data and the other half is composed of zeros...
There is another way to find the Frequency/Period that is only apply fourier
and observe where there are peaks, the Fourier Series exists for this purpose a transformada de Fourier de um sinal periódico gera um espectro discreto no domínio da frequência
, only using Fourier:
import numpy as np
Fs = 44100
freq = 500
nsamples = 4096
sinal = np.arange(nsamples)
sinal = np.sin(2 * np.pi * freq * sinal / Fs)
Mag = np.abs(np.fft.rfft(sinal,nsamples*2))**2
index = Mag[0:nsamples/2].argmax()
print("Periodo usando o maior componente espectral de Fourier")
print(1 / (index / nsamples/2))
Upshot:
C:\Python33>python.exe FFT.py
Periodo usando o maior componente espectral de Fourier
88.0860215054
No need for Autocorrelation I came very close, given that the order of resolution for this example is 44100/ 4096 = 10,7666015625hz
Of course besides these two methods you can still use Cepstrum
or autocorrelation in the time domain of the second equation...
But this code of yours doesn’t implement autocorrelation, has it solved your problem? if you haven’t yet I can try to explain how to perform the calculations ...
– ederwander
I haven’t been successful yet, but I’d love to see how it works.
– Luciano Alencar
I hope my answer helps you
– ederwander