Autocorrelation function in Python

Asked

Viewed 912 times

3

Hello, everybody!

I have a file . txt of data with two columns (in X I have the values in days of observations and in Y the Measured Flow of a sample)

I intend to calculate the Period with which this data repeats and I would like to do this using the ACF (autocorrelation Function) in Python, but so far I cannot.

Here’s what I’ve been doing:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

dado = np.loadtxt(r'.../file.txt') #busco meu arquivo no meu PC

t = dado[:,0] #esta é apenas a coluna que corresponde ao tempo
y = dado[:,1] #esta é apenas a coluna que corresponde ao fluxo medido

# Fazer uma filtragem a partir da Transformada Rápida de Fourier (Fast Fourier Transformed - FFT)

fft_of_signal_with_noise = np.fft.fft(y)
# Normalizar o tempo
N = len(t)
f = np.fft.fftfreq(len(fft_of_signal_with_noise),1600)
frequency_of_signal  = 1.0 


def bandpass_filter(x, freq, frequency_of_signal=frequency_of_signal, band = 0.05):
    if (frequency_of_signal - band) < abs(freq) < (frequency_of_signal + band):
        return x
    else:
        return 0

F_filtered = np.asanyarray([bandpass_filter(x,freq) for x,freq in zip(fft_of_signal_with_noise, f)]);

filtered_signal = np.fft.ifft(F_filtered);

However, I have not yet succeeded in finding the correct Period value for this sample, for example.

NOTE: This sample presents about 1600 days, but the file has in column X about 64,000 (sixty-four MIL) points!

Can someone help me find the value of the Period with which to reperte?

  • But this code of yours doesn’t implement autocorrelation, has it solved your problem? if you haven’t yet I can try to explain how to perform the calculations ...

  • I haven’t been successful yet, but I’d love to see how it works.

  • I hope my answer helps you

1 answer

1

Before I start anything, I need to give you some information...

There are two ways to perform autocorrelation, in your example I could observe that you tried to autocorrelation in the frequency domain, the correct equation to perform Autocorrelation in the frequency domain is:

inserir a descrição da imagem aqui

It is important to know that it is also possible to perform autocorrelation in the time domain:

inserir a descrição da imagem aqui

The latter works sort of that in Brute Force, a pseudo code for autocorrelation of the above equation would be:

AC=zeros( 4096 , 1 );
for k=1:4096/2,
    sum = 0;
    for n=1:4096/2,
        sum = sum + (x(n)*x(n + k));

    end
    AC(k) = AC(k) + sum;
end

The autocorrelation will return the most similar points within the signal, the return of this function (both in the time domain and frequency domain) will demonstrate peaks indicating the positions where the signal has greater similarity, these peaks can be translated as periodicity/frequency, usually I find the difference/subtraction the position of a peak by the other to find the frequency of the analyzed signal.

From the start I can know the sample rate of your data, by the values presented:

this sample presents about 1600 days, however the archive has in the column X about 64,000 (sixty-four thousand) points!

This means that you have a sample rate of 40 samples per day = 64000/1600=40, ie in 1 day is collected 40 samples that make up your data, if you want to know the periodicity 30 days, will need to 30*40=1200 samples, when you use Fourier you need to worry about the resolution order, gives a read on this mine reply.

I can demonstrate how to find the periodicity of a data set using Fourier, as I do not have your data in hand, to demonstrate I will create a senoid frequently in 500Hz and with a sampling rate at 44100hz, this tells me that we are looking for a periodicity of 44100/500=88.2000, save this number is what we are looking for as a result for these my data entries, to demonstrate follow the Plot of a senoid to 500hz with sampling rate at 44100hz(this sampling rate says that every 1 second I am collecting 44100 samples), see Plot:

inserir a descrição da imagem aqui

It is a simple sign/datum, with the naked eye finding repetitions, so I marked the period where the eye seems to occur the first repetition, look there x=89, Remember the value we’re chasing up there ? 88.2000, or naked eye I can define the periodicity of these data, but and how to do this using the first equation ?

Follows my code:

import numpy as np

#inicio gerando o sinal mostrado no plot
Fs = 44100
freq = 500
nsamples = 4096
sinal = np.arange(nsamples)
sinal = np.sin(2 * np.pi * freq * sinal / Fs)
#fim, sinal criado

#aplicando primeira equação
Mag = np.abs(np.fft.rfft(sinal,nsamples*2))**2

AC=np.fft.irfft(Mag[0:nsamples])
#fim da equação

#encontrar onde estão os picos dentro da AutoCorrelação 
peaks = []
k=3

while(k < nsamples - 1):
        y1 = (AC[k - 1])
        y2 = (AC[k])
        y3 = (AC[k + 1])
        if (y2 > y1 and y2 >= y3):
           peaks.append(k)

        k=k+1

periodo = np.mean(np.diff(peaks))

print("Periodo usando a diferenca de picos da Autocorrelacao")

print(periodo)

The result of this algorithm:

C:\Python33>python.exe AC.py
Periodo usando a diferenca de picos da Autocorrelacao
88.1555555556

Bingo 88.1555555556 practically the expected result...

There’s a trick to this part of the code np.fft.rfft(sinal,nsamples*2)), to effect Autocorrelação and not Convolution(are similar, but not equal), you have to compute twice as many samples nsamples*2 that applies zeropad in the second half of the signal, in other words half is your data and the other half is composed of zeros...

There is another way to find the Frequency/Period that is only apply fourier and observe where there are peaks, the Fourier Series exists for this purpose a transformada de Fourier de um sinal periódico gera um espectro discreto no domínio da frequência, only using Fourier:

import numpy as np


Fs = 44100
freq = 500
nsamples = 4096
sinal = np.arange(nsamples)
sinal = np.sin(2 * np.pi * freq * sinal / Fs)

Mag = np.abs(np.fft.rfft(sinal,nsamples*2))**2


index = Mag[0:nsamples/2].argmax()

print("Periodo usando o maior componente espectral de Fourier")

print(1 / (index / nsamples/2))

Upshot:

C:\Python33>python.exe FFT.py
Periodo usando o maior componente espectral de Fourier
88.0860215054

No need for Autocorrelation I came very close, given that the order of resolution for this example is 44100/ 4096 = 10,7666015625hz

Of course besides these two methods you can still use Cepstrum or autocorrelation in the time domain of the second equation...

Browser other questions tagged

You are not signed in. Login or sign up in order to post.