How does StdAudio work in Java?

I’m trying to read the frequencies of an audio file. I spent a whole day trying to understand the FFT and ended up knowing less than when I started. But in the end I found StdAudio in Java: https://introcs.cs.princeton.edu/java/stdlib/StdAudio.java.html.

It’s just that I still don’t understand how its read works. Can anyone tell me what exactly the read function returns? I need the frequency, but the function returns a double array with values between -1 and 1, and I still don’t understand what that means. If it is a frequency, can someone tell me how to convert it to a value in Hz?

2 answers


@Rafaelguasselli, capturing an audio frequency is not trivial. What exactly are you trying to do, a guitar tuner or something like that?

The read function in that code only returns the audio samples in floating point; if you plot the result you will see the waveform of the audio you read. There is nothing in the code you showed that comes anywhere close to capturing the frequency.
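To illustrate what those samples are, here is a minimal sketch (the class name WaveformDemo and the 440 Hz / 44.1 kHz parameters are illustrative, not from the thread). Each value is the instantaneous amplitude of the waveform at one point in time, scaled to [-1, 1], which is the same kind of data StdAudio's read returns:

```java
public class WaveformDemo {
    // Generate n samples of a sine wave; each value is the instantaneous
    // amplitude in [-1, 1] -- the same kind of values StdAudio.read() returns.
    static double[] sine(double freq, double sampleRate, int n) {
        double[] samples = new double[n];
        for (int i = 0; i < n; i++) {
            samples[i] = Math.sin(2 * Math.PI * freq * i / sampleRate);
        }
        return samples;
    }

    public static void main(String[] args) {
        double[] samples = sine(440.0, 44100.0, 16);
        for (int i = 0; i < samples.length; i++) {
            // sample index i corresponds to time t = i / sampleRate seconds
            System.out.printf("t=%.6f s  amplitude=%+.4f%n", i / 44100.0, samples[i]);
        }
    }
}
```

Plotting those amplitude values against time is exactly "seeing the waveform"; no single sample carries a frequency by itself.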

Fast Fourier Transform (FFT) - summarizing it as simply as possible, we use this transform to decompose a signal in the time domain into the frequency domain. This means we can open up a spectrum of all the frequencies present. This is not just for audio; it can be used for any data you have in the time domain, a spreadsheet with sales data for example, where you could try to find patterns in the periods that had more sales...
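The decomposition the answer describes can be sketched with a naive DFT (mathematically the same result as an FFT, just O(n²); class and method names here are illustrative). Feeding it a pure sine that lands exactly on one bin shows the time-domain samples turning into a single spectral peak:

```java
public class DftDemo {
    // Naive DFT magnitude spectrum; O(n^2), fine for small n.
    static double[] magnitudes(double[] x) {
        int n = x.length;
        double[] mag = new double[n / 2];
        for (int k = 0; k < n / 2; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double angle = 2 * Math.PI * k * t / n;
                re += x[t] * Math.cos(angle);
                im -= x[t] * Math.sin(angle);
            }
            mag[k] = Math.sqrt(re * re + im * im);
        }
        return mag;
    }

    public static void main(String[] args) {
        double sampleRate = 1024.0;
        int n = 1024;
        double freq = 64.0; // lands exactly on bin 64 when n == sampleRate
        double[] x = new double[n];
        for (int i = 0; i < n; i++) x[i] = Math.sin(2 * Math.PI * freq * i / sampleRate);

        double[] mag = magnitudes(x);
        int peak = 0;
        for (int k = 1; k < mag.length; k++) if (mag[k] > mag[peak]) peak = k;
        double peakHz = peak * sampleRate / n; // bin index -> Hz
        System.out.println("peak bin " + peak + " -> " + peakHz + " Hz"); // peak bin 64 -> 64.0 Hz
    }
}
```
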

But going back to audio: if you’re trying to make a tuner or capture voice frequencies, the FFT is not really recommended. If you’re interested in the physical frequency itself, the FFT may be valid; but if you’re interested in, for example, knowing the tuning scale of a voice or instrument, the pure use of the FFT is hardly recommended for that purpose. There is a fascinating field called psychoacoustics, which describes how our brain interprets a sound. Often a sound with a physically well-defined frequency may sound different to our ears, and that’s where the FFT fails miserably - not in all cases, but in some. The combination of the fundamental frequency with the harmonics defines its real pitch to our ears; when using the FFT, people tend to ignore the harmonics, look only at the highest peak the FFT returned, and declare that to be the fundamental frequency of the sound...

Of course, the size of the FFT is directly related to the resolution of the returned components: the smaller the FFT size, the coarser the frequency resolution (the bins are spaced sampleRate / fftSize apart) and the less precise the algorithm...
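The size/resolution trade-off can be seen directly from the bin-spacing formula (the sample rate and FFT sizes below are just illustrative choices):

```java
public class FftResolutionDemo {
    public static void main(String[] args) {
        double sampleRate = 44100.0;
        int[] fftSizes = {256, 1024, 4096, 16384};
        for (int n : fftSizes) {
            // Adjacent FFT bins are sampleRate / n Hz apart, so a small FFT
            // cannot distinguish two tones that fall inside the same bin.
            double binWidthHz = sampleRate / n;
            System.out.printf("FFT size %5d -> bin width %8.2f Hz%n", n, binWidthHz);
        }
    }
}
```

At 44.1 kHz, a 256-point FFT gives bins about 172 Hz wide - far too coarse to separate neighboring musical notes in the low register - while 16384 points gives bins under 3 Hz wide, at the cost of a much longer analysis window.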

It’s hard to write more without knowing what you’re trying to do and why you want to capture frequencies...

There are also algorithms that capture frequencies in the time domain (autocorrelation-based algorithms).
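A minimal sketch of the time-domain idea (names and parameters are illustrative, not a production pitch detector): the signal is compared against a shifted copy of itself, and the lag where the correlation peaks is the period, so pitch = sampleRate / lag.

```java
public class AutocorrDemo {
    // Return the lag (in samples) with the strongest autocorrelation in
    // [minLag, maxLag]; the pitch estimate is then sampleRate / bestLag.
    static int bestLag(double[] x, int minLag, int maxLag) {
        int best = minLag;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int lag = minLag; lag <= maxLag; lag++) {
            double sum = 0;
            for (int i = 0; i + lag < x.length; i++) sum += x[i] * x[i + lag];
            if (sum > bestScore) { bestScore = sum; best = lag; }
        }
        return best;
    }

    public static void main(String[] args) {
        double sampleRate = 44100.0;
        double freq = 441.0; // period of exactly 100 samples at 44.1 kHz
        double[] x = new double[4096];
        for (int i = 0; i < x.length; i++) x[i] = Math.sin(2 * Math.PI * freq * i / sampleRate);

        int lag = bestLag(x, 20, 1000); // search a plausible range of pitch periods
        System.out.println("estimated pitch: " + (sampleRate / lag) + " Hz"); // prints: estimated pitch: 441.0 Hz
    }
}
```

Real implementations add normalization, thresholds, and interpolation (e.g. the YIN family of algorithms), but the core period-search is this simple.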

Take the time to read these answers:

  1. How to find the frequency from the waveform just by looking and finding the period, plus names of free frequency-finding algorithms (here)
  2. Python examples of how to use autocorrelation in the time and frequency domains to obtain frequencies (here)
  3. Everything you need to know about how the FFT maps the spectral components based on the FFT size you send, plus many names of known frequency-capture techniques, plus my open-source Java code in Tarsos (here)

Remember that periodicity and frequency are two sides of the same coin: if you know the period at which something repeats, you know the frequency (frequency = 1 / period)...

The code posted by @Scarabelo will only return the audio spectrum. To get the dominant frequency you have to find the position of the highest peak among the returned components (you will learn how to do this in link 3 above, and in the Python code in link 2, in the last algorithm).
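The peak-to-Hz step the answer mentions is just a search over the magnitude array followed by the bin-to-frequency mapping (class and method names here are illustrative):

```java
public class PeakToHzDemo {
    // Map the index of the largest magnitude bin to a frequency in Hz.
    static double dominantFrequency(double[] magnitudes, double sampleRate, int fftSize) {
        int peak = 0;
        for (int k = 1; k < magnitudes.length; k++) {
            if (magnitudes[k] > magnitudes[peak]) peak = k;
        }
        return peak * sampleRate / fftSize; // bin k is centered at k * sampleRate / fftSize Hz
    }

    public static void main(String[] args) {
        // Toy spectrum: pretend bin 10 of a 256-point FFT at 44.1 kHz is the largest.
        double[] mags = new double[128];
        mags[10] = 5.0;
        System.out.println(dominantFrequency(mags, 44100.0, 256) + " Hz"); // 1722.65625 Hz
    }
}
```

This would be applied to the magnitudes array produced by the FFT code in the other answer; as the answer warns, the biggest bin is the dominant physical component, which is not necessarily the perceived pitch.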

All of this is only valid for monophonic sounds. If you’re talking about frequency capture for polyphonic sounds, things move to a completely different level and I don’t even know where to start, haha.

  • +1 for the psychoacoustics, I learned something new!

  • Summarizing everything: my goal was to find out which notes were played in a song. The original problem in my project has already been worked around, but I’ll keep going because this is now my new challenge. // It’s going to be a bit long, so I’ll explain everything below. This is for a physics project in which I and some colleagues decided to try to find a relationship between physical attributes of music and emotions. The “simpler” idea (not at all simple, haha)

  • would be to ask, via a form given to various people, about songs that cause certain emotions, then analyze the notes of each song; note data with a low standard deviation across all songs of the same emotion would, in a certain way, be more likely to show a relationship. At first I wrote a program that read MIDI, but almost none of the songs people sent were in MIDI, and from what I saw it would be very difficult to convert them.

  • Currently the project’s problem is solved because we reversed the research: now we gave people specific songs for which we already had MIDI and asked about the emotion. But anyway, I will keep going until I can make this program work with audio formats other than MIDI. The real problem is that the sites that talk about this are very scattered; there are many things from various areas that I have to understand before putting everything together. I now need to build a sort of skill tree so I can orient myself in what I have to study to write this program from scratch.


Poking around the internet a little (I found your FFT-related question interesting), I found an algorithm that runs an FFT and explains the content of each variable; maybe it will help you:

Source of the code: https://github.com/hendriks73/jipes/blob/master/src/main/java/com/tagtraum/jipes/math/FFTFactory.java

import javax.sound.sampled.*;

public class AudioLED {

    private static final float NORMALIZATION_FACTOR_2_BYTES = Short.MAX_VALUE + 1.0f;

    public static void main(final String[] args) throws Exception {
        // use only 1 channel, to make this easier
        final AudioFormat format = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 44100, 16, 1, 2, 44100, false);
        final DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
        final TargetDataLine targetLine = (TargetDataLine) AudioSystem.getLine(info);
        targetLine.open();
        targetLine.start();
        final AudioInputStream audioStream = new AudioInputStream(targetLine);

        final byte[] buf = new byte[256]; // <--- increase this for higher frequency resolution
        final int numberOfSamples = buf.length / format.getFrameSize();
        // JavaFFT comes from the jipes FFTFactory source linked above
        final JavaFFT fft = new JavaFFT(numberOfSamples);
        while (true) {
            // in real impl, don't just ignore how many bytes you read
            audioStream.read(buf);
            // the stream represents each sample as two bytes -> decode
            final float[] samples = decode(buf, format);
            final float[][] transformed = fft.transform(samples);
            final float[] realPart = transformed[0];
            final float[] imaginaryPart = transformed[1];
            final double[] magnitudes = toMagnitudes(realPart, imaginaryPart);

            // do something with magnitudes...
        }
    }

    private static float[] decode(final byte[] buf, final AudioFormat format) {
        final float[] fbuf = new float[buf.length / format.getFrameSize()];
        for (int pos = 0; pos < buf.length; pos += format.getFrameSize()) {
            final int sample = format.isBigEndian()
                    ? byteToIntBigEndian(buf, pos, format.getFrameSize())
                    : byteToIntLittleEndian(buf, pos, format.getFrameSize());
            // normalize to [-1, 1) (not strictly necessary, but makes things easier)
            fbuf[pos / format.getFrameSize()] = sample / NORMALIZATION_FACTOR_2_BYTES;
        }
        return fbuf;
    }

    private static double[] toMagnitudes(final float[] realPart, final float[] imaginaryPart) {
        final double[] powers = new double[realPart.length / 2];
        for (int i = 0; i < powers.length; i++) {
            powers[i] = Math.sqrt(realPart[i] * realPart[i] + imaginaryPart[i] * imaginaryPart[i]);
        }
        return powers;
    }

    private static int byteToIntLittleEndian(final byte[] buf, final int offset, final int bytesPerSample) {
        int sample = 0;
        for (int byteIndex = 0; byteIndex < bytesPerSample; byteIndex++) {
            // keep the sign of the most significant (last) byte; mask the others,
            // so signed PCM decodes to negative values correctly
            final int aByte = byteIndex == bytesPerSample - 1
                    ? buf[offset + byteIndex]
                    : buf[offset + byteIndex] & 0xff;
            sample += aByte << (8 * byteIndex);
        }
        return sample;
    }

    private static int byteToIntBigEndian(final byte[] buf, final int offset, final int bytesPerSample) {
        int sample = 0;
        for (int byteIndex = 0; byteIndex < bytesPerSample; byteIndex++) {
            // keep the sign of the most significant (first) byte; mask the others
            final int aByte = byteIndex == 0
                    ? buf[offset + byteIndex]
                    : buf[offset + byteIndex] & 0xff;
            sample += aByte << (8 * (bytesPerSample - byteIndex - 1));
        }
        return sample;
    }

}
  • OK, I’m trying to use this code but I’m still very lost.

  • 1. What effect does the size of buf have? When I change it, the values simply change and I still don’t understand why. 2. What exactly are the magnitudes supposed to be? 3. When exactly do I stop this while(true)?

  • 4. Do I need to use (lines 1, 2, 3, 4, 5 [from the first comment in main]) if I read the audio directly from the stream?
