How does StdAudio work in Java?

I’m trying to read the frequencies of an audio file. I spent a whole day trying to understand the FFT and ended up knowing less than when I started. But in the end I found StdAudio in Java: https://introcs.cs.princeton.edu/java/stdlib/StdAudio.java.html.

It’s just that I still don’t understand how its read works. Can anyone tell me what exactly the read function returns? I need the frequency, but the function returns a double array with values between -1 and 1, and I still don’t understand what that means. If it is a frequency, can someone tell me how to convert it to a value in Hz?

2 answers


@Rafaelguasselli, capturing an audio frequency is not trivial. What exactly are you trying to do, a guitar tuner or something like that?

The read function in that code only returns the audio samples in floating point; if you plot the result you will see the waveform of the audio you read. There is nothing in the code you showed that comes anywhere close to capturing the frequency.
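To illustrate what those samples are, here is a minimal sketch (the class name WaveformDemo and the 440 Hz / 44.1 kHz parameters are illustrative, not from the thread). Each value is the instantaneous amplitude of the waveform at one point in time, scaled to [-1, 1], which is the same kind of data StdAudio's read returns:

```java
public class WaveformDemo {
    // Generate n samples of a sine wave; each value is the instantaneous
    // amplitude in [-1, 1] -- the same kind of values StdAudio.read() returns.
    static double[] sine(double freq, double sampleRate, int n) {
        double[] samples = new double[n];
        for (int i = 0; i < n; i++) {
            samples[i] = Math.sin(2 * Math.PI * freq * i / sampleRate);
        }
        return samples;
    }

    public static void main(String[] args) {
        double[] samples = sine(440.0, 44100.0, 16);
        for (int i = 0; i < samples.length; i++) {
            // sample index i corresponds to time t = i / sampleRate seconds
            System.out.printf("t=%.6f s  amplitude=%+.4f%n", i / 44100.0, samples[i]);
        }
    }
}
```

Plotting those amplitude values against time is exactly "seeing the waveform"; no single sample carries a frequency by itself.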

Fast Fourier Transform (FFT) - summarizing it as simply as possible, we use this transform to decompose a signal in the time domain into the frequency domain. This means we can open up a spectrum of all the frequencies present. This is not just for audio; it can be used for any data you have in the time domain, a spreadsheet with sales data for example, where you could try to find patterns in the periods that had more sales...
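The decomposition the answer describes can be sketched with a naive DFT (mathematically the same result as an FFT, just O(n²); class and method names here are illustrative). Feeding it a pure sine that lands exactly on one bin shows the time-domain samples turning into a single spectral peak:

```java
public class DftDemo {
    // Naive DFT magnitude spectrum; O(n^2), fine for small n.
    static double[] magnitudes(double[] x) {
        int n = x.length;
        double[] mag = new double[n / 2];
        for (int k = 0; k < n / 2; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double angle = 2 * Math.PI * k * t / n;
                re += x[t] * Math.cos(angle);
                im -= x[t] * Math.sin(angle);
            }
            mag[k] = Math.sqrt(re * re + im * im);
        }
        return mag;
    }

    public static void main(String[] args) {
        double sampleRate = 1024.0;
        int n = 1024;
        double freq = 64.0; // lands exactly on bin 64 when n == sampleRate
        double[] x = new double[n];
        for (int i = 0; i < n; i++) x[i] = Math.sin(2 * Math.PI * freq * i / sampleRate);

        double[] mag = magnitudes(x);
        int peak = 0;
        for (int k = 1; k < mag.length; k++) if (mag[k] > mag[peak]) peak = k;
        double peakHz = peak * sampleRate / n; // bin index -> Hz
        System.out.println("peak bin " + peak + " -> " + peakHz + " Hz"); // peak bin 64 -> 64.0 Hz
    }
}
```
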

But going back to audio: if you’re trying to make a tuner or capture voice frequencies, the FFT is not really recommended. If you’re interested in the physical frequency itself, the FFT may be valid; but if you’re interested in, for example, knowing the tuning scale of a voice or instrument, the pure use of the FFT is hardly recommended for that purpose. There is a fascinating field called psychoacoustics, which describes how our brain interprets a sound. Often a sound with a physically well-defined frequency may sound different to our ears, and that’s where the FFT fails miserably - not in all cases, but in some. The combination of the fundamental frequency with the harmonics defines its real pitch to our ears; when using the FFT, people tend to ignore the harmonics, look only at the highest peak the FFT returned, and declare that to be the fundamental frequency of the sound...

Of course, the size of the FFT is directly related to the resolution of the returned components: the smaller the FFT size, the coarser the frequency resolution (the bins are spaced sampleRate / fftSize apart) and the less precise the algorithm...
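The size/resolution trade-off can be seen directly from the bin-spacing formula (the sample rate and FFT sizes below are just illustrative choices):

```java
public class FftResolutionDemo {
    public static void main(String[] args) {
        double sampleRate = 44100.0;
        int[] fftSizes = {256, 1024, 4096, 16384};
        for (int n : fftSizes) {
            // Adjacent FFT bins are sampleRate / n Hz apart, so a small FFT
            // cannot distinguish two tones that fall inside the same bin.
            double binWidthHz = sampleRate / n;
            System.out.printf("FFT size %5d -> bin width %8.2f Hz%n", n, binWidthHz);
        }
    }
}
```

At 44.1 kHz, a 256-point FFT gives bins about 172 Hz wide - far too coarse to separate neighboring musical notes in the low register - while 16384 points gives bins under 3 Hz wide, at the cost of a much longer analysis window.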

It’s hard to write more without knowing what you’re trying to do and why you want to capture frequencies...

There are also algorithms that capture frequencies in the time domain (autocorrelation-based algorithms).
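A minimal sketch of the time-domain idea (names and parameters are illustrative, not a production pitch detector): the signal is compared against a shifted copy of itself, and the lag where the correlation peaks is the period, so pitch = sampleRate / lag.

```java
public class AutocorrDemo {
    // Return the lag (in samples) with the strongest autocorrelation in
    // [minLag, maxLag]; the pitch estimate is then sampleRate / bestLag.
    static int bestLag(double[] x, int minLag, int maxLag) {
        int best = minLag;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int lag = minLag; lag <= maxLag; lag++) {
            double sum = 0;
            for (int i = 0; i + lag < x.length; i++) sum += x[i] * x[i + lag];
            if (sum > bestScore) { bestScore = sum; best = lag; }
        }
        return best;
    }

    public static void main(String[] args) {
        double sampleRate = 44100.0;
        double freq = 441.0; // period of exactly 100 samples at 44.1 kHz
        double[] x = new double[4096];
        for (int i = 0; i < x.length; i++) x[i] = Math.sin(2 * Math.PI * freq * i / sampleRate);

        int lag = bestLag(x, 20, 1000); // search a plausible range of pitch periods
        System.out.println("estimated pitch: " + (sampleRate / lag) + " Hz"); // prints: estimated pitch: 441.0 Hz
    }
}
```

Real implementations add normalization, thresholds, and interpolation (e.g. the YIN family of algorithms), but the core period-search is this simple.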

Take the time to read these answers:

  1. How to find the frequency from the waveform just by looking and finding the period, plus names of free frequency-finding algorithms (here)
  2. Python examples of how to use autocorrelation in the time and frequency domains to obtain frequencies (here)
  3. Everything you need to know about how the FFT maps the spectral components based on the FFT size you send, plus many names of known frequency-capture techniques, plus my open-source Java code in Tarsos (here)

Remember that periodicity and frequency are two sides of the same coin: if you know the period at which something repeats, you know the frequency (frequency = 1 / period)...

The code posted by @Scarabelo will only return the audio spectrum. To get the dominant frequency you have to find the position of the highest peak among the returned components (you will learn how to do this in link 3 above, and in the Python code in link 2, in the last algorithm).
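The peak-to-Hz step the answer mentions is just a search over the magnitude array followed by the bin-to-frequency mapping (class and method names here are illustrative):

```java
public class PeakToHzDemo {
    // Map the index of the largest magnitude bin to a frequency in Hz.
    static double dominantFrequency(double[] magnitudes, double sampleRate, int fftSize) {
        int peak = 0;
        for (int k = 1; k < magnitudes.length; k++) {
            if (magnitudes[k] > magnitudes[peak]) peak = k;
        }
        return peak * sampleRate / fftSize; // bin k is centered at k * sampleRate / fftSize Hz
    }

    public static void main(String[] args) {
        // Toy spectrum: pretend bin 10 of a 256-point FFT at 44.1 kHz is the largest.
        double[] mags = new double[128];
        mags[10] = 5.0;
        System.out.println(dominantFrequency(mags, 44100.0, 256) + " Hz"); // 1722.65625 Hz
    }
}
```

This would be applied to the magnitudes array produced by the FFT code in the other answer; as the answer warns, the biggest bin is the dominant physical component, which is not necessarily the perceived pitch.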

All of this is only valid for monophonic sounds. If you’re talking about frequency capture for polyphonic sounds, things move to a completely different level and I don’t even know where to start, haha.

  • +1 for the psychoacoustics, I learned something new!

  • Summarizing everything: my goal was to find out which notes were played in a song. The original problem in my project has already been worked around, but I’ll keep going because this is now my new challenge. // It’s going to be a bit long, so I’ll explain everything below. This is for a physics project in which I and some colleagues decided to try to find a relationship between physical attributes of music and emotions. The “simpler” idea (not at all simple, haha)

  • would be to ask, via a form given to various people, about songs that cause certain emotions, then analyze the notes of each song; note data with a low standard deviation across all songs of the same emotion would, in a certain way, be more likely to show a relationship. At first I wrote a program that read MIDI, but almost none of the songs people sent were in MIDI, and from what I saw it would be very difficult to convert them.

  • Currently the project’s problem is solved because we reversed the research: now we gave people specific songs for which we already had MIDI and asked about the emotion. But anyway, I will keep going until I can make this program work with audio formats other than MIDI. The real problem is that the sites that talk about this are very scattered; there are many things from various areas that I have to understand before putting everything together. I now need to build a sort of skill tree so I can orient myself in what I have to study to write this program from scratch.


Poking around the internet a little (I found your FFT-related question interesting), I found an algorithm that runs an FFT and explains the content of each variable; maybe it will help you:

Source of the code: https://github.com/hendriks73/jipes/blob/master/src/main/java/com/tagtraum/jipes/math/FFTFactory.java

import javax.sound.sampled.*;

public class AudioLED {

    private static final float NORMALIZATION_FACTOR_2_BYTES = Short.MAX_VALUE + 1.0f;

    public static void main(final String[] args) throws Exception {
        // use only 1 channel, to make this easier
        final AudioFormat format = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 44100, 16, 1, 2, 44100, false);
        final DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
        final TargetDataLine targetLine = (TargetDataLine) AudioSystem.getLine(info);
        targetLine.open();
        targetLine.start();
        final AudioInputStream audioStream = new AudioInputStream(targetLine);

        final byte[] buf = new byte[256]; // <--- increase this for higher frequency resolution
        final int numberOfSamples = buf.length / format.getFrameSize();
        // JavaFFT comes from the jipes FFTFactory source linked above
        final JavaFFT fft = new JavaFFT(numberOfSamples);
        while (true) {
            // in real impl, don't just ignore how many bytes you read
            audioStream.read(buf);
            // the stream represents each sample as two bytes -> decode
            final float[] samples = decode(buf, format);
            final float[][] transformed = fft.transform(samples);
            final float[] realPart = transformed[0];
            final float[] imaginaryPart = transformed[1];
            final double[] magnitudes = toMagnitudes(realPart, imaginaryPart);

            // do something with magnitudes...
        }
    }

    private static float[] decode(final byte[] buf, final AudioFormat format) {
        final float[] fbuf = new float[buf.length / format.getFrameSize()];
        for (int pos = 0; pos < buf.length; pos += format.getFrameSize()) {
            final int sample = format.isBigEndian()
                    ? byteToIntBigEndian(buf, pos, format.getFrameSize())
                    : byteToIntLittleEndian(buf, pos, format.getFrameSize());
            // normalize to [-1, 1) (not strictly necessary, but makes things easier)
            fbuf[pos / format.getFrameSize()] = sample / NORMALIZATION_FACTOR_2_BYTES;
        }
        return fbuf;
    }

    private static double[] toMagnitudes(final float[] realPart, final float[] imaginaryPart) {
        final double[] powers = new double[realPart.length / 2];
        for (int i = 0; i < powers.length; i++) {
            powers[i] = Math.sqrt(realPart[i] * realPart[i] + imaginaryPart[i] * imaginaryPart[i]);
        }
        return powers;
    }

    private static int byteToIntLittleEndian(final byte[] buf, final int offset, final int bytesPerSample) {
        int sample = 0;
        for (int byteIndex = 0; byteIndex < bytesPerSample; byteIndex++) {
            // keep the sign of the most significant (last) byte; mask the others,
            // so signed PCM decodes to negative values correctly
            final int aByte = byteIndex == bytesPerSample - 1
                    ? buf[offset + byteIndex]
                    : buf[offset + byteIndex] & 0xff;
            sample += aByte << (8 * byteIndex);
        }
        return sample;
    }

    private static int byteToIntBigEndian(final byte[] buf, final int offset, final int bytesPerSample) {
        int sample = 0;
        for (int byteIndex = 0; byteIndex < bytesPerSample; byteIndex++) {
            // keep the sign of the most significant (first) byte; mask the others
            final int aByte = byteIndex == 0
                    ? buf[offset + byteIndex]
                    : buf[offset + byteIndex] & 0xff;
            sample += aByte << (8 * (bytesPerSample - byteIndex - 1));
        }
        return sample;
    }

}
  • OK, I’m trying to use this code but I’m still very lost.

  • 1. What effect does the size of buf have? When I change it, the values simply change and I still don’t understand why. 2. What exactly are the magnitudes supposed to be? 3. When exactly do I stop this while(true)?

  • 4. Do I need to use (lines 1, 2, 3, 4, 5 [from the first comment in main]) if I read the audio directly from the stream?
