Doubt about how the AudioRecord class works

I’m a beginner and I was trying to understand how to record audio on Android and how that audio is processed. During my searches I found this example: Audio Record

The example works perfectly, but I didn’t really understand how this class captures and records sound. It is known that in digital processing, the voltage values are sampled (a certain number of samples per second) and then converted into binary codes according to voltage ranges, so the higher the number of bits, the more accurate the representation of the sound. The example below uses 16 bits. My doubt is how to relate what the code does to digital audio processing in practice.

  1. In the example a variable named samplerate = 8000 is declared. Does this variable represent the number of samples that the mobile phone’s microphone will capture per second?
  2. Is the RECORDER_AUDIO_ENCODING variable the number of possible values (2¹⁶) that can be used to relate voltage values in digital form to binary numbers?
  3. At one point the code obtains the minimum buffer size with the getMinBufferSize() method. Why does it have to do this? Can’t I use whatever buffer size I want?

    int bufferSize = AudioRecord.getMinBufferSize(RECORDER_SAMPLERATE, RECORDER_CHANNELS, RECORDER_AUDIO_ENCODING);

The documentation says:

Returns the minimum buffer size required for the successful creation of an AudioRecord object, in byte units. Note that this size doesn’t guarantee a smooth recording under load, and higher values should be chosen according to the expected frequency at which the AudioRecord instance will be polled for new data. See AudioRecord(int, int, int, int, int) for more information on valid configuration values.

I did not understand this point about it giving me the minimum buffer. What prevents me from setting the buffer size myself?

  4. When the recorder object of the AudioRecord class is created, some parameters are passed to the constructor, among them BufferElements2Rec and BytesPerElement. In the documentation this parameter is called bufferSizeInBytes, and it says the following:

int: the total size (in bytes) of the buffer where audio data is written to during the recording. New audio data can be read from this buffer in smaller chunks than this size. See getMinBufferSize(int, int, int) to determine the minimum required buffer size for the successful creation of an AudioRecord instance. Using values smaller than getMinBufferSize() will result in an initialization failure.

What does this parameter represent, given that the code first computes bufferSize = getMinBufferSize(...) and then passes BufferElements2Rec * BytesPerElement instead?

My understanding is that it takes samples of the sound and records the converted voltage levels as binary numbers. Did I get that right, or is that what it does? If anyone can help me understand this, I’d be grateful!

    import java.io.FileNotFoundException;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.Arrays;

    import android.app.Activity;
    import android.media.AudioFormat;
    import android.media.AudioRecord;
    import android.media.MediaRecorder;
    import android.os.Bundle;
    import android.view.KeyEvent;
    import android.view.View;
    import android.widget.Button;

    /**
     * @author RAHUL BARADIA
     */
    public class Audio_Record extends Activity {
        private static final int RECORDER_SAMPLERATE = 8000;
        private static final int RECORDER_CHANNELS = AudioFormat.CHANNEL_IN_MONO;
        private static final int RECORDER_AUDIO_ENCODING = AudioFormat.ENCODING_PCM_16BIT;

        private AudioRecord recorder = null;
        private Thread recordingThread = null;
        private boolean isRecording = false;

        @Override
        public void onCreate(Bundle savedInstanceState) {
            super.onCreate(savedInstanceState);
            setContentView(R.layout.main);

            setButtonHandlers();
            enableButtons(false);

            // Note: this value is computed but never used below; the AudioRecord
            // constructor is called with BufferElements2Rec * BytesPerElement instead.
            int bufferSize = AudioRecord.getMinBufferSize(RECORDER_SAMPLERATE,
                    RECORDER_CHANNELS, RECORDER_AUDIO_ENCODING);
        }

        private void setButtonHandlers() {
            ((Button) findViewById(R.id.btnStart)).setOnClickListener(btnClick);
            ((Button) findViewById(R.id.btnStop)).setOnClickListener(btnClick);
        }

        private void enableButton(int id, boolean isEnable) {
            ((Button) findViewById(id)).setEnabled(isEnable);
        }

        private void enableButtons(boolean isRecording) {
            enableButton(R.id.btnStart, !isRecording);
            enableButton(R.id.btnStop, isRecording);
        }

        int BufferElements2Rec = 1024; // 1024 samples per read; at 2 bytes each, that is 2048 (2K) bytes
        int BytesPerElement = 2;       // 2 bytes per sample in 16-bit format

        private void startRecording() {
            recorder = new AudioRecord(MediaRecorder.AudioSource.MIC,
                    RECORDER_SAMPLERATE, RECORDER_CHANNELS,
                    RECORDER_AUDIO_ENCODING, BufferElements2Rec * BytesPerElement);

            recorder.startRecording();
            isRecording = true;

            recordingThread = new Thread(new Runnable() {
                public void run() {
                    writeAudioDataToFile();
                }
            }, "AudioRecorder Thread");
            recordingThread.start();
        }

        // Converts 16-bit samples (short) to little-endian byte pairs
        private byte[] short2byte(short[] sData) {
            int shortArrsize = sData.length;
            byte[] bytes = new byte[shortArrsize * 2];

            for (int i = 0; i < shortArrsize; i++) {
                bytes[i * 2] = (byte) (sData[i] & 0x00FF);   // low byte
                bytes[(i * 2) + 1] = (byte) (sData[i] >> 8); // high byte
                sData[i] = 0;
            }
            return bytes;
        }

        private void writeAudioDataToFile() {
            // Write the raw PCM output to a file
            String filePath = "/sdcard/8k16bitMono.pcm";

            short sData[] = new short[BufferElements2Rec];

            FileOutputStream os = null;
            try {
                os = new FileOutputStream(filePath);
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            }

            while (isRecording) {
                // Reads microphone samples into the short buffer
                recorder.read(sData, 0, BufferElements2Rec);
                // Debug output (Arrays.toString shows the sample values;
                // sData.toString() would only print the array reference)
                System.out.println("Writing buffer to file: " + Arrays.toString(sData));
                try {
                    // Converts the samples to bytes and appends them to the file
                    byte bData[] = short2byte(sData);
                    os.write(bData, 0, BufferElements2Rec * BytesPerElement);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }

            try {
                os.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        private void stopRecording() {
            // Stops the recording activity
            if (null != recorder) {
                isRecording = false;

                recorder.stop();
                recorder.release();

                recorder = null;
                recordingThread = null;
            }
        }

        private View.OnClickListener btnClick = new View.OnClickListener() {
            public void onClick(View v) {
                switch (v.getId()) {
                    case R.id.btnStart: {
                        enableButtons(true);
                        startRecording();
                        break;
                    }
                    case R.id.btnStop: {
                        enableButtons(false);
                        stopRecording();
                        break;
                    }
                }
            }
        };

        // Pressing the back button finishes the activity.
        @Override
        public boolean onKeyDown(int keyCode, KeyEvent event) {
            if (keyCode == KeyEvent.KEYCODE_BACK) {
                finish();
            }
            return super.onKeyDown(keyCode, event);
        }
    }
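
(For reference: running this example also requires the android.permission.RECORD_AUDIO permission in the manifest, plus android.permission.WRITE_EXTERNAL_STORAGE on older Android versions in order to write to /sdcard; from Android 6.0 on, the microphone permission must additionally be granted at runtime.)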

1 answer


In the example a variable named samplerate = 8000 is declared. Does this variable represent the number of samples that the mobile phone’s microphone will capture per second?

A: samplerate = 8000 Hz means that 8000 microphone samples are captured every second; if it were 44100 Hz, 44100 samples would be captured every second...
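
As a quick sanity check, here is a minimal sketch (not part of the original example; the class and variable names are mine) relating the sample rate to the amount of data produced, assuming 16-bit mono PCM as in the code above:

    public class SampleRateMath {
        public static void main(String[] args) {
            int sampleRateHz = 8000;  // samples captured per second
            int bytesPerSample = 2;   // 16-bit PCM = 2 bytes per sample
            int seconds = 5;

            int totalSamples = sampleRateHz * seconds;      // 40000 samples
            int totalBytes = totalSamples * bytesPerSample; // 80000 bytes

            System.out.println(seconds + " s of audio = " + totalSamples
                    + " samples = " + totalBytes + " bytes");
        }
    }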

The reverse also holds: after you have saved and encoded your audio (MP3, WAV, FLAC, etc.), to play it back at the correct speed and frequencies you need to feed those samples out at the same sample rate at which the audio was recorded/generated. To play audio sampled at 8000 Hz, your player has to deliver 8000 samples per second to the speaker. So what happens if you have audio recorded at 44100 Hz and play it at 8000 Hz (delivering 8000 samples per second to the speaker)?

Something called downsampling happens: you are playing fewer samples per second than the audio was generated with, so the audio plays much slower and sounds like it came from hell (much deeper frequencies).

If there is a downsampling process, there is also an upsampling process: audio generated at 8000 Hz and played at 16000 Hz, for example, will play twice as fast and with frequencies an octave higher (twice the frequency); your audio will sound like chipmunks.
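
You can hear this yourself with a sketch like the following (this helper is mine, not from the example; it assumes pcm holds 16-bit mono samples such as those written by the recording code). Configuring the AudioTrack at 8000 Hz plays the recording normally; configuring it at 16000 Hz plays it twice as fast and an octave higher:

    import android.media.AudioFormat;
    import android.media.AudioManager;
    import android.media.AudioTrack;

    public class PlaybackRateDemo {
        // Plays raw 16-bit mono PCM at the given rate (illustrative helper).
        static void play(byte[] pcm, int playbackRateHz) {
            int minBuf = AudioTrack.getMinBufferSize(playbackRateHz,
                    AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT);

            AudioTrack track = new AudioTrack(AudioManager.STREAM_MUSIC,
                    playbackRateHz,               // try 8000, then 16000
                    AudioFormat.CHANNEL_OUT_MONO,
                    AudioFormat.ENCODING_PCM_16BIT,
                    Math.max(minBuf, pcm.length),
                    AudioTrack.MODE_STREAM);

            track.play();
            track.write(pcm, 0, pcm.length);      // blocking write in MODE_STREAM
            track.stop();
            track.release();
        }
    }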

The digital process shares some characteristics with the analog one; this upsample/downsample phenomenon also happened with vinyl records. I don’t know your age, but back in the day, at my great-grandmother’s house, I didn’t understand what was going on: she had a turntable that ran at 78 RPM (78 revolutions per minute; mechanically, a motor rotated the disc’s axle at that speed), and I kept playing with the discs with my finger, changing the rotation speed. The sound got deeper or higher in pitch depending on how fast I spun the disc. In fact, the same thing was happening: the discs were recorded to be played at 78 RPM, and if you change the playback speed you get downsampling/upsampling...

Is the RECORDER_AUDIO_ENCODING variable the number of possible values (2¹⁶) that I can use to relate voltage values in digital form to binary numbers?

It is really about how you represent the voltage values. They are not raw binary: they will be in PCM format, as short int or floating point. If you plot these values, they will appear in the format you chose. For example, for audio stored as short int (ENCODING_PCM_16BIT), the sample values are whole numbers ranging from -32768 to 32767; for audio encoded in floating point (ENCODING_PCM_FLOAT), the values range from -1 to 1. It is only a form of representation: some people prefer short int, others floating point, and some systems can only play values represented in floating point. On Android, for example, there is a performance difference: low-end ARM processors perform worse with floating-point audio...
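
Converting between the two representations is just a scale by 32768; a minimal sketch (class and method names are illustrative, not from the example):

    public class PcmRepresentations {
        // Maps 16-bit samples (-32768..32767) to floats in roughly -1.0..1.0.
        static float[] shortsToFloats(short[] pcm16) {
            float[] pcmFloat = new float[pcm16.length];
            for (int i = 0; i < pcm16.length; i++) {
                pcmFloat[i] = pcm16[i] / 32768f;
            }
            return pcmFloat;
        }

        public static void main(String[] args) {
            short[] pcm16 = { -32768, -16384, 0, 16384, 32767 };
            for (float f : shortsToFloats(pcm16)) {
                System.out.println(f);  // -1.0, -0.5, 0.0, 0.5, ~1.0
            }
        }
    }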

At one point the code obtains the minimum buffer with the getMinBufferSize() method. Why does it have to do this? Can’t I use whatever buffer size I want?

Android is a system known for latency problems; the OS and the hardware are problematic, and working with audio on Android takes a bit of magic, heh. getMinBufferSize() is provided by Google’s developers to guarantee a minimum acceptable size so that you can record your audio with as little latency as possible. A buffer smaller than the one returned by getMinBufferSize() would demand more processing, which is why it is locked out (initialization fails). And what if you want a bigger buffer? Will it work? That depends on your hardware and OS: does the device have enough memory to work with the buffer of your choice? Those are questions I can’t answer...

If you look closely, the variable bufferSize is not used anywhere in your code; what is actually passed as the last parameter (int bufferSizeInBytes) of your AudioRecord constructor

AudioRecord (int audioSource, 
                int sampleRateInHz, 
                int channelConfig, 
                int audioFormat, 
                int bufferSizeInBytes)

is 2048 bytes:

    int BufferElements2Rec = 1024; // 1024 elements
    int BytesPerElement = 2;       // of 2 bytes each

see:

    BufferElements2Rec * BytesPerElement // = 2048

Just remember that getMinBufferSize() automatically returns different buffer values depending on the device; in theory you don’t need to keep calculating it or worrying about it.
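
So the usual pattern (a sketch reusing the constants from the question’s code, not what the original author wrote) is to combine the two: ask the device for its minimum and take the larger of that and whatever size your app prefers, since values below the minimum make the constructor fail:

    int minBufferSize = AudioRecord.getMinBufferSize(RECORDER_SAMPLERATE,
            RECORDER_CHANNELS, RECORDER_AUDIO_ENCODING);

    int desiredSize = BufferElements2Rec * BytesPerElement;  // 2048 bytes
    int bufferSizeInBytes = Math.max(minBufferSize, desiredSize);

    recorder = new AudioRecord(MediaRecorder.AudioSource.MIC,
            RECORDER_SAMPLERATE, RECORDER_CHANNELS,
            RECORDER_AUDIO_ENCODING, bufferSizeInBytes);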

  • About the second question: does it work more or less like this? For voltages between 1.0 V and 1.5 V you get a value of 15 as a short int; for voltages between 1.5 V and 2.0 V you get a value of 16 as a short int; and so on. More or less like that?

  • About the third question: in fact the code does not use the return of getMinBufferSize() (bufferSize). So could I, or rather should I, use getMinBufferSize() when creating the AudioRecord object? As you said, if there is already a function that returns this value, why did the person who wrote this code use 2048?

  • In the code it uses recorder.read(sData, 0, BufferElements2Rec); to store the values read into the sData array, and then saves this array to the file. If I want to plot a graph of the sound, how do I get the amplitude and the time? How is this indexed?

  • There are so many questions, and some stray from your original question. The process of converting voltages is too complex to answer here: there is a process called analog-to-digital conversion (ADC) that involves sampling and quantization, in which the ADC tries to follow the analog waveform as closely as possible, and the encoding of this data as short int or floating point is what PCM is (the quantization sketch after these comments illustrates the idea). If your app is something simple you can just use getMinBufferSize(), but if your app needs to work with a specific buffer you are free to choose.

  • You can use the values to plot your data; there is nothing indexed. The size of the vector corresponds to the duration of your audio, and the values contained in it represent its amplitude. You can use a Java Canvas to plot and see the waveform. But this is really another question, haha; try to break your problems up and ask new questions so that more people can help you. A tip: try not to ask 1000 questions inside one question, or it will be hard for anyone to answer...

  • Thank you... every answer raises another question, haha. Just one last question about your last answer: each vector position represents an amplitude, so is the vector’s index the time in seconds? Index 2 is 2 seconds, index 10 is 10 seconds, and so on? Or is it in milliseconds? How can I figure that out? Thanks in advance for the help!!!

  • No! The position indicates the sample number. Remember what I said about the sample-capture process, and the example of 44100 samples captured per second? That is, vector index 44100 corresponds to 1 second of audio, index 88200 to 2 seconds, and so on. To know where in time a given vector position falls, just compute desired index / sample rate = time in seconds; likewise, if you divide the total size of your vector by the sample rate, you get the total duration of your audio in seconds (the sketch at the end makes this concrete).
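
On the quantization question above: roughly yes, that is the idea, except that the voltage range and the step values depend on the hardware. A purely illustrative sketch of what an ADC conceptually does, assuming an input range of -1.0 V to 1.0 V divided into 2¹⁶ steps:

    // Illustrative only: a real ADC does this in hardware, and the actual
    // voltage range is device-specific.
    static short quantize(double voltage) {
        double clipped = Math.max(-1.0, Math.min(1.0, voltage));
        return (short) Math.round(clipped * 32767);
    }
    // quantize(0.0) ->     0
    // quantize(0.5) -> 16384 (rounded)
    // quantize(1.0) -> 32767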
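
And the index-to-time rule from the last comment, as code (variable names are illustrative):

    int sampleRateHz = 8000;    // the rate used when recording
    int totalSamples = 40000;   // e.g. the length of the recorded vector

    // time of a given sample = index / sample rate
    int index = 12000;
    double seconds = (double) index / sampleRateHz;         // 1.5 s

    // total duration = vector length / sample rate
    double duration = (double) totalSamples / sampleRateHz; // 5.0 s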
