How to manipulate audio volume in real time

Question

Asked 5 years, 11 months ago

How can I manipulate the volume of this audio in real time?

import time, pydub, numpy as np, sounddevice as sd, random

file_path = '/path/to/file.mp3'

def read(f, normalized=False):
    a = pydub.AudioSegment.from_file(f)

    y = np.array(a.get_array_of_samples())
    if a.channels == 2:
        y = y.reshape((-1, 2))
    if normalized:
        return a.frame_rate, np.float32(y) / 2**15
    else:
        return a.frame_rate, y

fs, data = read(file_path)

sd.play(data, fs)
# sd.play(data * 20, fs)  # all I know is that multiplying raises or lowers the volume; I couldn't get any further than that

while True:
    time.sleep(0.1)

And how do you plan to change the volume? Using some key to decrease or increase it in real time?

– ederwander

2019/09/08 at 00:37

2 answers

by Vinicius Bussola • **664** points · Answer 1 · 2019-09-03T13:31:19+00:00

Take a look at this library: https://github.com/jiaaro/pydub

#Make the beginning louder and the end quieter

# boost volume by 6dB
beginning = first_10_seconds + 6

# reduce volume by 3dB
end = last_5_seconds - 3
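
pydub's `+` and `-` on a segment work in decibels, while the multiplication in the question works directly on sample values. Here is a minimal sketch of how the two relate, assuming the usual amplitude convention (20·log10). The helper names `db_to_gain` and `gain_to_db` are hypothetical and not part of pydub.

```python
import math

def db_to_gain(db):
    """Convert a change in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)

def gain_to_db(gain):
    """Convert a linear amplitude factor back to decibels."""
    return 20 * math.log10(gain)

# The two adjustments used above, expressed as multipliers
print(round(db_to_gain(6), 3))   # boost applied to `beginning`
print(round(db_to_gain(-3), 3))  # cut applied to `end`
```

So a +6 dB boost is roughly the same as multiplying the samples by 2, and -3 dB is roughly a multiplication by 0.7.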

by ederwander • **6,431** points · Answer 2 · 2020-07-06T16:52:44+00:00

I've been meaning to answer this question for a long time but never found the time (today I finally did, haha)...

To work with audio in real time, think of it as modifying each piece of your file and sending it to the output sound device as you go...

You will need to work with an audio callback, or build a loop that streams pieces of audio; on every iteration of that loop you can change the volume through some user interface (keyboard, a button, and so on)...

Luckily for us, Python has an amazing module called PyAudio, so we can decode the audio and stream it through PyAudio, which lets you change each piece of audio before you write it to the output buffer!

If you want better performance, you can use PyAudio's callback mode...
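
A rough sketch of what the callback variant could look like, under a few assumptions: `make_callback` and the shared `state` dict are hypothetical names, and the two flag constants are assumed to mirror `pyaudio.paContinue` and `pyaudio.paComplete` (use the library's own constants in real code). The callback keeps the `(in_data, frame_count, time_info, status)` signature that PyAudio expects.

```python
import numpy as np

# Assumed to mirror pyaudio.paContinue and pyaudio.paComplete;
# use the library's own constants in real code.
PA_CONTINUE = 0
PA_COMPLETE = 1

def make_callback(data, state):
    """Build a PyAudio-style callback that plays `data` scaled by state['volume'].

    `data` is a float32 array in [-1, 1], shaped (frames,) or (frames, channels).
    `state` is a hypothetical dict shared with the UI code: {'volume': float}.
    """
    position = 0

    def callback(in_data, frame_count, time_info, status):
        nonlocal position
        # Take the next `frame_count` frames from the decoded audio
        chunk = data[position:position + frame_count]
        position += frame_count
        # Apply the current volume and keep the result inside [-1, 1]
        out = np.clip(chunk * state["volume"], -1.0, 1.0).astype(np.float32)
        # Signal completion once the whole array has been consumed
        flag = PA_CONTINUE if position < len(data) else PA_COMPLETE
        return out.tobytes(), flag

    return callback
```

The idea would be to pass the result as `stream_callback=make_callback(data, state)` when opening the stream, and let the main thread update `state["volume"]` whenever a key is pressed.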

But let's get to the point: the code. I wrote it to demonstrate how this works inside a Python loop. I'm using your function with normalized=True to keep the decoded file in floating point. I'm testing on Windows, so for user interaction I load a module called msvcrt, which gives me the function getch() to capture the keys the user presses. Since I'm inside a for loop, I can check on every iteration whether a key has been pressed.

For now I'm using the up arrow (key code 18656) to raise the volume and the down arrow (key code 20704) to lower it...
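
For anyone wondering where those two numbers come from: on Windows, getch() returns a prefix byte (0 or 224) for special keys, and a second call returns the key code. The answer's code combines them as prefix + code * 256. A small sketch of that arithmetic, with hypothetical helper names (`combine_key_bytes`, `adjust_volume`) that don't need msvcrt to run:

```python
UP_ARROW = 18656    # 224 + 72 * 256
DOWN_ARROW = 20704  # 224 + 80 * 256

def combine_key_bytes(prefix, code):
    """Combine the two bytes getch() returns for a special key into one number."""
    return prefix + code * 256

def adjust_volume(volume, key, step=0.1):
    """Return the new volume after a key press; other keys leave it unchanged."""
    if key == UP_ARROW:
        return volume + step
    if key == DOWN_ARROW and volume > step:
        # Same guard as in the answer: never step down to zero or below
        return volume - step
    return volume

key = combine_key_bytes(224, 72)   # the two bytes of the up arrow
print(key == UP_ARROW, adjust_volume(1.0, key))
```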

The for loop walks through the whole audio. On each iteration it checks the keys and slices off a piece of audio 4096 samples long, and it keeps doing this until it has gone through the entire decoded file!

On each iteration a piece of audio is sliced from the array and multiplied by the volume; I used a step of 0.1 for each increase or decrease. Right after that I clip the audio, removing any amplitude that goes beyond the floating-point range (-1 to 1), and finally the audio is packed and sent to PyAudio to play the altered piece (stream). Since this kind of processing is cheap (it only computes a new volume), there are no gaps between frames in the loop, and so the magic of changing the volume in real time happens. Now think about doing more complex things with audio + Python in real time...
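
The slice, scale, clip and encode steps can be tried on their own, without a sound device. This is only a sketch: `iter_chunks` and `render_chunk` are hypothetical helper names, and the audio is assumed to be a float32 NumPy array already normalized to [-1, 1].

```python
import numpy as np

CHUNK = 4096  # frames per iteration, same size as in the answer

def iter_chunks(data, size=CHUNK):
    """Yield consecutive slices of `data`, each at most `size` frames long."""
    for start in range(0, len(data), size):
        yield data[start:start + size]

def render_chunk(chunk, volume):
    """Scale one chunk, clip it to [-1, 1] and encode it as float32 bytes."""
    scaled = np.clip(chunk * volume, -1.0, 1.0)
    return scaled.astype(np.float32).tobytes()

# One second of a fake signal at 0.8 amplitude, played twice as loud
signal = np.full(44100, 0.8, dtype=np.float32)
for chunk in iter_chunks(signal):
    out = render_chunk(chunk, volume=2.0)  # `out` is what would go to stream.write()
```

With volume=2.0 the 0.8 samples would become 1.6, so the clip step brings them back to 1.0; that is the distortion you hear when the volume is pushed too high.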

Complete code:

import pydub, numpy as np
import pyaudio
import msvcrt


file_path = 'Joe_Satriani_-_Starry_Night.mp3'

def read(f, normalized=True):
    a = pydub.AudioSegment.from_file(f)

    y = np.array(a.get_array_of_samples())
    if a.channels == 2:
        y = y.reshape((-1, 2))
    if normalized:
        return a.frame_rate, np.float32(y) / 2**15
    else:
        return a.frame_rate, y

fs, data = read(file_path)

print("Playing ...")


# Initialize PyAudio
pyaud = pyaudio.PyAudio()

# Open the stream with as many channels as the decoded audio has
stream = pyaud.open(format=pyaudio.paFloat32,
                    channels=data.shape[1] if data.ndim == 2 else 1,
                    rate=fs,
                    output=True)

x = 0
volume = 1

# Loop until the end of the audio
for i in range(0, len(data), 4096):

    # Capture key presses on Windows
    if msvcrt.kbhit():
        aa = ord(msvcrt.getch())
        if aa == 0 or aa == 224:
            b = ord(msvcrt.getch())
            x = aa + (b*256)

    # Up arrow == 18656: raise the volume
    if x == 18656:
        volume = volume + 0.1
        x = 0

    # Down arrow == 20704: lower the volume
    if x == 20704:
        if volume > 0.1:
            volume = volume - 0.1
        x = 0

    # Slice 4096 samples of audio and apply the volume
    chunk = data[i:i+4096] * volume
    # Make sure no value goes beyond the floating-point range
    chunk = np.clip(chunk, -1, 1)
    # Encode the audio as float32 bytes and send it to the PyAudio stream
    out = chunk.astype(np.float32).tobytes()
    stream.write(out)


# Stop everything
stream.stop_stream()
stream.close()
pyaud.terminate()

A to-do for you:

Instead of decoding all the audio and storing it whole in the variable data, try decoding it piece by piece inside the for loop; this will give you a huge gain in memory usage and processing...
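
One possible way to approach this, sketched under a big assumption: the file is a 16-bit PCM WAV read with the standard library's wave module, not an MP3 (an MP3 would need a decoder that supports incremental reads). `stream_wav_chunks` is a hypothetical helper name.

```python
import wave
import numpy as np

def stream_wav_chunks(path, frames_per_chunk=4096):
    """Yield float32 chunks in [-1, 1] from a 16-bit PCM WAV file, one at a time."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        while True:
            # Only this many frames are read from disk per iteration
            raw = wav.readframes(frames_per_chunk)
            if not raw:
                break
            samples = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 2**15
            # One row per frame, one column per channel
            yield samples.reshape(-1, channels)
```

Each chunk it yields could then be multiplied by the current volume, clipped and written to the stream, so only one piece of the file sits in memory at a time.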