I meant to answer this question a long time ago but never had the time (today I finally did, haha)...
To work with audio in real time, you need to think about how to modify each piece of your file and send it to the output sound device...
You will need to work with an audio callback (a nice explanation in English), or build a loop that streams pieces of audio; on every iteration of that loop you can change the volume through some user interface (the keyboard, a button)...
Luckily for us, Python has an amazing module called PyAudio, so we can decode the audio and stream it through PyAudio, which lets you change each piece of audio however you want before you dump it into the output buffer!
If you want better performance, you can use PyAudio's callback mode...
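For reference, here is a minimal sketch of what the callback approach could look like. The names make_callback, play and the state dict are hypothetical helpers of mine, not part of PyAudio; the sketch assumes mono float32 samples already decoded into a NumPy array. PyAudio calls the callback with (in_data, frame_count, time_info, status) and expects a (bytes, flag) tuple back.

```python
import time
import numpy as np

def make_callback(samples, state, flag_continue, flag_complete):
    """Build a PyAudio-style callback over a mono float32 array.

    `state` is a dict with "pos" (next sample index) and "volume" (gain);
    the main thread may change state["volume"] while audio is playing.
    """
    def callback(in_data, frame_count, time_info, status):
        start = state["pos"]
        chunk = samples[start:start + frame_count] * state["volume"]
        state["pos"] = start + frame_count
        # clip to the float range and encode as raw float32 bytes
        out = np.clip(chunk, -1.0, 1.0).astype(np.float32).tobytes()
        done = state["pos"] >= len(samples)
        return out, (flag_complete if done else flag_continue)
    return callback

def play(samples, rate, state):
    """Play `samples` in callback mode (needs PyAudio and an output device)."""
    import pyaudio  # imported here so the callback can be tried without audio hardware
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paFloat32, channels=1, rate=rate, output=True,
                     stream_callback=make_callback(samples, state,
                                                   pyaudio.paContinue,
                                                   pyaudio.paComplete))
    while stream.is_active():
        time.sleep(0.1)  # poll the keyboard here and update state["volume"]
    stream.stop_stream()
    stream.close()
    pa.terminate()
```

With this layout the audio is pulled by PyAudio on its own thread, so the main thread only has to watch the keyboard and update state["volume"].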
But let's get to the point: the code. I wrote it to demonstrate how this works inside a Python loop. I'm calling the read function with normalized=True to keep the decoded file as floating point. I'm testing on Windows, so for user interaction I load a module called msvcrt, which gives me the function getch() to capture the keys pressed by a Windows user; since I'm inside a for loop, on each iteration I can check whether any key has been pressed. At the moment I'm using the up-arrow key (code = 18656) to increase the volume and the down-arrow key (code = 20704) to decrease it...
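In case those two numbers look arbitrary, this small sketch shows where they come from. On Windows, getch() reports an arrow key as two bytes: a prefix (224 for the dedicated arrow keys) followed by a scan code (72 for up, 80 for down). The key_code helper is a hypothetical name of mine that combines the bytes the same way the loop does.

```python
# Two-byte sequence reported by msvcrt.getch() for the arrow keys on Windows:
# a prefix byte followed by a scan code.
PREFIX = 224
SCAN_UP, SCAN_DOWN = 72, 80

def key_code(prefix, scan):
    """Combine the two bytes as the loop does: prefix + scan * 256."""
    return prefix + scan * 256

print(key_code(PREFIX, SCAN_UP))    # 18656
print(key_code(PREFIX, SCAN_DOWN))  # 20704
```

If a keyboard reports the 0 prefix instead of 224, the sums become 18432 and 20480, which the comparisons in the loop would not match.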
The for loop walks through the whole audio: on each iteration it checks the keys and slices off a piece of audio with 4096 samples, and this continues until the for loop has gone through the entire decoded file!
On each iteration a piece of audio is sliced from the array and multiplied by the volume; I used a step of 0.1 for each increase or decrease. Right after that I clip the audio, limiting any amplitude that goes beyond the floating-point range (-1 to 1). Finally the audio is packed and sent to PyAudio to play the altered piece (stream). Because this kind of processing is cheap (it just computes a new volume), there are no gaps between frames in the loop, and so the magic of changing the volume in real time happens. Now think about doing more complex things with audio + Python in real time...
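To see what the multiply-then-clip step does to actual sample values, here is a tiny isolated sketch. The apply_volume function is a hypothetical name of mine, and the three samples are made up for illustration.

```python
import numpy as np

def apply_volume(chunk, volume):
    """Scale float samples by `volume`, then clip them to the [-1, 1] range."""
    return np.clip(chunk * volume, -1.0, 1.0)

chunk = np.array([0.25, -0.5, 0.8], dtype=np.float32)
print(apply_volume(chunk, 1.0))  # unchanged
print(apply_volume(chunk, 1.5))  # 0.8 * 1.5 = 1.2 is clipped to 1.0
```

Clipping keeps the samples inside the valid float range, but heavily clipped audio will sound distorted, so very high volume values are best avoided.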
Complete code:
from struct import pack
import pydub, numpy as np
import pyaudio
import msvcrt  # Windows-only: console keyboard input

file_path = 'Joe_Satriani_-_Starry_Night.mp3'

def read(f, normalized=True):
    """Decode an audio file and return (sample rate, samples)."""
    a = pydub.AudioSegment.from_file(f)
    y = np.array(a.get_array_of_samples())
    if a.channels == 2:
        y = y.reshape((-1, 2))
    if normalized:
        # scale 16-bit integer samples to floats in [-1, 1)
        return a.frame_rate, np.float32(y) / 2**15
    else:
        return a.frame_rate, y

fs, data = read(file_path)
print("Playing ...")

# Initialize PyAudio
pyaud = pyaudio.PyAudio()

# Open the stream with as many channels as the decoded audio has
stream = pyaud.open(format=pyaudio.paFloat32,
                    channels=1 if data.ndim == 1 else data.shape[1],
                    rate=fs,
                    output=True)

x = 0
volume = 1

# loop until the end of the audio
for i in range(0, len(data), 4096):
    # capture key presses on Windows with Python
    if msvcrt.kbhit():
        aa = ord(msvcrt.getch())
        if aa == 0 or aa == 224:
            b = ord(msvcrt.getch())
            x = aa + (b*256)
            # up arrow == 18656: increase the volume
            if x == 18656:
                volume = volume + 0.1
                x = 0
            # down arrow == 20704: decrease the volume
            if x == 20704:
                if volume > 0.1:
                    volume = volume - 0.1
                x = 0
    # slice a chunk of 4096 samples and apply the volume
    chunk = data[i:i+4096] * volume
    # make sure no value goes beyond the limits of the float range
    chunk = np.clip(chunk, -1, 1)
    # pack the samples (interleaved if stereo) and send them to the PyAudio stream
    out = pack("%df" % chunk.size, *chunk.ravel())
    stream.write(out)

# stop everything
stream.stop_stream()
stream.close()
pyaud.terminate()
A to-do for you:
Instead of decoding all the audio and storing it whole in the variable data, try decoding inside the for loop, piece by piece; this will give you a huge gain in memory usage and processing...
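As a starting point for that to-do, here is a rough sketch of chunk-by-chunk reading. Note that it swaps in an uncompressed WAV file and the standard-library wave module as a stand-in; it does not stream an MP3 through pydub as the code above does. The helpers write_test_tone and iter_chunks are hypothetical names of mine, and the test tone exists only so the sketch has a file to read.

```python
import math
import os
import struct
import tempfile
import wave

def write_test_tone(path, rate=8000, seconds=1.0, freq=440.0):
    """Write a mono 16-bit sine wave so the sketch has a file to read."""
    n = int(rate * seconds)
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * freq * t / rate)))
        for t in range(n)
    )
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(frames)

def iter_chunks(path, frames_per_chunk=4096):
    """Yield raw PCM bytes chunk by chunk; only one chunk is held at a time."""
    with wave.open(path, "rb") as w:
        while True:
            raw = w.readframes(frames_per_chunk)
            if not raw:
                break
            yield raw

path = os.path.join(tempfile.mkdtemp(), "tone.wav")
write_test_tone(path)
sizes = [len(raw) // 2 for raw in iter_chunks(path)]  # 2 bytes per 16-bit sample
print(sizes)  # [4096, 3904]
```

Inside the playback loop, each chunk yielded this way would be converted to floats, scaled, clipped and written to the stream, so the whole file never has to sit in memory.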
And how are you thinking of changing the volume? Using some key to decrease or increase the volume in real time?
– ederwander