Audio Processing Libraries in Python

Prabhakar Rangarao
3 min read · Apr 5, 2021


(By Mike Jaso)

Speech technologies have been in development for decades, and at their core they rely on signal processing to handle audio. As a strategic marketing professional with a telecommunications background, I look for ways to apply machine learning to audio signal analysis and classification, in order to model customer preferences and build recommendation systems.

Our hyper-connected lives have been rewired for the digital age, and signal processing is the science that drives them. While much of the literature on deep learning concerns computer vision and natural language processing (NLP), audio analysis is a growing subdomain of deep learning applications. The field includes automatic speech recognition (ASR), digital signal processing, and music classification, tagging, and generation. Some of the most popular and widespread machine learning applications, such as the virtual assistants Alexa, Siri, and Google Home, are largely products built on models that extract information from audio signals.

Python has a host of libraries for audio signal processing. They cover audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) as well as synthesis and transformation (source separation, audio enhancement, generative models for speech sound, and music synthesis). In no particular order, some of the popular audio libraries are listed below:

1. librosa is a Python package for music and audio analysis. It provides the building blocks necessary to create music information retrieval systems. For a more advanced introduction that describes the package design principles, refer to the librosa paper presented at SciPy 2015.

Reference: https://librosa.org/

librosa analyzes audio signals in general but is geared more towards music. It includes the nuts and bolts needed to build a music information retrieval (MIR) system, and it is very well documented, with many examples and tutorials.

Reference: https://github.com/librosa/librosa

2. IPython.display.Audio lets you play audio directly in a Jupyter notebook.

Reference: https://ipython.readthedocs.io/en/stable/api/generated/IPython.display.html
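As a rough illustration, the snippet below wraps a NumPy array in an Audio object. The two-second tone and the names tone and player are assumptions made for this example; the inline player only appears when the object is displayed inside a notebook.

```python
import numpy as np
from IPython.display import Audio

# Two seconds of a 440 Hz sine wave as sample data (assumed test signal).
sr = 22050
t = np.arange(2 * sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)

# When passing raw samples, the sample rate must be given explicitly.
# In a notebook, leaving `player` as the last expression of a cell
# (or calling display(player)) renders an inline audio player.
player = Audio(data=tone, rate=sr)
```

Audio also accepts a filename or a URL in place of raw samples, in which case the rate argument is not needed.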

3. torchaudio is part of the PyTorch project, an open source machine learning framework. It is primarily a machine learning library and not a general signal processing library.

Reference: https://pytorch.org/audio/stable/index.html

4. With PyAudio, you can easily use Python to play and record audio on a variety of platforms. PyAudio provides Python bindings for PortAudio, the cross-platform audio I/O library.

Reference: https://pypi.org/project/PyAudio/

5. SoundFile is an audio library based on libsndfile, CFFI and NumPy. It can read and write sound files. File reading and writing is handled by libsndfile, a free, cross-platform, open-source (LGPL) library that supports many sampled sound file formats and runs on many platforms.

Reference: https://pypi.org/project/SoundFile/

6. Essentia is a library providing tools for performing analysis of audio data. Essentia has been developed in the context of Music Information Retrieval research carried out at the Music Technology Group (Universitat Pompeu Fabra, Barcelona). It caters to the needs of both rapid prototyping and large-scale analysis.

Reference: https://essentia.upf.edu/

7. pyAudioAnalysis is an open-source Python library that provides a wide range of audio-related functionality, focusing on feature extraction, classification, segmentation, and visualization.

Reference: https://pypi.org/project/pyAudioAnalysis/

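8. pydub is a Python library for manipulating audio through a simple, high-level interface. It reads and writes .wav files out of the box, and handles other formats such as .mp3 when ffmpeg or libav is installed. With it we can play, split, merge, and edit audio files.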

Reference: https://pypi.org/project/pydub/

9. pyo is a Python module containing classes for a wide variety of audio signal processing types. With pyo, users can include signal processing chains directly in Python scripts or projects, and manipulate them in real time through the interpreter.

Reference: http://ajaxsoundstudio.com/software/pyo/

In the next blog we will take a few libraries and explore audio processing in machine learning.

Interesting Readings:

1. Audio in Python
2. librosa: Audio and Music Signal Analysis in Python
3. pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis
4. An Evaluation of Audio Feature Extraction Toolboxes

About the Blogger:

Prabhakar Rangarao enjoys every day as a new learning experience with Data Science.
