Music Genre Classification


Music genre classification is an important task for many real-world products and services. By one estimate, about a thousand songs are uploaded to music streaming platforms every hour, or approximately 24,000 songs every day. Calculating further, almost 9 million songs are added to streaming catalogues every year. As these numbers grow, the effort needed for database management, search and storage climbs in proportion. Classifying songs instantly and recommending them to users has become a must-have feature for any music streaming or purchasing service.

Need for Music Genre Classification:

Music providers offer music classification and music recommendations to their customers. Determining the genre of a track is the first step towards that functionality. Machine Learning and Deep Learning techniques have proved effective and efficient at extracting trends and patterns from large data sets.

Important libraries:

  1. Librosa- Librosa is used for music and audio analysis. It provides the building blocks needed to create music information retrieval systems. We can extract key features from audio samples such as Tempo, Chroma Energy Normalized, Mel-Frequency Cepstral Coefficients, Spectral Centroid, Spectral Contrast, Spectral Rolloff, and Zero Crossing Rate. This library is also used to load the music files.
  2. IPython- IPython.display.Audio lets you play audio directly in an IPython notebook.
  3. Keras- Keras is an open source Python library for developing and evaluating deep learning models. This is used to build the CNN.
  4. Pandas- Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language. This is used to deal with the csv data.
  5. Matplotlib- Matplotlib is a Python library for creating static, animated, and interactive visualizations of a dataset. This is used for creating graphs and charts for better understanding.

Data Processing

The goal of effective music visualisation is to show clearly how the spectral parameters of a musical track, such as amplitude and frequency, change over time. There are various methods that we can use to visualise audio files.

Raw wave files give a visual representation of sound, with time on the x-axis and amplitude on the y-axis. With them, it’s possible to quickly scan the audio data and visually compare which genres are more similar than others.

Raw wave files
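To make the time and amplitude axes concrete, here is a minimal sketch that uses only Python's standard library instead of Librosa. It writes a short synthetic sine tone to an in-memory WAV file and reads it back as (time, amplitude) pairs. The tone, the 8000 Hz sample rate and the helper names make_sine_wav and read_waveform are assumptions made for illustration; a real track would be loaded from disk.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 8000  # samples per second (assumed for this toy example)

def make_sine_wav(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Return the bytes of a mono, 16-bit WAV file holding a half-scale sine tone."""
    n_samples = round(duration_s * sample_rate)
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * n / sample_rate)))
        for n in range(n_samples)
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)
    return buffer.getvalue()

def read_waveform(wav_bytes):
    """Return (times, amplitudes) for a mono 16-bit WAV: seconds and values in [-1, 1)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        n_frames = wav_file.getnframes()
        raw = wav_file.readframes(n_frames)
    samples = struct.unpack("<" + "h" * n_frames, raw)
    amplitudes = [s / 32768 for s in samples]           # y-axis of the wave plot
    times = [n / sample_rate for n in range(n_frames)]  # x-axis of the wave plot
    return times, amplitudes

times, amplitudes = read_waveform(make_sine_wav(440.0, 0.01))
```

Plotting amplitudes against times with Matplotlib would give the kind of raw wave plot described above.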

Spectrograms visually display the loudness of a signal over time at the various frequencies present in a particular waveform. Spectrograms are sometimes referred to as waterfalls when the data is shown in a three-dimensional layout.
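As a rough sketch of how a spectrogram is computed, the signal can be cut into short overlapping frames, and the magnitude of each frame's Fourier transform gives one column of the picture. The code below uses a naive DFT so that it needs no extra libraries; in practice a library FFT would be far faster. The frame length, hop size, Hann window and test tone are all assumptions chosen for illustration.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0..N/2 for one real-valued frame (naive O(N^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def spectrogram(signal, frame_len=64, hop=32):
    """Return one list of frequency-bin magnitudes per frame (time runs along the outer list)."""
    # Hann window to reduce spectral leakage at the frame edges
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        frames.append(dft_magnitudes(frame))
    return frames

sample_rate = 8000   # assumed sample rate in Hz
tone_hz = 1000.0     # assumed test tone
signal = [math.sin(2 * math.pi * tone_hz * n / sample_rate) for n in range(256)]

spec = spectrogram(signal)
# Each bin is sample_rate / frame_len = 125 Hz wide, so the loudest bin locates the tone
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

For the pure 1000 Hz tone, every frame has its strongest bin at 1000 Hz; a real song would show many frequencies whose loudness changes from frame to frame.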


Chroma features are a useful tool for analysing music whose pitches can be meaningfully divided into categories, often twelve. They are also a well-known tool for evaluating and processing music data.

Chroma Feature
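The twelve categories are the pitch classes C, C#, D and so on, where notes an octave apart fall into the same class. The sketch below assumes equal temperament with A4 tuned to 440 Hz and folds a few hypothetical (frequency, magnitude) peaks into a 12-bin chroma vector. The function names and the peak values are made up for illustration; Librosa computes chroma from a full spectrogram instead.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_class_index(freq_hz, a4_hz=440.0):
    """Index (0-11) of the equal-tempered pitch class nearest to freq_hz."""
    midi_note = round(69 + 12 * math.log2(freq_hz / a4_hz))  # 69 is the MIDI number of A4
    return midi_note % 12

def chroma_vector(peaks):
    """Fold (frequency, magnitude) pairs into 12 bins, scaled so the largest bin is 1."""
    chroma = [0.0] * 12
    for freq_hz, magnitude in peaks:
        chroma[pitch_class_index(freq_hz)] += magnitude
    largest = max(chroma)
    return [value / largest for value in chroma] if largest > 0 else chroma

# Hypothetical spectral peaks: A4, A5 (one octave up) and roughly C4
peaks = [(440.0, 1.0), (880.0, 0.5), (261.63, 0.5)]
chroma = chroma_vector(peaks)
strongest = PITCH_CLASSES[chroma.index(max(chroma))]
```

Here the two A notes add up in the same bin, so A is the strongest pitch class even though the notes are an octave apart.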

Zero-crossing rate is the number of times the amplitude of the audio signal passes through zero in a given time interval/frame.

Zero-crossing rate
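A minimal sketch of the idea: count how often consecutive samples change sign within a frame. This version divides the count by the number of sample pairs so that the result is a rate between 0 and 1, which is one common convention and an assumption here; the function name is also only illustrative.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs in the frame whose signs differ."""
    crossings = sum(
        1 for prev, curr in zip(frame, frame[1:]) if (prev >= 0) != (curr >= 0)
    )
    return crossings / (len(frame) - 1)

noisy_frame = [1.0, -1.0, 1.0, -1.0, 1.0]   # changes sign at every step
smooth_frame = [1.0, 2.0, 3.0, 2.0, 1.0]    # never crosses zero
mixed_frame = [0.5, 0.2, -0.1, -0.4, 0.3]   # crosses zero twice

rates = [zero_crossing_rate(f) for f in (noisy_frame, smooth_frame, mixed_frame)]
```

Percussive or noisy audio tends to give higher rates than smooth, tonal audio, which is why the feature helps to separate genres.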

Feature Extraction:

Before we can train a model, the data must first be preprocessed. Data standardisation brings all of the features onto a single scale without distorting the differences in their value ranges. It’s the process of rescaling attributes to have a mean of 0 and a variance of 1. Using fit_transform(), we can learn the scaling parameters of the training data and scale that data in one step. Label encoding is required to transform the genre labels into numeric, machine-readable form.
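Label encoding can be sketched in a few lines of plain Python. The version below assigns integers to the sorted class names, which is similar in spirit to what a library label encoder does; the function name and the genre list are assumptions made for illustration.

```python
def fit_label_encoder(labels):
    """Map each distinct label to an integer, in sorted order of the label names."""
    classes = sorted(set(labels))
    return {name: index for index, name in enumerate(classes)}

genres = ["rock", "jazz", "classical", "jazz", "rock"]  # hypothetical genre column
encoder = fit_label_encoder(genres)
encoded = [encoder[genre] for genre in genres]

# Reverse mapping, to turn model predictions back into genre names
decoder = {index: name for name, index in encoder.items()}
```

The integer labels in encoded are what a loss such as sparse_categorical_crossentropy expects as targets.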

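Standardization can be performed by the formula z = (x - μ) / σ, where x is a feature value, μ is the mean of that feature and σ is its standard deviation.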
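A small worked sketch of standardisation for a single feature, using only the standard library. It assumes the population standard deviation, and the tempo values and function names are made up for illustration.

```python
import statistics

def fit_scaler(values):
    """Learn the scaling parameters (mean and population standard deviation)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # avoid dividing by zero for constant features
    return mean, std

def transform(values, mean, std):
    """Rescale values so that they have mean 0 and variance 1."""
    return [(value - mean) / std for value in values]

tempos = [90.0, 120.0, 150.0]          # hypothetical tempo feature in beats per minute
mean, std = fit_scaler(tempos)         # the "fit" step
scaled = transform(tempos, mean, std)  # the "transform" step
```

Keeping fit and transform separate means the parameters learned from one set of data can be reused to scale new data in exactly the same way.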

After standardisation, divide the data into training and testing sets. Once the model has been fitted on the training set, test it by making predictions against the test set.
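The split itself can be sketched as a shuffle followed by a cut. The 80/20 ratio, the fixed seed and the toy data below are assumptions for illustration, not values taken from this blog.

```python
import random

def train_test_split(features, labels, test_fraction=0.2, seed=42):
    """Shuffle sample indices reproducibly and hold out a fraction of them for testing."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [features[i] for i in train_idx]
    x_test = [features[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test

# Toy data: ten one-feature samples with alternating class labels
features = [[float(i)] for i in range(10)]
labels = [i % 2 for i in range(10)]
x_train, x_test, y_train, y_test = train_test_split(features, labels)
```

Shuffling before the cut matters when the dataset is stored genre by genre, because otherwise the test set could contain only one or two genres.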

Building Model:

This is the last step of music genre classification. The features have been extracted from the raw data; now we have to train the model. There are many ways to train the model, such as:

  1. Multiclass support vector machine
  2. K-Nearest Neighbours
  3. K-Means Clustering
  4. Convolutional Neural Network.

In this blog, we will use a CNN to train the model because it gives higher accuracy than the other modelling techniques.

Performance of various algorithms

We train the model on the spectrogram features of the music dataset with a Convolutional Neural Network (CNN). The architecture of the CNN can be seen below.

Architecture of CNN

A CNN contains several kinds of layers, such as an input layer, convolution layers and subsampling/pooling layers.

Convolution layer:

Convolutional layers are the layers where filters are applied to the original input, or to other feature maps in a deep CNN. This is where most of the user-specified parameters in the network are. The most important parameters are the number of kernels and the size of the kernels.
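To illustrate what applying a filter means, here is a minimal sketch of a single kernel sliding over a small 2D input with no padding and a stride of 1. The input, the kernel values and the function name are assumptions for illustration; in a real CNN the kernel values are learned during training and a layer holds many kernels.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]

# Toy 4x4 "spectrogram patch": dark on the left, bright on the right
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A 2x2 kernel that responds to a dark-to-bright step from left to right
vertical_edge = [[-1, 1], [-1, 1]]
feature_map = conv2d(image, vertical_edge)
```

The resulting 3x3 feature map is non-zero only in the middle column, where the step sits. A 2x2 kernel on a 4x4 input produces a 3x3 output, which shows how kernel size affects the size of the feature map.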

Pooling layer:

Pooling layers are used to reduce the dimensions of the feature maps. This reduces the number of parameters to learn and the amount of computation performed in the network. The pooling layer summarises the features present in a region of the feature map generated by a convolution layer. So, further operations are performed on summarised features instead of the precisely positioned features generated by the convolution layer.
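A minimal sketch of one common kind of pooling, max pooling, which keeps only the largest value in each region. The 2x2 region size, the input values and the function name are assumptions for illustration; average pooling would replace max with a mean.

```python
def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size region."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[r * size + i][c * size + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [2, 6, 3, 4],
]
pooled = max_pool2d(feature_map)
```

The 4x4 map shrinks to 2x2, so the next layer has a quarter of the values to process, and the exact position of each maximum inside its region is discarded.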

For the CNN model, use the Adam optimizer for training. The number of epochs is chosen before training. All of the hidden layers use the ReLU activation function and the output layer uses the softmax function. The loss is calculated using the sparse_categorical_crossentropy function. Dropout is used to prevent overfitting.
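To show what these functions compute, here is a plain-Python sketch of ReLU, softmax and the sparse categorical cross-entropy loss for a single sample. It is not the Keras implementation, and the raw output scores (logits) and the three-genre setup are assumptions for illustration.

```python
import math

def relu(x):
    """ReLU activation: pass positive values through, clip negatives to zero."""
    return max(0.0, x)

def softmax(logits):
    """Turn raw output scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(true_class, probabilities):
    """Loss for one sample whose label is an integer class index."""
    return -math.log(probabilities[true_class])

logits = [2.0, 1.0, 0.1]   # hypothetical scores for three genres
probs = softmax(logits)
loss_if_correct = sparse_categorical_crossentropy(0, probs)  # true genre is class 0
loss_if_wrong = sparse_categorical_crossentropy(2, probs)    # true genre is class 2
```

The loss is small when the network puts high probability on the true genre and large when it does not, which is the signal the optimizer uses to update the weights. The word "sparse" refers to the labels being integers rather than one-hot vectors.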


The main application of music genre classification is to classify music based on its instrumentation, rhythmic structure, and harmonic content, which makes it a valuable addition to music information retrieval systems. In addition, automatic musical genre classification provides a framework for developing and evaluating features for any type of content-based analysis of musical signals.

Spotify is one of the leading music streaming platforms today, with a database of millions of songs. Spotify has done a lot of research to improve the way users find and listen to music, and Machine Learning is one of its research areas. To distinguish songs by type, such as hip-hop, disco or classical, Spotify uses music genre classification.

Siri is the digital assistant for Apple devices, especially the iPhone, whereas Alexa is Amazon’s digital assistant, used in smart home devices. By using music genre classification, Alexa and Siri can classify music into types such as quiet music or hip-hop.





Computer Engineering student at VIT, Pune
