Copyright © 2005-2023 MultiMedia Soft

How to recognize music and obtain related information


Music information retrieval (MIR) is the interdisciplinary science of retrieving information from an audio stream. By leveraging services made available by the ACRCloud platform, Audio DJ Studio API for .NET can recognize audio streams in different ways:


By analyzing sound or video loaded inside a given player
By analyzing sound or video files stored inside the local PC
By analyzing sound files stored inside a memory buffer (this feature is not supported for video files)
On Windows Vista or later, by recognizing audio streams flowing through a WASAPI capture device, meaning that, for example, you may capture audio from any source near the microphone of your PC or incoming from the Line-In connector
On Windows Vista or later, by recognizing audio streams flowing through a WASAPI loopback device, meaning that you may capture any audio being played through the speakers of your PC


Because the component needs to connect to an external service, this feature requires Internet access and cannot be used while offline.


Music recognition can be managed through the SoundRecognizerMan class, accessible through the SoundRecognizer property.


In order to connect to the ACRCloud servers and perform the sound recognition, the component needs to know the access key and the secret key issued by ACRCloud. To enable the MIR feature, the end user therefore has to register for an account with ACRCloud. Once the registration is completed and the access and secret keys are issued, as a developer you should give the end user the possibility to enter them in a form of your application and then pass them to the component through the SoundRecognizer.InitInfo method.


The sound recognition phase starts by extracting a small portion of the sound in order to calculate its fingerprint: the smaller the extracted portion, the faster the recognition. A sound snippet should usually be 3 to 5 seconds long, which is enough to identify most commercially available music. You can set the initial and final positions of the snippet extraction through the nSnippetStartSec and nSnippetStopSec parameters of the SoundRecognizer.InitInfo method. For snippets obtained from local or memory files, it is suggested to avoid extracting the very beginning of the sound, because it may contain silent portions or an intro that are useless for calculating the fingerprint.
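The advice above can be turned into a small helper that picks the two positions before they are passed as nSnippetStartSec and nSnippetStopSec. The following is only a minimal sketch: ChooseSnippetWindow is a hypothetical helper that is not part of the component, and the default of 10 seconds of skipped intro is an assumption, not a value suggested by this documentation.

```csharp
using System;

// Hypothetical helper, not part of the component: picks a snippet window
// that skips the intro whenever the sound is long enough.
static (int StartSec, int StopSec) ChooseSnippetWindow(
    int durationSec, int snippetLengthSec = 5, int skipIntroSec = 10)
{
    if (durationSec <= 0)
        throw new ArgumentOutOfRangeException(nameof(durationSec));
    if (snippetLengthSec <= 0)
        throw new ArgumentOutOfRangeException(nameof(snippetLengthSec));
    if (skipIntroSec < 0)
        throw new ArgumentOutOfRangeException(nameof(skipIntroSec));

    // Never ask for more audio than the sound contains.
    int length = Math.Min(snippetLengthSec, durationSec);

    // Skip the intro when possible; otherwise start as late as the sound allows.
    int start = Math.Min(skipIntroSec, durationSec - length);

    return (start, start + length);
}
```

For a 180 second sound the helper returns the window from second 10 to second 15, while for an 8 second sound it falls back to the last 5 seconds (from 3 to 8).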


The current implementation of the sound recognizer allows extracting the sound snippet in 4 different ways:

from the sound loaded into a player through the SoundRecognizer.StartOnPlayer method
from a sound file stored on a local drive through the SoundRecognizer.StartOnFile method
from a sound file stored in memory through the SoundRecognizer.StartOnMemory method
from a WASAPI capture or loopback device through the SoundRecognizer.StartOnWasapiDevice method


As soon as the sound snippet is extracted and sent to the ACRCloud servers, the container application is informed through the CallbackForSoundRecognizerEvents delegate, invoked with the nEvent parameter set to EV_SOUND_RECOGNITION_STARTED.


Once recognition is completed, the "Start" methods mentioned above, with the only exception of the SoundRecognizer.StartOnWasapiDevice method, when invoked with the bWaitCompletion parameter set to BOOL_TRUE return a unique identifier in the nUniqueId parameter. With this identifier you can obtain the total number of music items found matching the analyzed sound snippet through the SoundRecognizer.ResultsCountGet method, which is the starting point for enumerating them.


If the SoundRecognizer.StartOnPlayer, SoundRecognizer.StartOnFile and SoundRecognizer.StartOnMemory methods are invoked with the bWaitCompletion parameter set to BOOL_FALSE, the returned nUniqueId parameter is not yet ready for use and you should wait for the CallbackForSoundRecognizerEvents delegate to be invoked with the nEvent parameter set to EV_SOUND_RECOGNITION_DONE. This is the default behaviour of the SoundRecognizer.StartOnWasapiDevice method.
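When bWaitCompletion is set to BOOL_FALSE, the calling code needs some way to pause until EV_SOUND_RECOGNITION_DONE arrives. One possible approach is sketched below: RecognitionWaiter is a hypothetical helper that is not part of the component. It assumes that the unique identifier is an integer and that your own callback handler can tell which identifier the event refers to; neither detail is stated on this page, so check the reference of the CallbackForSoundRecognizerEvents delegate before relying on it.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical helper, not part of the component: lets calling code await
// the completion of a recognition session identified by its unique id.
public sealed class RecognitionWaiter
{
    private readonly ConcurrentDictionary<int, TaskCompletionSource<bool>> _pending =
        new ConcurrentDictionary<int, TaskCompletionSource<bool>>();

    // Call after a "Start" method returns with bWaitCompletion set to BOOL_FALSE.
    public Task WhenDone(int uniqueId) => GetOrAdd(uniqueId).Task;

    // Call from your own callback handler when it receives
    // EV_SOUND_RECOGNITION_DONE (how the id reaches the handler is assumed).
    public void SignalDone(int uniqueId) => GetOrAdd(uniqueId).TrySetResult(true);

    // Stop tracking a session, for example after its results have been deleted.
    public void Forget(int uniqueId) => _pending.TryRemove(uniqueId, out _);

    private TaskCompletionSource<bool> GetOrAdd(int uniqueId) =>
        _pending.GetOrAdd(uniqueId, _ => new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously));
}
```

Because both WhenDone and SignalDone create the entry on demand, the sketch also works if the event is raised before the calling code starts waiting.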


For each of the music results, you can obtain the following set of information through the SoundRecognizer.ResultInfoGet method:

Date of release
ISRC code
UPC code
Duration in milliseconds
Name of the album containing the music
If the music is stored on YouTube, the identifier of its YouTube page


A piece of music may be performed by more than one artist: you can obtain the number of artists through the SoundRecognizer.ResultArtistsCountGet method. You can then obtain information about each artist through the SoundRecognizer.ResultArtistInfoGet method and, for each artist, enumerate his/her roles through the combination of the SoundRecognizer.ResultArtistRolesCountGet and SoundRecognizer.ResultArtistRoleInfoGet methods.


Finally, the same music result could be related to more than one musical genre: you can enumerate genres associated with the music result through the combination of the SoundRecognizer.ResultGenresCountGet and SoundRecognizer.ResultGenreGet methods.
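The count/get pairs described above lead to a nested enumeration: results, then the artists of each result with their roles, then the genres of each result. The sketch below shows only the shape of that traversal. IRecognitionResults is a hypothetical abstraction whose members stand in for the component's methods named in the comments; the real signatures, return types and index base are not shown on this page and may differ, so zero-based indices and string return values are assumptions.

```csharp
using System.Collections.Generic;

// Hypothetical abstraction: each member stands in for the component method
// named in its comment. The real signatures may differ.
public interface IRecognitionResults
{
    int ResultsCount(int uniqueId);                               // ResultsCountGet
    int ArtistsCount(int uniqueId, int result);                   // ResultArtistsCountGet
    string ArtistName(int uniqueId, int result, int artist);      // ResultArtistInfoGet
    int RolesCount(int uniqueId, int result, int artist);         // ResultArtistRolesCountGet
    string Role(int uniqueId, int result, int artist, int role);  // ResultArtistRoleInfoGet
    int GenresCount(int uniqueId, int result);                    // ResultGenresCountGet
    string Genre(int uniqueId, int result, int genre);            // ResultGenreGet
}

public static class ResultWalker
{
    // Produces one text line per artist (with roles) and per genre of every result.
    public static List<string> Describe(IRecognitionResults r, int uniqueId)
    {
        var lines = new List<string>();
        for (int i = 0; i < r.ResultsCount(uniqueId); i++)
        {
            for (int a = 0; a < r.ArtistsCount(uniqueId, i); a++)
            {
                // Collect all roles of this artist before emitting the line.
                var roles = new List<string>();
                for (int k = 0; k < r.RolesCount(uniqueId, i, a); k++)
                    roles.Add(r.Role(uniqueId, i, a, k));
                lines.Add($"result {i}: artist {r.ArtistName(uniqueId, i, a)} [{string.Join(", ", roles)}]");
            }
            for (int g = 0; g < r.GenresCount(uniqueId, i); g++)
                lines.Add($"result {i}: genre {r.Genre(uniqueId, i, g)}");
        }
        return lines;
    }
}
```

In a real application the interface would be implemented by a thin adapter around the SoundRecognizer object, which keeps the traversal logic independent of the exact method signatures.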


Once the music results obtained are no longer needed, you can free the related memory through the SoundRecognizer.ResultsDelete method.


An example of using the SoundRecognizer object in Visual C#.NET and Visual Basic.NET can be found in the following sample, installed with the product's setup package:

- MusicRecognizer