The popular digital radio platform iHeartRadio is using deep learning to help uncover what listeners find important when enjoying music.
“Humans perceive music through a wide variety of factors, including rhythm, instrumentation, tempo, and vocals,” wrote Tim Schmeier, a data scientist at iHeartRadio, in a recent blog post. “Some artists have a distinct ‘sound,’ while others are more varied. How do we know that X sounds more similar to Y than Z?”
Using an NVIDIA GeForce GTX 980 Ti GPU and cuDNN-accelerated versions of the Chainer and Theano deep learning frameworks, the team developed an acoustic vector model trained on spectrograms, in which songs with similar features (that is, songs that sound alike) are grouped close together in a vector space. This contrasts with their matrix factorization approach, which suggests tracks that were popular in the same geographic area and time period as the query track.
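The core idea of the acoustic model (a spectrogram reduced to a vector, with similarity measured by distance in that space) can be illustrated with a minimal numpy sketch. This is a hypothetical toy illustration, not iHeartRadio's actual model: the `spectrogram`, `acoustic_vector`, and `similarity` helpers below are assumptions, and a real system would use a learned deep network rather than a simple mean over time.

```python
import numpy as np

def spectrogram(signal, frame_size=256, hop=128):
    """Magnitude spectrogram via a short-time FFT (toy version)."""
    frames = [signal[i:i + frame_size]
              for i in range(0, len(signal) - frame_size + 1, hop)]
    window = np.hanning(frame_size)
    return np.abs(np.fft.rfft(np.array(frames) * window, axis=1))

def acoustic_vector(signal):
    """Collapse a spectrogram to one fixed-length, unit-normalized vector."""
    vec = spectrogram(signal).mean(axis=0)
    return vec / np.linalg.norm(vec)

def similarity(a, b):
    """Cosine similarity between two unit-normalized acoustic vectors."""
    return float(np.dot(a, b))

# Toy "songs": two share a dominant frequency, the third does not.
t = np.linspace(0, 1, 8000, endpoint=False)
song_a = np.sin(2 * np.pi * 440 * t)
song_b = np.sin(2 * np.pi * 440 * t) + 0.1 * np.sin(2 * np.pi * 880 * t)
song_c = np.sin(2 * np.pi * 2000 * t)

va, vb, vc = map(acoustic_vector, (song_a, song_b, song_c))
# Songs that sound alike sit closer together in the vector space.
print(similarity(va, vb) > similarity(va, vc))
```

In this sketch the two 440 Hz tones land near each other in the vector space while the 2000 Hz tone lands far away, which is the same nearest-neighbor intuition the acoustic model applies to real tracks.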
The audio-based model was evaluated against a group of human listeners: the “acoustic similarity” judgments of the vector space model agreed with those of the human raters 92% of the time.
Schmeier notes that matrix factorization constrains listeners by suggesting predictable or obvious music, and that it can miss genuine connections between songs, such as mood or even genre itself (for example, a hard rock song from 1980 versus one from 2010).
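To see why matrix factorization behaves this way, it helps to recall what it actually learns: latent factors derived purely from co-listening patterns, with no audio input at all. The sketch below is a hypothetical illustration using a truncated SVD on an invented listener-by-track play-count matrix; it is not iHeartRadio's implementation.

```python
import numpy as np

# Hypothetical listener-by-track play-count matrix
# (rows: listeners, columns: tracks). Invented data for illustration.
plays = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 1],
    [0, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

# Factorize into rank-2 latent factors with a truncated SVD.
U, s, Vt = np.linalg.svd(plays, full_matrices=False)
track_factors = (np.diag(s[:2]) @ Vt[:2]).T  # one latent vector per track

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Tracks played by the same listeners land close together, regardless
# of how they actually sound; tracks with no shared listeners land far
# apart even if they are acoustically similar.
print(cosine(track_factors[0], track_factors[1]))  # co-played: high
print(cosine(track_factors[0], track_factors[2]))  # never co-played: low
```

The factorization can only relate tracks through shared listening history, so two acoustically similar songs with disjoint audiences (say, hard rock from 1980 and from 2010) end up far apart, which is exactly the gap the audio-based model is meant to close.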