Data geeks can have a lot of fun browsing the portals of performing rights organizations (PROs). There, one can find registered works, lists of unmatched works, lists of missing cue sheets and files of unidentified works.
And if nobody identifies and corrects the mistakes, the monies collected for unidentified songs will be held for years until they are liquidated (or added as a lump sum to distributable amounts). Often songs end up on unidentified lists because of simple typos.
Look, for instance, at the ways the name of the famous film/TV composer Alexandre Desplat is spelled in just one PRO file:
ALEXANDRE DESPLAT J
Some misspellings, like DEXPLAT, look weird at first glance, but are easy to understand if you consider that on most computer keyboards the X and S keys sit next to one another. The same applies to DESPLOT, where the O sits close to the P and L keys. Less logical is the N in DESPLANT, as the N key isn't close to any of the keys used to type DESPLAT. Other mismatches are caused by the composer's first initial being merged with his last name.
In cases like these one would expect the PROs to use string metrics in their matching processes, preferably at the point where the data enters the database. A basic script based on the Jaro-Winkler metric would already deal with the most obvious mismatches, and if machine learning technologies were added, even bigger steps might be taken.
Quite a nice business to be in for companies like Noctil, Winpure and others.