Garbage in, garbage out (aka GIGO)
It's just a way of saying that incorrect or low-quality (data) input will always produce poor output. Well, data geeks can discover quite a lot of garbage browsing through the portals of performing rights organizations (PROs). Inside these portals one can find registered works, lists of unmatched works, lists of missing cue sheets and files containing unidentified works. If nobody identifies and corrects the mistakes, the monies collected for unidentified songs will be held for years until they are liquidated (or added as a lump sum to distributable amounts). Songs often end up on unidentified lists because of simple typos. Look, for instance, at the ways the name of the famous film/TV composer Alexandre Desplat is spelled in just one PRO file:
ADESPLAT
ADISPLAT
ALEXANDER DEPLAT
ALEXANDER DESPLAT
ALEXANDRE DESPALT
ALEXANDRE DESPIAT
ALEXANDRE DESPLANT
ALEXANDRE DESPLAT J
ALEXANDRE DESPLOT
ALEXANDRE DEXPLAT
ALEXANDRE PLAST
ALEXANDRRE DESPLAT
Some misspellings, like DEXPLAT, look odd at first but are easy to understand once you consider that on most computer keyboards the X and S keys sit next to each other. The same applies to DESPLOT, where the O is close to the P and L keys. The N in DESPLANT is less logical, as the N key isn't close to any of the other keys used to type DESPLAT. Other mismatches arise when the composer's first initial is merged with his last name, as in ADESPLAT.
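The keyboard reasoning above can be checked mechanically. Below is a minimal sketch, assuming a standard US QWERTY layout with approximate row offsets of 0, 0.25 and 0.75 key widths. The helper names (keys_adjacent, stray_letters, nearby_intended_keys) are hypothetical. The code uses Python's difflib.SequenceMatcher to find letters that were substituted or inserted, then lists which keys of the intended name sit next to each stray letter.

```python
import difflib

# Assumption: US QWERTY letter rows, each shifted right by an approximate
# stagger (in key widths) relative to the top row.
QWERTY_ROWS = [("QWERTYUIOP", 0.0), ("ASDFGHJKL", 0.25), ("ZXCVBNM", 0.75)]

# Map each letter to (row index, horizontal position of the key).
KEY_POS = {
    ch: (row, col + offset)
    for row, (letters, offset) in enumerate(QWERTY_ROWS)
    for col, ch in enumerate(letters)
}


def keys_adjacent(a: str, b: str) -> bool:
    """True if the keys for letters a and b touch on the layout above."""
    (row_a, x_a), (row_b, x_b) = KEY_POS[a.upper()], KEY_POS[b.upper()]
    if row_a == row_b:
        return abs(x_a - x_b) == 1  # left/right neighbours in the same row
    if abs(row_a - row_b) == 1:
        return abs(x_a - x_b) < 1  # staggered keys in the row above/below overlap
    return False


def stray_letters(intended: str, typed: str) -> list[str]:
    """Letters in `typed` that replace, or are inserted into, `intended`."""
    matcher = difflib.SequenceMatcher(None, intended, typed)
    stray = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            stray.extend(typed[j1:j2])
    return stray


def nearby_intended_keys(intended: str, typed: str) -> dict[str, set[str]]:
    """Map each stray letter to the keys of `intended` that sit next to it."""
    intended, typed = intended.upper(), typed.upper()
    keys = {ch for ch in intended if ch in KEY_POS}
    return {
        s: {k for k in keys if keys_adjacent(s, k)}
        for s in stray_letters(intended, typed)
        if s in KEY_POS
    }


if __name__ == "__main__":
    for variant in ["DEXPLAT", "DESPLOT", "DESPLANT"]:
        print(variant, nearby_intended_keys("DESPLAT", variant))
```

With this layout model, the stray X in DEXPLAT sits next to D and S. The O in DESPLOT sits next to L and P. The N in DESPLANT touches none of the keys in DESPLAT, which matches the observations above.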
In cases like these one would expect the PROs to use string metrics in their matching processes, preferably at the point where the data enters the database. A basic script based on the Jaro-Winkler similarity metric would already catch the most obvious mismatches, and adding machine learning techniques could take the matching even further.
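What follows is a minimal sketch of the Jaro-Winkler similarity in plain Python. It is written from the standard definition, using Winkler's usual prefix scaling factor p = 0.1 and a common prefix capped at four characters. The reference list KNOWN_WRITERS, the helper best_match and the 0.9 threshold are hypothetical illustrations. They do not describe anything a PRO actually uses.

```python
from typing import Optional


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings (1.0 = identical, 0.0 = nothing in common)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they lie within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the number of matched characters that appear out of order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to `max_prefix` characters."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return score + prefix * p * (1 - score)


# Hypothetical reference list of correctly registered writer names.
KNOWN_WRITERS = ["ALEXANDRE DESPLAT"]


def best_match(name: str, candidates: list[str], threshold: float = 0.9) -> Optional[str]:
    """Return the best-scoring candidate if it reaches `threshold` (an assumed cut-off)."""
    if not candidates:
        return None
    score, match = max((jaro_winkler(name, c), c) for c in candidates)
    return match if score >= threshold else None


if __name__ == "__main__":
    for variant in ["ALEXANDRE DEXPLAT", "ALEXANDER DESPLAT", "ALEXANDRE PLAST", "ADESPLAT"]:
        score = jaro_winkler(variant, KNOWN_WRITERS[0])
        print(variant, round(score, 3), best_match(variant, KNOWN_WRITERS))
```

Typo-style spellings such as ALEXANDRE DEXPLAT score well above the assumed threshold. Merged-initial forms like ADESPLAT score far below it against the full name. Those would need an extra rule, for example also comparing against an initial-plus-surname form of each registered writer.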
Quite a nice business to be in for companies like Noctil, Verifi Media, WinPure™ and others.
#musicindustry #metadata #masterdatamanagement #copyright #neighboringrights #neighbouringrights #zanoise