musicbrainz integration

shs

Retired Editor
Posts: 79

shs @ 2003-07-30 15:55:08 UTC

Have you guys considered building on the work done by MusicBrainz, the free database of music metadata?


They already have a massive collection of artists, albums and songs available at musicbrainz.org, and if you used it, you could concentrate on recording who covered/sampled whom rather than reinventing the wheel.


It would also mean that people could find their info on MusicBrainz and so be able to submit more relevant info to you.


For example:


http://www.musicbrainz.org/showtrack.html?trackid=279884


is a cover of


http://www.musicbrainz.org/showtrack.html?trackid=154923


You should check it out.

shs

Retired Editor
Posts: 79

shs @ 2003-07-30 16:43:28 UTC

Also, have you seen these guys (they have a focus on vinyl and older samples):


http://the-breaks.com/

Mathieu

Manager
Posts: 7325

Mathieu @ 2003-07-30 18:06:49 UTC

First of all, I think that trying to integrate two websites run by different people with different opinions is a bad idea. If you're suggesting that we just use their data, we'd have to ask permission, and we'd be fully dependent on them, which is not desirable.


We are aware that there are websites around with huge amounts of data on songs (www.allmusic.com is another example), but we also know that those databases can contain a lot of errors, so we would rather rely on ourselves and people we trust.


It's true that one should avoid reinventing the wheel, but I also think the net is large enough for several databases with this kind of content. It gives people a choice of what to use, and lets them compare sites to check whether the information is correct (unless one database shamelessly copies the other).


I've noticed one big difference between MusicBrainz and the Cover Songs Database. MB keeps track of 'physical' tracks, i.e. if the same song is released on two albums, it will exist twice. In CSD, a 'record' is more of an intellectual entity: it is unique.

In that respect, I can't see how the two projects could merge.
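To make the distinction concrete, here is a minimal Python sketch of the two data models. The class and field names are hypothetical simplifications, not either site's actual schema: MusicBrainz-style tracks are one row per physical release, while a CSD-style record is a single entity that several tracks can point to.

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    """One physical track on one release (MusicBrainz-style; hypothetical fields)."""
    track_id: int
    title: str
    artist: str
    album: str


@dataclass
class Record:
    """One performance as an intellectual entity (CSD-style; hypothetical fields)."""
    title: str
    performer: str
    # IDs of every physical track that carries this same performance.
    track_ids: list[int] = field(default_factory=list)


# The same song released on a studio album and on a compilation:
tracks = [
    Track(1, "Song", "Artist", "Studio Album"),
    Track(2, "Song", "Artist", "Greatest Hits"),
]
record = Record("Song", "Artist", [t.track_id for t in tracks])

print(len(tracks), "tracks ->", 1, "record with track_ids", record.track_ids)
```

Two rows on the MusicBrainz side correspond to one row on the CSD side, which is why a straight one-to-one merge of the two tables does not work.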

Bastien

Manager
Posts: 35905

Bastien @ 2003-07-30 18:33:01 UTC

> Also, have you seen these guys (they have a focus on vinyl and older samples):
>
> http://the-breaks.com/


Just had a quick look; it seems very helpful. Our sample-man Denis ;) will be able to use this for sure. Thanks a lot!

shs

Retired Editor
Posts: 79

shs @ 2003-07-30 20:21:20 UTC

> First of all, I think that trying to integrate two websites run by different people with different opinions is a bad idea. If you're suggesting that we just use their data, we'd have to ask permission, and we'd be fully dependent on them, which is not desirable.


I was thinking more about just using their data rather than a merger.


If you look at http://www.musicbrainz.org/products/server/download.html, you will see that their entire database (and server software) is downloadable, and the data is either placed in the public domain or licensed under Creative Commons, so you can pretty much do what you like with it (though read the CC license for full details).


I was thinking that you could use MusicBrainz as a data source in the same way that they use freedb.org. They let you import data from freedb before cleaning it up to their own standards. This saves typing and duplicating work that others have done, while still allowing them to set a higher standard than freedb.org.


It appears that you are channeling additions through a trusted team (which has benefits and drawbacks), but even for a small number of people, a bit of Perl written to suck data out of their database and into yours would likely increase your efficiency for minimal effort.
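A minimal sketch of such an import script, written in Python rather than Perl. It assumes a hypothetical tab-separated export with `track_id`, `artist`, `title` and `album` columns; the real MusicBrainz dump has its own schema, which would need checking first. Every imported row is flagged as unverified, so a trusted editor still reviews it before it goes live.

```python
import csv
import io


def import_candidates(tsv_text: str) -> list[dict]:
    """Parse a tab-separated export into candidate records awaiting review.

    The column names are assumptions, not the actual MusicBrainz dump schema.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    candidates = []
    for row in reader:
        candidates.append({
            "source_id": int(row["track_id"]),
            "artist": row["artist"].strip(),
            "title": row["title"].strip(),
            "album": row["album"].strip(),
            # Nothing imported is trusted until an editor has checked it.
            "verified": False,
        })
    return candidates


sample = (
    "track_id\tartist\ttitle\talbum\n"
    "1\tSome Artist\tSome Song\tSome Album\n"
)
print(import_candidates(sample))
```

Keeping the `verified` flag separate from the data means the trusted-team workflow stays intact; the script only saves the typing.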


You note one of the side effects of MusicBrainz being built from the CD up: different editions of CDs appear as separate entries. This shouldn't bother you, as you currently can't use that data in your schema, so you can just ignore/drop the duplicates.
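Dropping the duplicates could start with grouping imported rows on a normalized (artist, title) key. This Python sketch assumes that exact matches after case-folding and whitespace collapsing are good enough; it will not catch spelling variants, and it would also lump a live re-recording together with the studio version, so an editor would still need to check each group.

```python
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Case-fold and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def collapse_editions(rows: list[dict]) -> dict[tuple[str, str], list[dict]]:
    """Group rows that differ only by release/edition under one (artist, title) key."""
    groups = defaultdict(list)
    for row in rows:
        groups[(normalize(row["artist"]), normalize(row["title"]))].append(row)
    return dict(groups)


rows = [
    {"artist": "Some Artist", "title": "Some Song", "album": "Original LP"},
    {"artist": "some artist", "title": "Some  Song", "album": "Remastered CD"},
    {"artist": "Other Artist", "title": "Other Song", "album": "Single"},
]
groups = collapse_editions(rows)
print(len(groups))  # 2
```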

Mathieu

Manager
Posts: 7325

Mathieu @ 2003-07-30 22:06:08 UTC

> It appears that you are channeling additions through a trusted team (which has benefits and drawbacks), but even for a small number of people, a bit of Perl written to suck data out of their database and into yours would likely increase your efficiency for minimal effort.


That sounds great, but a quick look at their database shows no years and no composers, which are the most important fields for our database. So we would still need to look up the records.


In your first post you suggested that users could submit a cover relation by giving the two IDs from MusicBrainz. The problem is that a record (as we conceive it) can have several IDs in the MusicBrainz database. This means I'd have to implement duplicate checking, which isn't trivial for music, as the music industry doesn't tend to be consistent.


If I did suck data from MusicBrainz after all, I'd just have the titles, performers and albums without further information. The biggest problem is that the data won't have been checked. After a quick browse I find both "Queen Latifah" and "Queen Latifa". I think such errors are more frequent in a system where anyone can add data than in one where you have to be trusted.
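Spelling variants like these can at least be flagged automatically. A minimal sketch using the standard library's `difflib.SequenceMatcher`; the 0.9 threshold is an arbitrary assumption, and the result is only a list of suspects for a human to review, not an automatic merge.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_names(names: list[str], threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """Return pairs of distinct names whose similarity ratio is at least `threshold`."""
    suspects = []
    for a, b in combinations(sorted(set(names)), 2):
        ratio = SequenceMatcher(None, a.casefold(), b.casefold()).ratio()
        if ratio >= threshold:
            suspects.append((a, b, round(ratio, 3)))
    return suspects


print(similar_names(["Queen Latifah", "Queen Latifa", "Queen"]))
# [('Queen Latifa', 'Queen Latifah', 0.96)]
```

"Queen" on its own scores well below the threshold against either spelling, so only the genuine variant pair is flagged.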

Denis

Retired Editor
Posts: 9966

Denis @ 2003-07-31 14:21:58 UTC

the-breaks.com (formerly the Sample FAQ) has been one of my primary sources for samples for many years now. I've found it very complete and accurate. However, I've been "transporting" some data from the-breaks to our database lately, of course with double- and triple-checking, and found that some of the information on the-breaks is missing or incorrect, probably due to the sheer size of their database.

Another sad fact is that they only cover samples used by hip-hop artists, while our aim is to include everyone who has sampled or covered something.

But as a reference for samples, it certainly beats every other site I've seen!

dsvensson

Member
Posts: 3

dsvensson @ 2012-10-18 13:11:08 UTC

> That sounds great, but a quick look at their database shows no years and no composers, which are the most important fields for our database. So we would still need to look up the records.


Not sure where you're looking; here's an example:

http://musicbrainz.org/release/6473ff12-496f-3fb8-a94a-65d565487885


The number of attributes for each album/track etc. differs from release to release, as it's all crowdsourced with different levels of ambition. But if you guys enter that data yourselves, it would make sense to add it to MusicBrainz instead. Also, if you find incorrect metadata, it's just a matter of changing it and hoping that your change doesn't get voted down (a low risk, since they are true music nerds as well, and sticklers for correctness).


By the way, here's an example of a cover relation:

http://musicbrainz.org/recording/5bf51cfc-55bc-4dc8-bdd1-fd173ae414ae


...which actually links to SHS :)


> In your first post you suggested that users could submit a cover relation by giving the two IDs from MusicBrainz. The problem is that a record (as we conceive it) can have several IDs in the MusicBrainz database. This means I'd have to implement duplicate checking, which isn't trivial for music, as the music industry doesn't tend to be consistent.


MusicBrainz has release groups, each with its own unique ID for an album, and each release within a release group has its own ID as well. For example, an album released in Japan might differ from the same album released in the US.
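In other words, the release group ID is the stable key for "the album", and release IDs are per edition. A Python sketch of how duplicate checking could use that: resolve each release ID to its release group before comparing. The mapping below is hypothetical placeholder data, not real MusicBrainz IDs; in practice it would be loaded from MusicBrainz itself.

```python
# Hypothetical release -> release group mapping (placeholder IDs, not real MBIDs).
RELEASE_TO_GROUP = {
    "release-jp": "group-1",     # Japanese edition, perhaps with bonus tracks
    "release-us": "group-1",     # US edition of the same album
    "release-other": "group-2",  # an unrelated album
}


def same_album(release_a: str, release_b: str) -> bool:
    """Two releases are editions of the same album if they share a release group."""
    return RELEASE_TO_GROUP[release_a] == RELEASE_TO_GROUP[release_b]


print(same_album("release-jp", "release-us"))     # True
print(same_album("release-jp", "release-other"))  # False
```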


> If I did suck data from MusicBrainz after all, I'd just have the titles, performers and albums without further information. The biggest problem is that the data won't have been checked. After a quick browse I find both "Queen Latifah" and "Queen Latifa". I think such errors are more frequent in a system where anyone can add data than in one where you have to be trusted.


As you mentioned above, you check your metadata manually, so the work is already being done once; doing it in MusicBrainz instead would at worst be the same amount of work, and hopefully less.


And finally, to the main point I'm hoping to make here. You have a nice API going, but it's kind of broken: it only exposes artist names as the artist of a cover, and only accepts artist names for lookups. This makes it pretty fragile when multiple artists share the same name. Since many people have MusicBrainz tags in their MP3s/FLACs/etc., it's trivial for a music player developer to find exact matches for additional content online using these tags, without the risk of getting the wrong artist. So for me, that would be the largest win from integrating with MusicBrainz. You can continue to enter your own metadata; just getting the artist-to-MusicBrainz artist_id connection going would be a huge win. For example, Freebase, Last.fm, and YouTube are consumers of MusicBrainz data today, which perhaps says something.
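A small Python illustration of the fragility described above, using made-up data: an index keyed only by artist name cannot tell two same-named artists apart, while one keyed by a MusicBrainz artist ID can. The IDs, song titles and the `covers_by_*` structures are hypothetical and do not reflect the actual SHS API.

```python
from collections import defaultdict

# Made-up cover entries; "mbid" stands in for a MusicBrainz artist ID.
COVERS = [
    {"artist_name": "Same Name", "mbid": "mbid-a", "cover": "Song X"},
    {"artist_name": "Same Name", "mbid": "mbid-b", "cover": "Song Y"},
]

covers_by_name = defaultdict(list)
covers_by_mbid = defaultdict(list)
for entry in COVERS:
    covers_by_name[entry["artist_name"]].append(entry["cover"])
    covers_by_mbid[entry["mbid"]].append(entry["cover"])

# A player that only knows the name gets both artists' covers mixed together...
print(covers_by_name["Same Name"])  # ['Song X', 'Song Y']
# ...while one reading the MusicBrainz ID from the file's tags gets an exact match.
print(covers_by_mbid["mbid-a"])     # ['Song X']
```

Keeping the artist name alongside the ID costs nothing, so an API could accept either and prefer the ID when it is present.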

Last edit: 2012-10-18 13:51:40 UTC by dsvensson

Denis

Retired Editor
Posts: 9966

Denis @ 2012-11-14 23:42:49 UTC

Quite an old post you're reviving there :)


Linking to MusicBrainz is obvious; we've been discussing it for quite a while now. It's many times more interesting than it was back in 2003. Now that our API is in place, we're working on further integration with other services, including MusicBrainz.

Meanwhile, you can already take advantage of MusicBrainz and SHS through our dataset in the Million Song Dataset at http://labrosa.ee.columbia.edu/millionsong/secondhand


BTW, do you have specific plans for our API? We'd love to hear them.