I lost a hard drive a few weeks ago, and with it my up-to-date MP3 collection.
Luckily I had a backup of most of the MP3s, but I wanted to find out exactly which MP3s I had recovered and which ones I was missing. I knew the iTunes library is stored in an XML file, but I would need to write something to read it, extract the filenames of the individual MP3 files, and then check whether they exist.
I have been using LINQ quite a bit lately on various projects, but only LINQ to SQL. I knew of LINQ to XML, but now I had a real reason to learn it.
First I looked at the iTunes library format. Here’s a snippet of the beginning of my file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Major Version</key><integer>1</integer>
<key>Minor Version</key><integer>1</integer>
<key>Application Version</key><string>7.6.2</string>
<key>Features</key><integer>5</integer>
<key>Show Content Ratings</key>
<true/>
<key>Music Folder</key><string>file://localhost/C:/Documents%20and%20Settings/chujanen/My%20Documents/My%20Music/iTunes/iTunes%20Music/</string>
<key>Library Persistent ID</key><string>54F22391EB807F23</string>
<key>Tracks</key>
<dict>
<key>1666</key>
<dict>
<key>Track ID</key><integer>1666</integer>
<key>Name</key><string>What's the Matter Here?</string>
<key>Artist</key><string>10,000 Maniacs</string>
<key>Album Artist</key><string>10,000 Maniacs</string>
<key>Composer</key><string>Natalie Merchant/Robert Buck</string>
<key>Album</key><string>In My Tribe</string>
<key>Genre</key><string>Rock</string>
<key>Kind</key><string>MPEG audio file</string>
<key>Size</key><integer>9318485</integer>
<key>Total Time</key><integer>291134</integer>
<key>Track Number</key><integer>1</integer>
<key>Year</key><integer>1987</integer>
<key>Date Modified</key><date>2005-03-09T07:31:09Z</date>
<key>Date Added</key><date>2007-07-20T17:21:36Z</date>
<key>Bit Rate</key><integer>256</integer>
<key>Sample Rate</key><integer>44100</integer>
<key>Comments</key><string></string>
<key>Persistent ID</key><string>54F22391EB807F38</string>
<key>Track Type</key><string>File</string>
<key>Location</key><string>file://localhost/S:/mp3/10,000%20Maniacs/In%20My%20Tribe/01%20-%20What's%20the%20Matter%20Here%20.mp3</string>
<key>File Folder Count</key><integer>-1</integer>
<key>Library Folder Count</key><integer>-1</integer>
</dict>
<key>1667</key>
<dict>
<key>Track ID</key><integer>1667</integer>
<key>Name</key><string>Hey Jack Kerouac</string>
<key>Artist</key><string>10,000 Maniacs</string>
Looking at this XML, you can see Apple did not design it very well (IMHO): each value is a sibling of its <key> element rather than a child of a named element, so you cannot select a value by element name. After writing some not very elegant code to access their XML, I decided to google using LINQ to access iTunes and found a great post on the Better Living through Software blog. The author had a neat trick to transform the XML into a much nicer, better-formed XML block for each song. Here’s what the first song looks like in the new format:
<song>
  <TrackID>1666</TrackID>
  <Name>What's the Matter Here?</Name>
  <Artist>10,000 Maniacs</Artist>
  <AlbumArtist>10,000 Maniacs</AlbumArtist>
  <Composer>Natalie Merchant/Robert Buck</Composer>
  <Album>In My Tribe</Album>
  <Genre>Rock</Genre>
  <Kind>MPEG audio file</Kind>
  <Size>9318485</Size>
  <TotalTime>291134</TotalTime>
  <TrackNumber>1</TrackNumber>
  <Year>1987</Year>
  <DateModified>2005-03-09T07:31:09Z</DateModified>
  <DateAdded>2007-07-20T17:21:36Z</DateAdded>
  <BitRate>256</BitRate>
  <SampleRate>44100</SampleRate>
  <Comments> 0000029F 000002F3 00000F06 00000EE9 00004E4E 0003829E 00005A24 0000547F 00004E4E 00004E4E</Comments>
  <PersistentID>54F22391EB807F38</PersistentID>
  <TrackType>File</TrackType>
  <Location>file://localhost/S:/mp3/10,000%20Maniacs/In%20My%20Tribe/01%20-%20What's%20the%20Matter%20Here%20.mp3</Location>
  <FileFolderCount>-1</FileFolderCount>
  <LibraryFolderCount>-1</LibraryFolderCount>
</song>
After implementing the author’s code, I found that querying it was not very elegant. I don’t particularly like having to access XElement.Value all the time to get to the attributes of the song. It is also not strongly typed, so I decided to come up with a simple class for a song that has properties for the attributes I care about:
public class Song
{
    // Property                                  // Tag (in the transformed iTunes XML):
    public int Id { get; set; }                  // TrackID
    public string Album { get; set; }            // Album
    public string Artist { get; set; }           // Artist
    public int BitRate { get; set; }             // BitRate
    public string Comments { get; set; }         // Comments
    public string Composer { get; set; }         // Composer
    public string Genre { get; set; }            // Genre
    public string Kind { get; set; }             // Kind
    public string Location { get; set; }         // Location
    public string Name { get; set; }             // Name
    public int PlayCount { get; set; }           // PlayCount
    public int SampleRate { get; set; }          // SampleRate
    public Int64 Size { get; set; }              // Size
    public Int64 TotalTime { get; set; }         // TotalTime
    public int TrackNumber { get; set; }         // TrackNumber
}
Now all I needed to do was write some simple code that returns an IEnumerable<Song>, which keeps things very flexible. Here’s what I came up with (implementing a simple console app that lists the top 10 most played files):
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

namespace iTunesLibParser
{
    class Program
    {
        static void Main(string[] args)
        {
            // Sort the whole library by play count, highest first.
            var mySongs = from s in LoadSongsFromITunes(@"C:\data\iTunes Music Library.xml")
                          orderby s.PlayCount descending
                          select s;

            Console.WriteLine(" Top 10 Most Played Songs:");
            foreach (var s in mySongs.Take(10))
            {
                Console.WriteLine(" {0} {1} ({2})", s.PlayCount, s.Name, s.Artist);
            }
        }

        static IEnumerable<Song> LoadSongsFromITunes(string filename)
        {
            // Each track is a <dict> three levels below <plist>. Turn its <key>/value
            // sibling pairs into child elements of one <song> element, naming each
            // child after the key with the spaces removed.
            var rawsongs = from song in XDocument.Load(filename).Descendants("plist").Elements("dict").Elements("dict").Elements("dict")
                           select new XElement("song",
                               from key in song.Descendants("key")
                               select new XElement(((string)key).Replace(" ", ""),
                                   (string)(XElement)key.NextNode));

            // Project each <song> element onto the strongly typed Song class.
            var songs = from s in rawsongs
                        select new Song()
                        {
                            Id = s.Element("TrackID").ToInt(0),
                            Album = s.Element("Album").ToString(string.Empty),
                            Artist = s.Element("Artist").ToString(string.Empty),
                            BitRate = s.Element("BitRate").ToInt(0),
                            Comments = s.Element("Comments").ToString(string.Empty),
                            Composer = s.Element("Composer").ToString(string.Empty),
                            Genre = s.Element("Genre").ToString(string.Empty),
                            Kind = s.Element("Kind").ToString(string.Empty),
                            Location = s.Element("Location").ToString(string.Empty),
                            Name = s.Element("Name").ToString(string.Empty),
                            PlayCount = s.Element("PlayCount").ToInt(0),
                            SampleRate = s.Element("SampleRate").ToInt(0),
                            Size = s.Element("Size").ToInt64(0),
                            TotalTime = s.Element("TotalTime").ToInt64(0),
                            TrackNumber = s.Element("TrackNumber").ToInt(0)
                        };
            return songs;
        }
    }

    public class Song
    {
        // Property                                  // Tag (in the transformed iTunes XML):
        public int Id { get; set; }                  // TrackID
        public string Album { get; set; }            // Album
        public string Artist { get; set; }           // Artist
        public int BitRate { get; set; }             // BitRate
        public string Comments { get; set; }         // Comments
        public string Composer { get; set; }         // Composer
        public string Genre { get; set; }            // Genre
        public string Kind { get; set; }             // Kind
        public string Location { get; set; }         // Location
        public string Name { get; set; }             // Name
        public int PlayCount { get; set; }           // PlayCount
        public int SampleRate { get; set; }          // SampleRate
        public Int64 Size { get; set; }              // Size
        public Int64 TotalTime { get; set; }         // TotalTime
        public int TrackNumber { get; set; }         // TrackNumber
    }

    // Conversion helpers that tolerate a missing tag: when the element is null
    // they return emptyValue instead of throwing.
    public static class XExtensions
    {
        public static int ToInt(this XElement xe, int emptyValue)
        {
            return xe == null ? emptyValue : int.Parse(xe.Value);
        }

        public static Int64 ToInt64(this XElement xe, Int64 emptyValue)
        {
            return xe == null ? emptyValue : Int64.Parse(xe.Value);
        }

        public static string ToString(this XElement xe, string emptyValue)
        {
            return xe == null ? emptyValue : xe.Value;
        }
    }
}

Now with the strongly typed collection of songs, you can do some interesting things using the LINQ query syntax. Here’s an example of reporting how many songs you have in your library for each genre:
var mySongs = from s in LoadSongsFromITunes(@"C:\data\iTunes Music Library.xml")
              group s by s.Genre
              into g
              orderby g.Count() descending
              select g;
foreach (var s in mySongs)
{
    Console.WriteLine(" {0} {1} ", s.Count(), s.Key);
}
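
Coming back to the original problem of finding the missing MP3s, the same collection could feed a file-existence check. What follows is only a minimal sketch, not part of the program above: MissingFileCheck, ToLocalPath and FindMissing are hypothetical names, and the code assumes every Location value starts with file://localhost/ and maps to a Windows drive path, as in the library snippet shown earlier.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class MissingFileCheck
{
    // Hypothetical helper: turns an iTunes Location value such as
    // "file://localhost/S:/mp3/In%20My%20Tribe/01.mp3" into
    // "S:\mp3\In My Tribe\01.mp3".
    // Assumption: the value uses the "file://localhost/" prefix and a
    // Windows drive path; anything else yields null.
    public static string ToLocalPath(string location)
    {
        const string prefix = "file://localhost/";
        if (string.IsNullOrEmpty(location) ||
            !location.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
        {
            return null; // not a local file entry
        }

        // Decode percent-escapes (%20 becomes a space), then switch to backslashes.
        string unescaped = Uri.UnescapeDataString(location.Substring(prefix.Length));
        return unescaped.Replace('/', '\\');
    }

    // Returns the local paths of all locations whose file is not on disk.
    public static IEnumerable<string> FindMissing(IEnumerable<string> locations)
    {
        return from location in locations
               let path = ToLocalPath(location)
               where path != null && !File.Exists(path)
               select path;
    }
}
```

With the Song class from above, the call would look something like MissingFileCheck.FindMissing(LoadSongsFromITunes(filename).Select(s => s.Location)). Entries whose location does not match the assumed prefix are skipped rather than reported, which may or may not be what you want.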
LINQ is a very powerful and useful technology and I’m sure I will keep finding more and more uses for it.
Now that I can easily iterate through my iTunes library, I need to get back to recovering my MP3s…
September 26, 2008 at 9:42 am
Hi!
Very interesting post. LINQ is a really powerful and useful technology, although it does not yet offer support for PostgreSQL or MySQL (but I will keep waiting on the “open source” community).
Another thing that caught my attention in this post was the colorful boxes where you posted the source code. Is this a third-party component that you added to your site, or are you the author? I am really interested!
September 28, 2008 at 9:56 pm
LINQ is very cool and is very useful for any collection (that implements IEnumerable), not just SQL. Try it out.
As for source code formatting, I am using the Syntax Highlighter from Alex Gorbatchev, found at http://code.google.com/p/syntaxhighlighter/.
October 7, 2008 at 12:02 am
favorited this one, guy
February 12, 2009 at 3:04 pm
great post. i had the same situation where i was trying to recover some information from an old itunes library file and got bogged down using the xsd tool to generate a strongly typed plist class to deserialize my library. linq to xml made it so much easier.