Apple ITC File Format
Peering into the structure of the Apple ITC file format used for album artwork images.
Author: WALDO
Publish Date: Nov 9, 2006
Revised: Jul 19, 2007
Categories: VB.Net, C#, iTunes

I've been developing some projects relating to iTunes lately. iTunes 7.0.1 has a new feature called CoverFlow, which lets you view your music library by its album art. I thought this was phenomenal. In fact, it has almost made a convert out of me: I now use iTunes almost exclusively.

iTunes CoverFlow

Having the ability to see all of your album art at once is fantastic, but it also shows you how woefully incomplete your library is. Of course, me being the obsessive-compulsive perfectionist/completionist that I am, I had to find album artwork for EVERY song in my library.

I used the 'Get Album Artwork' function in iTunes to obtain my missing artwork. Everything was going swimmingly. iTunes found and downloaded high resolution album artwork for nearly every album I had in my library (over 8,500 songs). I was doing fine until I began playing those same songs in MusicMatch Jukebox and noticed that the artwork that iTunes had just downloaded was not showing up in MusicMatch.

Was this a coincidence? I had to be sure. I discovered that the 'Get Album Artwork' function in iTunes DOES NOT save the downloaded artwork into the actual MP3 files. Instead it creates .itc files in the folder

%USERPROFILE%\My Documents\My Music\iTunes\Album Artwork

The .itc files contain images and metadata for each album which has downloaded artwork. CoverFlow reads the files into memory and holds them there until the application quits, so that it can display album art images quickly and smoothly.

That's fine. Good for Apple. They have their own system for optimizing album artwork images. But what about poor little ole' me, who wants those images embedded in his MP3 files?

A number of ways to accomplish this have been suggested around the web. The easiest I can think of is to use the 'Get Info...' command in iTunes, switch to the Artwork tab, cut the downloaded image from the viewer and paste the same image back in. This will embed the image in the actual MP3 file. This is effective, but also very tedious if you have a large number of files.

Another suggested way was to write a program which hacks the .itc files themselves. For some reason, this appealed to me.

Many places where I've found ways to carve up an .itc file suggest simply removing the first 492 bytes and the rest of the file is JPEG/PNG image data. That would be great if it worked consistently. What I've discovered on my own is that it does not work 100% of the time. Frequently I have found .itc files where the image data did not start until after the first 500 bytes, or other variations on that number.

Based on that inconsistency, I decided to inspect the format of an .itc file myself and see if I could infer a file specification. The .itc file seems to consist of four sections: a File Signature, a "Null Buffer", a Data Header, and Image Data.

File Signature

The fourth byte of the file would seem to be self-describing, indicating the length of the entire File Signature. In the sample file below, the fourth byte has a value of 1C (28). The File Signature itself seems to have a fairly consistent structure: the sequence 69 74 63 68 (itch) begins at index 4, and the sequence 61 72 74 77 (artw) begins at index 24 and terminates the File Signature.
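The checks described above can be sketched in Python. This is only an illustration of the inferred layout, not an official parser; the function name `read_signature_length` and the error messages are made up for this example.

```python
SIGNATURE_TAG = b"itch"  # expected at index 4
ARTWORK_TAG = b"artw"    # expected in the last four bytes of the signature


def read_signature_length(data: bytes) -> int:
    """Return the File Signature length read from the fourth byte (index 3)."""
    if len(data) < 8:
        raise ValueError("too short to be an .itc file")
    length = data[3]
    if len(data) < length:
        raise ValueError("file is shorter than its declared signature")
    if data[4:8] != SIGNATURE_TAG:
        raise ValueError("missing 'itch' at index 4")
    # With a length of 28, this slice covers indexes 24 through 27.
    if data[length - 4:length] != ARTWORK_TAG:
        raise ValueError("missing 'artw' at the end of the signature")
    return length
```

For a file laid out as described, the function returns 28, which is also the index at which the "Null Buffer" starts.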

"Null Buffer"

Following the File Signature is 256 bytes of 00 (null).
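As a rough sketch, the null buffer can be verified before going any further, which also yields the index where the Data Header begins. The constants below simply restate the 28-byte and 256-byte observations from this article; `data_header_offset` is a hypothetical helper name.

```python
SIGNATURE_LENGTH = 28      # File Signature length observed in this article
NULL_BUFFER_LENGTH = 256   # run of 00 bytes that follows it


def data_header_offset(data: bytes) -> int:
    """Check the null buffer and return the index where the Data Header starts."""
    start = SIGNATURE_LENGTH
    end = start + NULL_BUFFER_LENGTH
    if len(data) < end:
        raise ValueError("file ends inside the null buffer")
    if any(data[start:end]):
        raise ValueError("null buffer contains non-zero bytes")
    return end
```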

Data Header

The Data Header contains metadata about the file/artwork itself. So far, every .itc file I have inspected has had the fixed-length signature of 28 bytes, followed by the fixed-length null buffer of 256 bytes. The Data Header is where the variable length comes into play.

Just like the File Signature, the Data Header is self-describing. The length of the Data Header is a factor in determining where the actual image data begins. This is important because this is where the .itc files I have inspected may vary from the norm.

Disposable information (4 bytes)
The first four bytes of the Data Header would seem to be disposable information for our purposes.

"item" sequence (4 bytes)
The next four bytes of the Data Header are the sequence 69 74 65 6D (item).

Data Header Length (4 bytes)
The next four bytes are an unsigned integer value indicating the overall length of the Data Header. In the sample file below, the Data Header length has a value of 00 00 00 D8 (216).
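Reading this value might look like the sketch below. It assumes the integer is stored big-endian, which is the only reading under which 00 00 00 D8 comes out as 216, and it assumes the Data Header starts at index 284 (28 + 256). The helper name `read_data_header_length` is invented for this example.

```python
import struct

DATA_HEADER_OFFSET = 28 + 256  # File Signature + "Null Buffer"


def read_data_header_length(data: bytes, header_offset: int = DATA_HEADER_OFFSET) -> int:
    """Return the Data Header length stored after the 'item' sequence."""
    # Bytes 0-3 of the header are disposable; bytes 4-7 should spell 'item'.
    if data[header_offset + 4:header_offset + 8] != b"item":
        raise ValueError("missing 'item' sequence in the Data Header")
    # Bytes 8-11 hold the length as a big-endian unsigned 32-bit integer.
    (length,) = struct.unpack_from(">I", data, header_offset + 8)
    return length
```

With a length of 216, the Image Data would begin at index 284 + 216 = 500, which is consistent with the files whose images did not start until after the first 500 bytes.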

Disposable information (16 bytes)
Immediately following the Data Header length is 16 bytes of disposable information.

Disposable information (0-4 bytes)
When the value of the Data Header length is 212, the next section of metadata begins immediately. If it is 216, the next section is offset by an additional four bytes of disposable information.

Library Persistent ID (8 bytes)
The next 8-byte sequence is the iTunes Music Library Persistent ID to which this track belongs. The Library Persistent ID is a hexadecimal string converted from those bytes. In the example below, you can see the sequence D4 CC CA A6 22 F6 CD DC, which corresponds to my Library Persistent ID (which is the first part of the .itc file name).

Track Persistent ID (8 bytes)
The next 8-byte sequence is the Track Persistent ID of this track. Like the Library Persistent ID, the Track Persistent ID is a hexadecimal string converted from those bytes. In the example below, you can see the sequence 3D 82 AC 91 DD 2D 58 B0, which corresponds to the Track Persistent ID (which is the second part of the .itc file name). You can use the Library and Track Persistent IDs together to discover information about the track, using the iTunes Music Library.xml file.
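Both IDs can be pulled out with a few lines of Python. The sketch below follows the offsets given above (28 bytes of preceding fields, plus 4 more when the header length is 216) and assumes a big-endian length field and a Data Header starting at index 284. `read_persistent_ids` is a hypothetical name, and the upper-case formatting is a presentation choice.

```python
import struct


def read_persistent_ids(data: bytes, header_offset: int = 284):
    """Return (library_id, track_id) as upper-case hexadecimal strings."""
    (header_length,) = struct.unpack_from(">I", data, header_offset + 8)
    # 4 disposable + 4 'item' + 4 length + 16 disposable = 28 bytes,
    # followed by 4 more disposable bytes when the header length is 216.
    extra = {212: 0, 216: 4}.get(header_length)
    if extra is None:
        raise ValueError("unexpected Data Header length: %d" % header_length)
    ids_at = header_offset + 28 + extra
    library_id = data[ids_at:ids_at + 8].hex().upper()
    track_id = data[ids_at + 8:ids_at + 16].hex().upper()
    return library_id, track_id
```

For the byte sequences quoted above, this yields the strings D4CCCAA622F6CDDC and 3D82AC91DD2D58B0.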

Download/persistence indicator (4 bytes)
The next 4 bytes are one of two string sequences: 64 6F 77 6E (down) or 6C 6F 63 6C (locl). "down" indicates that the CoverFlow artwork was downloaded and not persisted in a music file's tag information; "locl" indicates the opposite. The value also corresponds to the appropriate subfolder beneath the Album Artwork folder.

Pseudo-File Format (4 bytes)
The next 4 bytes would seem to give a hint as to the format of the embedded image. When the four bytes equate to the string sequence 50 4E 47 66 (PNGf), the image is a PNG (Portable Network Graphics) image. When the sequence is 00 00 00 0D, the image is a JPEG (Joint Photographic Experts Group) image.
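The two four-byte markers lend themselves to simple lookup tables. The sketch below takes the raw values as arguments; the labels "downloaded" and "local" and the function name `describe_artwork` are illustrative choices, and any value not listed above is reported as unknown rather than guessed.

```python
SOURCES = {
    b"down": "downloaded",  # artwork fetched by iTunes, not stored in the tag
    b"locl": "local",       # the opposite case
}
FORMATS = {
    b"PNGf": "PNG",
    b"\x00\x00\x00\x0d": "JPEG",
}


def describe_artwork(indicator: bytes, format_hint: bytes):
    """Map the two four-byte markers to human-readable labels."""
    source = SOURCES.get(indicator, "unknown")
    image_format = FORMATS.get(format_hint, "unknown")
    return source, image_format
```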

Disposable information (4 bytes)
Four more bytes of disposable information.

Image Width (4 bytes)
The next four bytes are an unsigned integer value indicating the width of the embedded image.

Image Height (4 bytes)
The next four bytes are an unsigned integer value indicating the height of the embedded image.
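Adding up the field sizes listed so far puts the width 56 bytes into the Data Header (60 when the header length is 216), with the height right after it. The sketch below assumes both values are big-endian like the header length, and that the Data Header starts at index 284; `read_image_size` is a made-up helper name.

```python
import struct


def read_image_size(data: bytes, header_offset: int = 284):
    """Return (width, height) of the embedded image."""
    (header_length,) = struct.unpack_from(">I", data, header_offset + 8)
    extra = header_length - 212  # 0 for a 212-byte header, 4 for 216
    if extra not in (0, 4):
        raise ValueError("unexpected Data Header length: %d" % header_length)
    # 28 bytes up to the IDs, 16 bytes of IDs, then 4 + 4 + 4 bytes for the
    # download indicator, format hint and disposable bytes: 56 in total.
    width, height = struct.unpack_from(">II", data, header_offset + 56 + extra)
    return width, height
```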

Image Data

Once the size of the Data Header has been determined, the next block is the actual Image Data, starting immediately after the Data Header and continuing to the end of the file. In the sample below, the next four bytes are the sequence FF D8 FF E0 (ÿØÿà) which, as some of you may know, is the signature for a JPEG image.
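Putting the pieces together, extraction comes down to slicing the file at the end of the Data Header. The sketch below assumes the header starts at index 284 and stores its length big-endian. Instead of trusting the format hint, it checks the leading bytes of the image itself: FF D8 FF for JPEG, and the standard eight-byte PNG signature. `extract_image` and the ".bin" fallback are illustrative choices.

```python
import struct

JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def extract_image(data: bytes, header_offset: int = 284):
    """Return (image_bytes, suggested_extension) for an .itc file's contents."""
    (header_length,) = struct.unpack_from(">I", data, header_offset + 8)
    # Image Data runs from the end of the Data Header to the end of the file.
    image = data[header_offset + header_length:]
    if image.startswith(JPEG_MAGIC):
        extension = ".jpg"
    elif image.startswith(PNG_MAGIC):
        extension = ".png"
    else:
        extension = ".bin"  # unrecognized; keep the bytes for inspection
    return image, extension
```

To use it, read the whole .itc file as bytes, pass them to extract_image, and write the returned bytes to a new file with the suggested extension.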

Apple ITC file structure

So far, I have been able to consistently extract the image data from .itc files on my own machines. This has been very useful to me in collecting album artwork downloaded from iTunes without having to automate iTunes itself.

Since this is by no means an official specification, only my inference of the structure, it is entirely possible that you will find files that are laid out differently. I simply look for patterns in the chaos. If you find that this does not let you consistently extract album artwork, then please let me know.