Writing this id3v2 library has turned out to be just as hard as I thought it would be. I'm still in the experimentation stage at the moment, trying to figure out what the code can do for me. There's no unit testing outside of using real mp3 files to verify everything reads properly. You could say that I'm prototyping while learning the domain.
One big problem is that I started it as a read library. This skews everything over to one side, when I want to be able to write as well. A useful exercise may be to start it as a write library instead, and see how much is the same.
The other issue is encapsulation; hiding details not only to make the API simpler, but also to provide fewer entry points for library users. This lets me change more of the guts of the API without breaking the contract. Library users like that.
Java packages are useful for encapsulation. Java classes can be either public or package-visible. Package-visible classes can't be seen outside of the package they're in (or outside the library, for that matter), so they're good for encapsulating functionality. I only want to expose certain classes to the user of the library by making them public; the others will be hidden from the user.
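To make the idea concrete, here's a minimal sketch — the class names are hypothetical, not the library's actual API. The one public class is the whole contract with library users; the package-private helper behind it can be gutted and rewritten without anyone noticing.

```java
// Hypothetical names for illustration -- not the real library's API.
// The single public class is the only entry point users can touch.
public class Id3Tag {
    private final FrameParser parser = new FrameParser();

    public String title() {
        return parser.decode("TIT2"); // TIT2 is the id3v2 title frame id
    }
}

// No "public" modifier: this class is package-private, invisible outside
// its package, so its guts can change without breaking the contract.
class FrameParser {
    String decode(String frameId) {
        return "decoded:" + frameId; // stand-in for real frame parsing
    }
}
```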
When I get a good enough first run at it done I'll post some details.
Posted at August 05, 2004 at 04:01 PM EST
Last updated August 05, 2004 at 04:01 PM EST
Remember when I told you that Winamp rewrites the mp3 altogether when you edit the id3v2 tag? This is grossly inefficient. Perhaps a split-and-join approach should be looked at instead of rewriting from scratch.
This was a nightmare when updating the tags for my 200+ MB files.
WinAmp rewrites the whole file or just the tag? Here's the thing: the id3v2 tag is at the start of the file, and there's only so much space reserved there by the previous tag.
If the new tag is larger than the previous one then you have to rewrite the whole file out again. That said, you can pad an id3v2 tag with zeroed bytes so that there's some extra space for tag rewrites.
Then the whole file doesn't have to be written out again because there's probably enough space for the new tag.
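Here's a sketch of that decision, assuming we already know how many bytes the previous tag (plus its padding) reserved at the front of the file. The names are mine, just to illustrate — not Winamp's behavior or the library's API.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class TagWriter {
    // The new tag fits in place only if it's no bigger than the region
    // reserved by the previous tag plus its zero-byte padding.
    static boolean fitsInPlace(int reservedBytes, byte[] newTag) {
        return newTag.length <= reservedBytes;
    }

    // Overwrite the reserved region at the start of the file: the new tag
    // first, then fresh zero padding. The audio data after it is untouched.
    // If fitsInPlace() is false, there's no choice but a full rewrite.
    static void writeInPlace(RandomAccessFile mp3, int reservedBytes, byte[] newTag)
            throws IOException {
        mp3.seek(0);
        mp3.write(newTag);
        mp3.write(new byte[reservedBytes - newTag.length]);
    }
}
```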
If I recall correctly, you can't "split and join" files like that. It may look like you are doing that in Unix, but in reality you're just writing the whole file out again.
"If the new tag is larger than the previous one then you have to rewrite the whole file out again."
Correct. So if the id3v2 tag is empty, and you go to write to it for the first time, it will rewrite the entire file. I just don't think this is necessary. There has to be a way around this.
It has to do with the way that operating systems manage files. To be more specific, it has to do with file systems, but operating systems make the same assumptions about files. To be even more specific, it has to do with how hard drives allocate bytes ... but I digress.
A file is a starting description node pointing to a chunk of data. Chunk size varies depending on the file system settings. If there's more information in the file than one chunk's worth, the chunk points to the next chunk in sequence like a linked list.
This goes on and on until all of the file is in linked chunks. The last chunk may be partially filled but intermediate chunks cannot be partially filled. The space not used in the last chunk is not available to other files and is "wasted".
This is why you can't connect arbitrary pieces of files together -- you have to recreate a new file from the beginning and redo all of the chunks so that they are all filled.
I forgot to add this part:
In the case of the padded tag, if there's enough room in the padded tag for the new tag the chunks do not have to be redone. You can just write over the data in any chunk because you're not underfilling any intermediate chunks. Any bytes you don't write over in an intermediate chunk will remain and the chunks will always be "full".
Of course, when you use a programming language the file appears as one long run of bytes. You could be spanning many chunks when you overwrite, and that's OK. It's all transparent at the file system level.
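In Java, for example, a RandomAccessFile lets you seek to any offset and overwrite bytes in place; nothing outside the written range changes, and the chunk bookkeeping never enters the picture. This is a toy helper of my own, just to show the idea:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class OverwriteDemo {
    // Overwrite a few bytes somewhere in the middle of the file.
    // Bytes before and after the written range are untouched, and
    // the file is never rewritten from scratch.
    static byte[] patch(Path file, long offset, byte[] data) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.seek(offset);
            raf.write(data);
        }
        return Files.readAllBytes(file);
    }
}
```

This is exactly what the padded-tag trick relies on: as long as the new tag fits in the reserved space, updating it is a seek-and-write, not a full rewrite.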
Incidentally, this is why it's easy to recover files even after they've been deleted -- well, if you're in the hard disc recovery business.
When you delete a file, all you're deleting is that description node that points to the first chunk. The chunks in the file are put in a general pool of chunks available for new files, but they are not erased or unlinked!
So if you can find the first chunk of a file you can attach it to a new description node and you have your file back -- as long as none of the chunks have been unlinked and used in other files already.
The trick is probably finding that first chunk.
There are programs that will "zero" the bytes of deleted files out for you, so that they may never be recovered. This kind of stuff is used in highly sensitive top/trade secret applications and such. It's even starting to be used in general corporate applications, because it's cheap and effective.
But one "zero"-ing out of the bytes is often not enough! When you zero a bit (a 1 or 0; there are 8 bits in a byte), a faint trace of the bit's previous value still remains on the hard drive. Someone could go back and read those slight traces with more sensitive equipment and recover your data! Oops!
Really good zeroing software will make many passes of zeros and ones to ensure the data is really gone. In fact, the US DoD has a standard number of zero-and-one passes that need to be done before hardware can be recycled!
But instead sometimes DoD hardware is just purposely destroyed -- it's less risky than possibly allowing data to walk out the door as a recycled piece of hardware.
Sorry for the blahblahblah, it's a really interesting topic. :D
Okay, a question for you:
From what I have been told before, when you delete something in *nix, it is GONE, never to be recovered except from backup. How did they do things differently from Windows?
That depends how you define "recovery". Sure, you can't search the file system for it because it's gone from that abstraction level.
But if you have the right low-level system calls you can search the chunks, make a new file header node and attach it to the first "chunk".
Unless the file system is designed to zero bytes when you delete files (which would be safe but very inefficient) then the file is recoverable.
I could be wrong but that's my understanding of file systems.
Jim, check out this page:
under the heading "Recovering UNIX/Linux File Systems". What you may have heard is that inodes are permanently deleted: "Most of the UNIX file system variants also permanently remove inode entries when data is deleted." So you lose stuff like the filename, size, timestamps.
An inode is the "file description node" I was talking about above. When you delete a file, the inode is probably deleted depending on the file system.
The rest of the file isn't erased, for efficiency reasons -- if you had to actually erase all of the bytes in a file when you deleted it, the task would take much longer.
To test this out on Unix, make sure you use the root account, go to / and type:
rm -rf *
That didn't take long, now did it?
PS> Don't do that, I'm kidding.
I worked with a guy who once made a script that recursively deleted the dir where an app was installed. But the path to the install dir was variable.
So the script was something like:
rm -rf $MY_DIR/
(running at root of course)
but he had a "bug" in the script: the var $MY_DIR was set to null, so he wiped out his computer. He was puzzled about what went wrong, so he went over to a co-worker's box and ran the script again. Too funny...