netcurmudgeon (netcurmudgeon) wrote,

Filing

mapmakr asked me some questions about file naming and long-term stability. He had had some problems with files that had both long file names (longer than 8.3) and "non-standard" characters (@, &, #, " and such). Here are my two-or-three cents on creating file names for maximum durability.

  • Go ahead and use long file names: they're just too damn useful. But don't go nuts. Keep them sane, something like 32 to 64 characters max. Long names inside deep directory trees can run into the total path length limit (the classic Win32 MAX_PATH is 260 characters), and then Windows will show you the file but throw up its hands if you ask it to delete it (observed on both 2000 and XP with NTFS volumes).

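  • Make all file names lower case. UNIX and Linux are strictly case-sensitive. Windows is iffy: it preserves the case you type when you name a file, but usually ignores case when looking one up, so "Report.txt" and "report.txt" are two files on Linux and the same file on Windows. The safest, simplest, and most readable thing to do is to use all lower case characters.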

  • No spaces! That's why ASCII gave you the underscore ( _ ). IMO, file names with spaces in them are more prone to name-related problems than space-free file names. Spaces also get in the way of easy file handling in scripts and CLI tools, because an unquoted space splits one name into two arguments. Yes, even UNIX is getting better at coping with directory and file names that contain spaces, but just don't do it.

  • No punctuation characters other than the dot. Period. Pun intended.
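
If you want to apply these rules automatically, here is a minimal Python sketch. `sanitize_filename` and `MAX_LEN` are hypothetical names I made up, and the 64-character cap is an assumption taken from the upper end of the range above. It keeps the underscore and also the hyphen (the dates recommended below use dashes); everything other than letters, digits, and the dot gets dropped.

```python
import re

# Hypothetical cap, taken from the 32-to-64 character range suggested above.
MAX_LEN = 64


def sanitize_filename(name: str, max_len: int = MAX_LEN) -> str:
    """Return a lower-case, space-free, punctuation-free version of name."""
    name = name.strip().lower()
    # Any run of whitespace becomes a single underscore.
    name = re.sub(r"\s+", "_", name)
    # Drop everything except a-z, 0-9, underscore, dot, and hyphen.
    name = re.sub(r"[^a-z0-9_.-]", "", name)
    # Collapse repeated underscores left behind by removed characters.
    name = re.sub(r"_+", "_", name)
    name = name.strip("_")
    if len(name) > max_len:
        # Trim the stem but keep the extension, if there is one.
        stem, dot, ext = name.rpartition(".")
        if dot and stem and len(ext) < max_len - 1:
            name = stem[: max_len - len(ext) - 1] + "." + ext
        else:
            name = name[:max_len]
    return name


print(sanitize_filename('Survey Map #3 (Draft) & "Notes".TIF'))
# → survey_map_3_draft_notes.tif
```

It only cleans up a single name; it makes no attempt to detect two different names that end up cleaned into the same one.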

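Before copying a directory tree from a Linux box to a Windows volume, it can be worth checking for names that differ only by case, since those will clash. A rough sketch; `case_collisions` is a hypothetical helper, and comparing lower-cased names only approximates how Windows matches names.

```python
from collections import defaultdict


def case_collisions(names):
    """Return groups of names that differ only by letter case.

    Lower-casing is a rough stand-in for Windows' case-insensitive
    comparison; NTFS actually uses its own upper-case mapping table.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(case_collisions(["Report.txt", "report.txt", "notes.txt", "REPORT.TXT"]))
# → [['REPORT.TXT', 'Report.txt', 'report.txt']]
```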

And now we come to the "for the love of Dog!" section. All of these deal with creating file names that will sort properly. Remember, computers are incredibly fast and incredibly simple-minded: many tools (and most scripts) sort names character by character, not by what the numbers in them mean. We must do everything in our power to help them do what we want them to do.

  • For the love of Dog, if you're going to include a serial number in your file name, pad it with zeros! For example: file_0001.tif vs. file_1.tif. Think of how many files you could possibly have in this series. Then double it. Then round up to the next power of ten. If that's 400 files, that becomes 800, which rounds up to 1,000, a four-digit number. So you should start with 0001. That way your files will sort as 0001, 0002, 0010, 0152, 0407, etc., and not 1, 2, 25, 27, 3, 302, 34...

  • Also, for the love of all that is holy and good, use big-endian dates! This means you write today's date as 2007-11-19 (the ISO 8601 format). The Fourth of July was 2007-07-04 and New Year's Day was 2007-01-01. Christmas will be 2007-12-25. This is the more human-readable version. If you want to be compact about it you can skip the dashes, so Boxing Day becomes 20071226. Putting the month first (e.g. 11-19-07) means that all of the Novembers from all of the years in your file archive will be sorted together. This is a complete pain in the ass, and not just for scripts but for other humans too. Put the year first, then the month, then the day.
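
Here's a quick Python sketch of the zero-padding rule of thumb. `pad_width` is a hypothetical helper, and Python's `sorted` stands in for a plain character-by-character file listing (some file managers do smarter "natural" sorting, but don't count on it).

```python
def pad_width(expected_count: int) -> int:
    """Digits to pad serial numbers to, per the rule of thumb above.

    Double the expected count, round up to the next power of ten,
    and use as many digits as that power of ten has.
    """
    target = 1
    while target < expected_count * 2:
        target *= 10
    return len(str(target))


serials = [1, 2, 3, 25, 27, 34, 302]
width = pad_width(400)  # 400 -> 800 -> 1,000 -> 4 digits

unpadded = sorted(f"file_{n}.tif" for n in serials)
padded = sorted(f"file_{n:0{width}d}.tif" for n in serials)

print(width)  # → 4
print(unpadded)
# → ['file_1.tif', 'file_2.tif', 'file_25.tif', 'file_27.tif', 'file_3.tif', 'file_302.tif', 'file_34.tif']
print(padded)
# → ['file_0001.tif', 'file_0002.tif', 'file_0003.tif', 'file_0025.tif', 'file_0027.tif', 'file_0034.tif', 'file_0302.tif']
```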

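And the same idea for dates, again as a small Python sketch with made-up sample dates: year-first strings (with or without dashes) come out in chronological order from a plain string sort, while month-first strings lump all the Novembers together.

```python
from datetime import date

# Sample dates, listed in chronological order.
dates = [date(2006, 11, 19), date(2007, 1, 1), date(2007, 7, 4), date(2007, 11, 19)]

big_endian = [d.isoformat() for d in dates]            # 2007-11-19 style
compact = [d.strftime("%Y%m%d") for d in dates]        # 20071119 style
month_first = [d.strftime("%m-%d-%y") for d in dates]  # 11-19-07 style

# Year-first strings stay in chronological order after a string sort.
print(sorted(big_endian) == big_endian)  # → True
print(sorted(compact) == compact)        # → True
# Month-first strings put November 2006 after July 2007.
print(sorted(month_first))  # → ['01-01-07', '07-04-07', '11-19-06', '11-19-07']
```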

And that, as they say, is pretty much that. If you follow these conventions your files will sort properly and you won't get into any bizarre twists if you are working cross-platform. FWIW, remember that this advice is free. :-)
Tags: files, for the love of dog, geeking, naming of parts