netcurmudgeon (netcurmudgeon) wrote,

Filing

mapmakr asked me some questions about file naming and long-term stability. He had had some problems with files that had both long file names (longer than 8.3) and "non-standard" characters (@, &, #, " and such). Here are my two-or-three cents on creating file names for maximum durability.

  • Go ahead and use long file names: they're just too damn useful. But don't go nuts. Keep them sane -- something like 32 to 64 characters max. You can run into odd limitations on directory depth and file name length (the classic Windows API limit is 260 characters for the whole path, folders included) where Windows will show you the file but will throw up its hands if you ask it to delete it (observed on both 2000 and XP with NTFS volumes).

  • Make all file names all lower case. UNIX & Linux are strictly case-sensitive. Windows' case-sensitivity is iffy -- sometimes it is and sometimes it isn't. The safest, simplest, and most readable thing to do is to use all lower case characters.

  • No spaces! That's why ASCII gave you the underscore ( _ )! IMO, file names with spaces in them are more prone to name-related problems than 'space free' file names. Spaces are also anathema to easy handling of files in scripts and through CLI tools. Yes, even UNIX is getting better at coping with directory and file names with spaces in them, but just don't do it.

  • No punctuation characters other than the dot. Period. Pun intended.
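
For what it's worth, the four rules above can be sketched as a small Python helper. This is only one possible reading of them, not a standard: the function name durable_name and the 64-character cap are my own assumptions, and I've chosen to keep the underscore and the hyphen alongside the dot, since the underscore stands in for spaces and the hyphen is handy for dates.

```python
import re

MAX_LEN = 64  # assumed cap, taken from the "32 to 64 characters" suggestion


def durable_name(name: str, max_len: int = MAX_LEN) -> str:
    """Return a lower-case, space-free version of name with odd characters removed.

    Hypothetical helper for illustration only.
    """
    name = name.strip().lower()
    # Drop everything except a-z, 0-9, underscore, dot, hyphen, and whitespace.
    name = re.sub(r"[^a-z0-9_.\s-]", "", name)
    # Turn each run of whitespace and/or underscores into a single underscore.
    name = re.sub(r"[\s_]+", "_", name)
    if len(name) > max_len:
        raise ValueError(f"{name!r} is longer than {max_len} characters")
    return name
```

With this sketch, durable_name("My Report @ Home #2.TIF") gives my_report_home_2.tif. Raising an error on over-long names, rather than silently truncating them, is a design choice: truncation can make two different files collide on the same name.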


And now we come to the "for the love of Dog!" section. All of these deal with creating file names that will sort properly. Remember, computers are incredibly fast, and incredibly simple minded. We must do everything in our power to help them do what we want them to do.

  • For the love of Dog, if you're going to include a serial number in your file name, pad it with zeros! For example: file_001.tif vs. file_1.tif. Think of how many files you could possibly have in this series. Then double it. Then round up to the next power of ten. If that's 400 files, that becomes 800, which rounds to 1,000. So you should use four digits and start with 0001. That way your files will sort out as 0001, 0002, 0010, 0152, 0407 etc. and not 1, 2, 25, 27, 3, 302, 34...

  • Also, for the love of all that is holy and good, use big-endian dates! This means you write today's date as 2007-11-19. The Fourth of July was 2007-07-04 and New Year's Day was 2007-01-01. Christmas will be 2007-12-25. This is the more human-readable version. If you want to be compact about it you can skip the dashes. Boxing Day would thus become 20071226. Putting the month first (e.g. 11-19-07) means that all of the Novembers from all of the years in your file archive will be sorted in together. This is a complete pain in the ass, and not just for scripts but for other humans too. Put the year first, then the month, then the day.
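
If you'd rather let a script do the padding and the dates, here is a rough Python sketch of both rules. The helper names (pad_width, serial_name, dated_name) are made up for this example, and the "double it, then round up to a power of ten" arithmetic is just my reading of the rule of thumb above.

```python
from datetime import date


def pad_width(expected_files: int) -> int:
    """Digits to use: double the estimate, round up to a power of ten."""
    doubled = expected_files * 2
    power = 10
    while power < doubled:
        power *= 10
    # 400 files -> 800 -> 1000 -> four digits (0001, 0002, ...)
    return len(str(power))


def serial_name(stem: str, number: int, width: int, ext: str) -> str:
    """Build a zero-padded name such as file_0001.tif."""
    return f"{stem}_{number:0{width}d}.{ext}"


def dated_name(stem: str, day: date, ext: str, compact: bool = False) -> str:
    """Prefix a name with a big-endian date: year, then month, then day."""
    stamp = day.strftime("%Y%m%d") if compact else day.isoformat()
    return f"{stamp}_{stem}.{ext}"


# Plain text sorting is what a directory listing usually gives you.
unpadded = sorted(str(n) for n in [1, 2, 25, 27, 3, 34, 302])
padded = sorted(f"{n:04d}" for n in [1, 2, 25, 27, 3, 34, 302])

# Month-first dates lump the Novembers together; year-first dates do not.
month_first = sorted(["11-19-07", "07-04-07", "11-19-06"])
year_first = sorted(["2007-11-19", "2007-07-04", "2006-11-19"])
```

Here unpadded comes out as 1, 2, 25, 27, 3, 302, 34 while padded comes out as 0001, 0002, 0003, 0025, 0027, 0034, 0302. Likewise month_first puts 11-19-06 right next to 11-19-07, whereas year_first runs in true calendar order.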


And that, as they say, is pretty much that. If you follow these conventions your files will sort properly and you won't get into any bizarre twists if you are working cross-platform. FWIW, remember that this advice is free. :-)
Tags: files, for the love of dog, geeking, naming of parts