[Novalug] [OT] File and Database Questions

Jon LaBadie novalugml@jgcomp.com
Sun Dec 13 18:47:09 EST 2009


On Sun, Dec 13, 2009 at 04:15:10PM -0500, Bonnie Dalzell wrote:
> 
> Pardon me if this sounds like a dumb inquiry but here goes to elicit some 
> discussion:
> 
> I have this large dataset (54000 dogs) of almost all ascii formatted 
> information about dogs which is the basis for the pedigree program I am 
> working on.
> 
> A given dog record varies from 190 bytes to 400 bytes. Most of 
> them are the smaller size.

Allowing an average size of 300 bytes, that is only about 16 MB.
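
To make that arithmetic concrete, here is a rough sketch in Python.
The record count and byte sizes come from your message; the function
name is made up, and I am assuming decimal megabytes (1 MB =
1,000,000 bytes).

```python
# Figures from the thread: 54000 records, 190 to 400 bytes each.
RECORDS = 54_000


def total_megabytes(bytes_per_record: int, records: int = RECORDS) -> float:
    """Total size in decimal megabytes (1 MB = 1,000,000 bytes)."""
    return bytes_per_record * records / 1_000_000


low = total_megabytes(190)   # every record at the smallest size
avg = total_megabytes(300)   # the assumed average used above
high = total_megabytes(400)  # every record at the largest size

print(f"{low:.2f} MB, {avg:.2f} MB, {high:.2f} MB")
# prints: 10.26 MB, 16.20 MB, 21.60 MB
```

Even the worst case, with every record at 400 bytes, is small
compared with the memory on a typical machine.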

> 
> I have been trying to decide the best way to store this information in 
> a way that will minimize getting records confused and make editing 
> information within the record simple.

Is the problem that you are using a text editor to enter and modify
the individual records?

> 
> I have a program I have written which can make display pedigrees from 
> the records.
> 
> So I just did a little experiment and saved a set of 10 records each 
> to its own uniquely named file. The set of separate files seems to 
> occupy the same amount of disk space as the combined file of the 10 
> records.
> 
> Given the ability in linux to set up subdirectories in the manner 
> they do on cspan of w/wi/william why not keep my individual records in 
> this manner rather than all mashed together in one giant file.

I'll play and offer some considerations.

With a single file, probably after the initial opening and a little
bit of access, the entire file will be cached in memory and further
access will not involve the disk.

Accessing a single entry when its filename is discernible from the
lookup key (a w/wi/william style path, say) will probably be quick
in the multi-file dataset.
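
As a sketch of what "discernible" could mean here, this Python
function derives a w/wi/william style path from a record name. The
function name and the "dogs" root directory are my own inventions for
illustration, not anything from your program, and it assumes names
are plain ASCII without slashes.

```python
from pathlib import PurePosixPath


def record_path(name: str, root: str = "dogs") -> PurePosixPath:
    """Map a record name to root/<1st letter>/<first 2 letters>/<name>."""
    key = name.strip().lower()
    if not key:
        raise ValueError("record name must not be empty")
    # Slicing never fails on short names: "a"[:2] is simply "a".
    return PurePosixPath(root) / key[:1] / key[:2] / key


print(record_path("William"))  # dogs/w/wi/william
```

Because the path is computed from the name alone, one lookup costs a
few directory traversals and one file open, however many records
exist.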

Searching for an entry based on the record's content will involve
an average of over 25000 file and directory openings (roughly half
of the 54000 records) and lots of disk access.
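
Here is a small Python sketch of that kind of content search over a
one-record-per-file layout. It counts how many files get opened
before a match turns up. The record format, the file names and the
100-record test set are all made up for the demonstration.

```python
import os
import tempfile


def find_by_content(root: str, needle: str):
    """Return (path, files_opened) for the first file containing needle."""
    opened = 0
    for dirpath, _dirs, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            opened += 1
            with open(path, encoding="ascii", errors="replace") as handle:
                if needle in handle.read():
                    return path, opened
    return None, opened


# Tiny demonstration with 100 made-up records, one per file.
with tempfile.TemporaryDirectory() as root:
    for i in range(100):
        name = f"dog{i:03d}"
        subdir = os.path.join(root, name[:1], name[:2])
        os.makedirs(subdir, exist_ok=True)
        with open(os.path.join(subdir, name), "w", encoding="ascii") as handle:
            handle.write(f"name={name};registration=R{i:05d}\n")

    # A record that exists: somewhere between 1 and 100 files get opened.
    hit, opened = find_by_content(root, "registration=R00042")
    # A record that does not exist: every file has to be opened.
    missing, opened_all = find_by_content(root, "registration=R99999")

print(opened, opened_all)
```

The count for a hit depends on the order in which the directory is
walked, but averaged over random targets it comes to about half the
files. A search that finds nothing has to open every one of them.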


-- 
Jon H. LaBadie                  jon@jgcomp.com
 JG Computing
 12027 Creekbend Drive		(703) 787-0884
 Reston, VA  20194		(703) 787-0922 (fax)


