[Novalug] [OT] File and Database Questions
Jon LaBadie
novalugml@jgcomp.com
Sun Dec 13 18:47:09 EST 2009
On Sun, Dec 13, 2009 at 04:15:10PM -0500, Bonnie Dalzell wrote:
>
> Pardon me if this sounds like a dumb inquiry but here goes to elicit some
> discussion:
>
> I have this large dataset (54000 dogs) of almost all ASCII-formatted
> information about dogs which is the basis for the pedigree program I am
> working on.
>
> A given dog record varies from 190 bytes to 400 bytes. Most of
> them are the smaller size.
Allowing an average size of 300 bytes, that is only about 16 MB.
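As a quick check of that figure, here is a minimal sketch of the
arithmetic. The 300-byte average is my assumption; only the record
count and the 190 to 400 byte range come from your message, and the
function name is just for illustration.

```python
# Back-of-the-envelope size estimate for the whole dataset.
RECORDS = 54_000   # number of dogs, from the original message
AVG_BYTES = 300    # assumed average record size (stated range: 190-400)

def dataset_size_bytes(records: int, avg_bytes: int) -> int:
    """Total size in bytes for `records` records of `avg_bytes` each."""
    return records * avg_bytes

total = dataset_size_bytes(RECORDS, AVG_BYTES)
print(f"{total} bytes, about {total / 1_000_000:.1f} MB")
# prints: 16200000 bytes, about 16.2 MB
```

Even at the 400-byte maximum for every record the total would stay
under 22 MB, so the size itself should not drive the decision.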
>
> I have been trying to decide the best way to store this information in
> a way that will minimize getting records confused and make editing
> information within the record simple.
Is the problem that you are using a text editor to enter and modify
the individual records?
>
> I have a program I have written which can make display pedigrees from
> the records.
>
> So I just did a little experiment and saved a set of 10 records each
> to its own uniquely named file. The set of separate files seems to
> occupy the same amount of disk space as the combined file of the 10
> records.
>
> Given the ability in Linux to set up subdirectories in the manner
> they do on cspan of w/wi/william why not keep my individual records in
> this manner rather than all mashed together in one giant file?
I'll play and offer some considerations.
With a single file, after the initial opening and a little bit of
access, the entire file will probably be cached in memory and further
access will not involve the disk.
Accessing a single entry when its filename is discernible will
probably be quick in the multi-file dataset.
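To make the "filename is discernible" case concrete, here is a rough
sketch of how a w/wi/william layout could map a name straight to a
path. The function name, the "dogs" root directory, and the choice to
lowercase the name are all assumptions of mine, not anything from your
program.

```python
from pathlib import Path

def record_path(root: Path, name: str) -> Path:
    """Map a record name to root/<1st letter>/<first 2 letters>/<name>."""
    key = name.strip().lower()  # assumed normalization, purely illustrative
    if not key:
        raise ValueError("record name must not be empty")
    return root / key[:1] / key[:2] / key

# "dogs" is a hypothetical top-level directory for the dataset.
print(record_path(Path("dogs"), "William").as_posix())
# prints: dogs/w/wi/william
```

With a scheme like this, a lookup by name costs a few directory
lookups and one file open, no matter how many records exist.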
Searching for an entry based on the record's content will involve
an average of over 25,000 file and directory openings and
lots of disk access.
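A small sketch of the difference, under stated assumptions: the
record layout ("name|sire=...|dam=...") is made up, the combined file
is assumed to hold one record per line, and both function names are
hypothetical. It only counts file opens; it does not measure timing.

```python
import os
import tempfile
from pathlib import Path

def search_single_file(path: Path, needle: str) -> list[str]:
    """Scan one combined file; a single open() in total."""
    with path.open(encoding="ascii") as fh:
        return [line.rstrip("\n") for line in fh if needle in line]

def search_tree(root: Path, needle: str) -> tuple[list[str], int]:
    """Scan every file under root; return matches and files opened."""
    matches, opened = [], 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            opened += 1
            text = Path(dirpath, filename).read_text(encoding="ascii")
            if needle in text:
                matches.append(text.rstrip("\n"))
    return matches, opened

# Tiny made-up dataset standing in for the 54000 real records.
records = {
    "rex": "rex|sire=bolt|dam=luna",
    "fido": "fido|sire=rex|dam=ava",
    "ava": "ava|sire=max|dam=zoe",
}
with tempfile.TemporaryDirectory() as tmp:
    combined = Path(tmp, "all.txt")
    combined.write_text("\n".join(records.values()) + "\n", encoding="ascii")
    tree = Path(tmp, "tree")
    for name, rec in records.items():
        subdir = tree / name[:1] / name[:2]
        subdir.mkdir(parents=True, exist_ok=True)
        (subdir / name).write_text(rec + "\n", encoding="ascii")
    single_hits = search_single_file(combined, "sire=rex")
    tree_hits, files_opened = search_tree(tree, "sire=rex")

print(single_hits, files_opened)
```

This version opens every file because it collects all matches. A
search that stops at the first match would open about half of them
on average, which is where the figure above comes from.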
--
Jon H. LaBadie jon@jgcomp.com
JG Computing
12027 Creekbend Drive (703) 787-0884
Reston, VA 20194 (703) 787-0922 (fax)