[Novalug] question about linux file names

Peter Larsen plarsen@famlarsen.homelinux.com
Mon Dec 14 16:39:55 EST 2009


On Sun, 2009-12-13 at 19:59 -0500, Bonnie Dalzell wrote:

> On Sun, 13 Dec 2009, Bryan J. Smith wrote:
> 
> > Ummm, they are typically one and the same.
> > UTF-8 encoding is typically the 7-bit ASCII set.  ;)
> >
> 
> the old DOS low ASCII set does not map one to one to Linux. I have some 
> old data in 1990s-style ASCII, and the umlaut characters, etc. are not the 
> same as their representation under my Linux system, so if I try to load a 
> file with one of these names I can get errors trying to open and read the 
> files.



ASCII never had umlauts, at least not in the standard. The problem with
ASCII (American Standard Code for Information Interchange - yeah, they
were thinking big back then) is that it is American only. As someone who
grew up outside the US, I found ASCII a pain in the butt. Non-US letters
were never standardized until DOS introduced code pages, which put them
in the byte values above 127. Even then it was implemented in a strange
way that rarely worked. One thing you could count on with ASCII is that
anything beyond the 26 US letters would have a hard time rendering. On
printers, the national characters even came in a different "font" from
the US letters, so they looked "odd", small and twisted next to the US
characters.
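
To make the failure concrete, here is a small Python sketch. It assumes the old data was written under DOS code page 437 (the original post does not say which code page was used), where ü, ä and ö are the bytes 0x81, 0x84 and 0x94. The helper name try_decode is made up for this illustration.

```python
# Assumption: the old data used DOS code page 437 (not stated in the thread).
raw = bytes([0x81, 0x84, 0x94])  # the letters ü, ä, ö as CP437 stores them


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


as_ascii = try_decode(raw, "ascii")   # bytes above 127 are not ASCII at all
as_utf8 = try_decode(raw, "utf-8")    # 0x81 cannot start a UTF-8 sequence
as_cp437 = try_decode(raw, "cp437")   # only the right code page recovers the text

print(as_ascii, as_utf8, as_cp437)
```

The same bytes are rejected as ASCII and as UTF-8, and only make sense once you name the code page they were written in.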


> I have written routines for my pedigree program to change the foreign low 
> ASCII characters into "English equivalent letters" for the file names 
> but I want to also experiment with going from the low ASCII foreign 
> letters to UTF encoding.


Modern programming languages can do that for you now. That's the good
part of using standardized character sets: trivial jobs like that are
now standardized too. Next comes sorting, which is an even bigger pain,
but that too has been addressed by Unicode and the ISO standards. The
problem, as Bryan points out, is that UTF-8 is a variable-width
encoding: plain ASCII takes one byte per character, but every other
character takes two to four bytes. A lot of programmers still think they
can count bytes to find the length of a string (tsk tsk).
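
A rough Python sketch of that conversion, again assuming the source bytes are in DOS code page 437. The function dos_to_utf8 and the file name are hypothetical, chosen only for the example. It also shows why counting bytes gives the wrong string length.

```python
def dos_to_utf8(raw: bytes, codepage: str = "cp437") -> bytes:
    """Decode bytes written under a DOS code page and re-encode them as UTF-8.

    The default code page is an assumption; use "cp850" or another codec
    name if that is what the old system actually wrote.
    """
    return raw.decode(codepage).encode("utf-8")


old_name = b"M\x81ller.ped"        # "Müller.ped" as DOS stored it (made-up name)
new_name = dos_to_utf8(old_name)   # the same name as UTF-8 bytes
text = new_name.decode("utf-8")

# Characters and bytes are no longer the same count once ü is UTF-8.
print(text, len(text), len(new_name))
```

Here the name has 10 characters but 11 bytes in UTF-8, because ü becomes the two bytes 0xC3 0xBC.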

Btw, filenames in Linux are case-sensitive - uppercase letters are NOT
the same as lowercase letters, so file A1 is different from file a1.
This is why Samba has big problems mapping files between the Windows and
the Linux world.
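
If you want to know in advance which names would clash on a case-insensitive client, something like this Python sketch can help. It is only an approximation: it uses str.casefold() as a stand-in for case-insensitive matching, which is not exactly what Windows or Samba do, and the function name case_collisions is made up here.

```python
from collections import defaultdict


def case_collisions(names):
    """Group names that differ only by letter case.

    Assumption: casefold() approximates case-insensitive comparison; real
    Windows/Samba matching rules differ in some corner cases.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    # Keep only the groups where more than one distinct spelling exists.
    return [sorted(group) for group in groups.values() if len(group) > 1]


collisions = case_collisions(["A1", "a1", "notes.txt", "Notes.TXT", "b2"])
print(collisions)
```

On Linux all five names can live in one directory; a case-insensitive client would see A1/a1 and Notes.TXT/notes.txt as the same file.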

-- 

Best Regards
  Peter Larsen

Wise words of the day:
Computers are useless.  They can only give you answers.
	-- Pablo Picasso