[tex-live] Problems with non-7bit characters in filename
Reinhard Kotucha
reinhard.kotucha at web.de
Fri Jul 4 03:17:29 CEST 2014
On 2014-07-04 at 00:08:42 +0100, Klaus Ethgen wrote:
> Am Do den 3. Jul 2014 um 23:46 schrieb Zdenek Wagner:
> > > I was pointed to this list to report the following Bug. Please put me in
> [Bug in filesystem code]
> > Lualatex is right, umlaut characters in latin1 are invalid sequences
>
> That's true. While latin1 can include every possible character, UTF-8
> cannot. (possible as possible to have on the wire)
You misunderstood. The opposite is true. UTF-8 (Unicode) supports
all characters, Latin1 is a simple 8-bit encoding which supports only
Western European languages (except French).
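A minimal Python sketch of both points, using only the standard codecs.
The sample strings are my own illustration and not taken from the
original bug report:

```python
# In Latin1, "ü" is the single byte 0xFC. On its own that byte is not a
# valid UTF-8 sequence, which is why a strict UTF-8 decoder rejects it.
latin1_bytes = "ü".encode("latin-1")

try:
    latin1_bytes.decode("utf-8")
    valid_as_utf8 = True
except UnicodeDecodeError:
    valid_as_utf8 = False

# UTF-8 can encode text for which Latin1 has no code points at all.
russian = "файл"
utf8_roundtrip_ok = russian.encode("utf-8").decode("utf-8") == russian

try:
    russian.encode("latin-1")
    latin1_can_encode = True
except UnicodeEncodeError:
    latin1_can_encode = False

print(latin1_bytes, valid_as_utf8, utf8_roundtrip_ok, latin1_can_encode)
```

The first half shows why a Latin1 file name looks broken to a program
that expects UTF-8; the second half shows that the limitation lies with
Latin1, not with UTF-8.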
UTF-8 is the encoding of the future because it supports all languages
used today. This is the reason why XeTeX and LuaTeX exist at all.
When I took over maintenance of VnTeX my OS still used Latin1. It was
a pain! I then switched to UTF-8 and everything worked fine.
I must admit that it was easy to do the change in my case because I
avoided non-ASCII characters in file names in the past. Nowadays (with
UTF-8) I don't hesitate to use Russian or Korean characters in file or
directory names at all.
IMO all these national ISO-2022/ISO-8859 encodings are archaic. The
future is UTF-8.
> > in utf-8 but both luatex and xetex work internally in unicode. I
> > am not sure whether it is possible to change interaction with
> > file system encoding easily.
>
> Why converting the filename at all? The file name is the same on
> command line and on the file system. So without any reencoding
> everything would be fine.
It's not always the case. A German Windows is using CP1252 on the
command line and UTF-16 internally for file names. It's a pain.
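To illustrate why this matters, here is a rough Python sketch that
encodes one file name in the encodings mentioned in this thread. The
name "Übung.tex" is a made-up example, and the sketch only compares byte
strings; it does not call any Windows API:

```python
name = "Übung.tex"  # hypothetical file name, for illustration only

# The same name as raw bytes in different encodings.
as_cp1252 = name.encode("cp1252")     # Windows "ANSI" code page (Western)
as_cp850 = name.encode("cp850")       # OEM code page of a German console
as_utf16 = name.encode("utf-16-le")   # two bytes per character here
as_utf8 = name.encode("utf-8")        # "Ü" becomes a two-byte sequence

for label, raw in [("cp1252", as_cp1252), ("cp850", as_cp850),
                   ("utf-16-le", as_utf16), ("utf-8", as_utf8)]:
    print(f"{label:10} {raw.hex(' ')}")

# Bytes written in one code page and read back in another no longer
# yield the original name.
misread = as_cp1252.decode("cp850")
print(misread == name)
```

All four byte strings differ, so a program that passes the bytes through
unchanged only works if both sides agree on the encoding.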
> > Anyway, many years ago when I did not use utf-8 in Linux, such file
> > name did not work even in OpenOffice.
Yes, AFAIK OpenOffice gracefully supports UTF-8. You should have
configured your file system to use UTF-8. This has been the default for
years (on Linux and OS X, at least). Windows always lags 20..30 years
behind and still insists on CP1252 (CP850 on the command line) for
German and similar idiocies for other languages.
> I never had those problems with latin1 (except with only a few programs
> like luatex). But I had many problems in the past trying to use UTF-8.
> However, that personal stuff is good to know but does not help in this
> situation.
But now you have problems with Latin1. The reason is that you still
insist on archaic encodings like Latin1 while the rest of the world is
striving towards Unicode. I strongly recommend switching to UTF-8
completely. If you're on Linux or OS X, simply stick with the defaults.
> Fact is that even software that uses UTF-8 (or other Unicode) internally
> works well in my environment. (Examples: Libreoffice, Gimp, Geeqie, ...
> (Geeqie, I am one of the people working on it)) So it must be possible
> to do that in lualatex or xetex too.
I don't know what you want to achieve. You said:
> While latin1 can include every possible character, UTF-8 cannot.
This is definitely wrong. The opposite is true.
http://www.unicode.org/standard/WhatIsUnicode.htm
Regards,
Reinhard
--
------------------------------------------------------------------
Reinhard Kotucha Phone: +49-511-3373112
Marschnerstr. 25
D-30167 Hannover mailto:reinhard.kotucha at web.de
------------------------------------------------------------------