[tex4ht] [bug #597] tex4ht + biblatex + non-ascii chars = mixed encoding in html file
Matteo Gamboz
puszcza-hackers at gnu.org.ua
Sat Mar 4 15:01:23 CET 2023
URL:
<http://puszcza.gnu.org.ua/bugs/?597>
Summary: tex4ht + biblatex + non-ascii chars = mixed encoding in html file
Project: tex4ht
Submitted by: gamboz
Submitted on: Sat Mar 4 14:01:23 2023
Category: None
Priority: 5 - Normal
Severity: 5 - Normal
Status: None
Privacy: Public
Assigned to: None
Originator Email:
Open/Closed: Open
Discussion Lock: Any
_______________________________________________________
Details:
NB: this is the same as https://tex.stackexchange.com/q/678200/56076
(sorry for cross-posting) :)
I have the following situation:
- a LaTeX file with a macro that tex4ht usually translates into a Unicode character (e.g. `\ldots`, which becomes `…`);
- a citation with a non-ASCII character in the author's name (e.g. the `í` in `Albarracín`);
- I would like to generate an XHTML file with htlatex.
The procedure works, but the resulting file mixes two encodings: the character
produced by the LaTeX macro is encoded in UTF-8, while the non-ASCII character
in the author's name is encoded in Latin-1.
AFAICT, htlatex reads the .bbl file as if it were encoded in Latin-1.
Is there anything I could do to fix this behavior? :)
(I'm working on `pdfTeX, Version 3.141592653-2.6-1.40.24 (TeX Live 2022/Arch
Linux)`)
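To make the "mixed encoding" concrete: UTF-8 stores `…` (U+2026) as the three bytes `E2 80 A6` and `í` (U+00ED) as `C3 AD`, while Latin-1 stores `í` as the single byte `ED`. A lone `ED` followed by an ASCII letter is not valid UTF-8, which is why a UTF-8 viewer shows `�` in the output below. The following POSIX-shell sketch only prints those byte sequences; it assumes `od` and an `iconv` that accepts the name `ISO-8859-1`, and `to_hex` is a hypothetical helper, not part of tex4ht.

```shell
#!/bin/sh
# to_hex is a hypothetical helper: print the bytes read from stdin
# as space-separated lowercase hex pairs.
to_hex() {
  od -An -tx1 | tr '\n' ' ' | tr -s ' ' | sed 's/^ //; s/ $//'
}

# The characters from the report; octal escapes keep this script ASCII-only.
ellipsis_utf8=$(printf '\342\200\246' | to_hex)  # U+2026 as UTF-8
iacute_utf8=$(printf '\303\255' | to_hex)        # U+00ED as UTF-8
iacute_latin1=$(printf '\355' | to_hex)          # U+00ED as Latin-1

echo "ellipsis, UTF-8:  $ellipsis_utf8"
echo "i-acute, UTF-8:   $iacute_utf8"
echo "i-acute, Latin-1: $iacute_latin1"

# Re-encoding the Latin-1 byte with iconv yields the UTF-8 form.
recoded=$(printf '\355' | iconv -f ISO-8859-1 -t UTF-8 | to_hex)
echo "Latin-1 -> UTF-8: $recoded"
```

Running it prints `e2 80 a6`, `c3 ad` and `ed`, then `c3 ad` again for the re-encoded byte: a Latin-1 `í` that is converted correctly becomes identical to the UTF-8 one.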
Here is an MWE, followed by the commands I run:
```latex
%% File mwe.tex
\documentclass{article}
\usepackage[backend=biber]{biblatex}
\begin{filecontents}{\jobname.bib}
@Article{Albarracin2000,
year = {2000},
volume = {1},
issue = {2},
pages = {3},
author = {Anyone Albarracín},
title = {A beautiful paper.},
journaltitle = {Some Journal}
}
\end{filecontents}
\addbibresource{\jobname.bib}
\begin{document}
I Am a Scientist\ldots\ Ask Me Anything
\parencite{Albarracin2000}
\printbibliography
\end{document}
```
```sh
htlatex mwe.tex "xhtml" "-cunihtf -utf8" "" ""
biber mwe
htlatex mwe.tex "xhtml" "-cunihtf -utf8" "" ""
```
And here is the result:
```sh
$ file mwe.html
mwe.html: XML 1.0 document, Non-ISO extended-ASCII text
$ grep -a -e 'Anyone Albarra' -e Scientist --color mwe.html
<!--l. 22--><p class="noindent" >I Am a Scientist… Ask Me Anything [<a
<!--l. 26--><p class="noindent" >Anyone Albarrac�n. “A beautiful
paper.” In: <span
```
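To spot this kind of mixed output mechanically rather than with `grep --color`, a shell sketch can report which lines of a file are not valid UTF-8. It checks each line separately with `iconv -f UTF-8 -t UTF-8`, assuming an `iconv` (such as glibc's or GNU libiconv) that fails on malformed input; `bad_utf8_lines` is a hypothetical helper name, and the sample file is made up to mirror the two lines shown above.

```shell
#!/bin/sh
# bad_utf8_lines is a hypothetical helper: print the number of every line
# in file "$1" that is not valid UTF-8. Assumes iconv exits non-zero on
# malformed input (true for glibc iconv and GNU libiconv).
bad_utf8_lines() {
  n=0
  # "|| [ -n "$line" ]" also handles a last line without a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    if ! printf '%s\n' "$line" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
      echo "$n"
    fi
  done < "$1"
}

# Demo on a made-up two-line file shaped like the report's output:
# line 1 holds a UTF-8 ellipsis, line 2 a Latin-1 i-acute (byte 0xED).
sample=$(mktemp)
printf 'I Am a Scientist\342\200\246 Ask Me Anything\nAnyone Albarrac\355n.\n' > "$sample"
bad_utf8_lines "$sample"   # prints 2
rm -f "$sample"
```

Checking line by line is slower than validating the whole file at once, but it points directly at the offending line; on the real output one could run `bad_utf8_lines mwe.html`.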