pdftotext removing "fi" from a recent pdf I made with latex,
Mike Marchywka
marchywka at hotmail.com
Sun Nov 24 11:47:21 CET 2019
On Sun, Nov 24, 2019 at 12:11:07AM +0000, Mike Marchywka wrote:
>
> I have never seen this before. It looks like a stupid font problem,
> but it is likely to be common with many PDFs now. If I just run
> "pdftotext" on my output, I get weird boxes where each "fi"
> is. If I use "-enc ASCII7", each "fi" is deleted entirely.
>
> I could probably create a minimal working example but thought someone
> may know offhand. Thanks.
Never mind, I figured it out :) I had added this stupid thing
\usepackage[T1]{fontenc}
to fix another problem, and that is what breaks the "fi". So if you are
finding that pdftotext output is jumbled, or you want to use the pdf
(and maybe dvi) format to obscure information that would be in a
normal text file, this example reproduces it:
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{hyperref}
\hypersetup{
pdfinfo={
x-bib-author = {A. Writer},
x-bib-journal = {Test},
x-bib-buy-url = {https://buyexpensivejunk}
}
}
\newcommand{\addbib}[2]
{
\hypersetup{
pdfinfo={ x-bib-#1 = {#2} } }
}
\addbib{author}{marchywka}
\addbib{title}{my title}
\addbib{omething}{foobar abstratct asdfasdfa }
\begin{document}
test
a word that defines the problem, d e f i n e s
\end{document}
Compiling to pdf and converting back to text gives this:
cat schumann.pdf | pdftotext - -
test a word that denes the problem, d e f i n e s
1
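
A possible workaround, not tested against the setup in this thread: pdfTeX
can write ToUnicode maps into the PDF so that text extractors can turn the
"fi" ligature glyph back into the letters f and i. The sketch below assumes
pdfTeX (not XeTeX or LuaTeX) and assumes scalable Type 1 fonts are in use;
the lmodern package is my choice for illustration and is not part of the
original example.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
% Assumption: use Type 1 fonts (Latin Modern here). Bitmap Type 3
% fonts carry no glyph names, so the mapping below cannot help them.
\usepackage{lmodern}
% pdfTeX only: load the glyph-name-to-Unicode table, then ask pdfTeX
% to emit ToUnicode CMaps for the fonts it embeds.
\input{glyphtounicode}
\pdfgentounicode=1
\begin{document}
test
a word that defines the problem, d e f i n e s
\end{document}
```

If this works on a given installation, running pdftotext on the result
should give "defines" with the "fi" intact. The cmap package is another
commonly suggested route to the same end.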
>
> This is the version,
>
> pdftotext -v
> pdftotext version 0.41.0
> Copyright 2005-2016 The Poppler Developers - http://poppler.freedesktop.org
> Copyright 1996-2011 Glyph & Cog, LLC
>
> and basic info on the pdf file,
> exifutil -list vitaprop.pdf
> ExifTool Version Number : 11.75
> File Name : vitaprop.pdf
> Directory : .
> File Size : 287 kB
> File Modification Date/Time : 2019:11:23 06:17:53-05:00
> File Access Date/Time : 2019:11:23 06:17:53-05:00
> File Inode Change Date/Time : 2019:11:23 06:17:53-05:00
> File Permissions : rw-rw-r--
> File Type : PDF
> File Type Extension : pdf
> MIME Type : application/pdf
> PDF Version : 1.5
> Linearized : No
> Page Count : 12
> Page Mode : UseOutlines
> Author :
> Title :
> Subject :
> Creator : LaTeX with hyperref package
> Producer : pdfTeX-1.40.16
> Create Date : 2019:11:23 06:17:52-05:00
> Modify Date : 2019:11:23 06:17:52-05:00
> Trapped : False
> PTEX Fullbanner : This is pdfTeX, Version 3.14159265-2.6-1.40.16 (TeX Live 2015/Debian) kpathsea version 6.2.1
>
>
> --
>
> mike marchywka
> 306 charles cox
> canton GA 30115
> USA, Earth
> marchywka at hotmail.com
> 404-788-1216
> ORCID: 0000-0001-9237-455X
>