pdftotext removing "fi" from a recent pdf I made with latex,
Tomas Rokicki
rokicki at gmail.com
Sun Nov 24 11:50:59 CET 2019
I believe this is largely a poppler problem. I'd be happy to discuss it a
bit more if you would like.
-tom
On Sun, Nov 24, 2019 at 2:47 AM Mike Marchywka <marchywka at hotmail.com>
wrote:
> On Sun, Nov 24, 2019 at 12:11:07AM +0000, Mike Marchywka wrote:
> >
> > I have never seen this before, but it looks like a stupid font problem,
> > and it is likely to be common with many PDFs now. If I just run
> > "pdftotext" on my output, I get weird boxes where each "fi"
> > is. If I use "-enc ASCII7", the entire thing is deleted.
> >
> > I could probably create a minimal working example but thought someone
> > may know offhand. Thanks.
>
> Never mind, I figured it out :) I had added this stupid thing
>
> \usepackage[T1]{fontenc}
>
> to fix another problem. However, if you find that pdftotext output
> is jumbled, or you want to use the PDF (and maybe DVI) format
> to obscure information that would be readable in a normal text file,
> this seems to work:
>
>
>
> \documentclass{article}
> \usepackage[T1]{fontenc}
> \usepackage{hyperref}
> \hypersetup{
> pdfinfo={
> x-bib-author = {A. Writer},
> x-bib-journal = {Test},
> x-bib-buy-url = {https://buyexpensivejunk}
> }
> }
>
> \newcommand{\addbib}[2]
> {
> \hypersetup{
> pdfinfo={ x-bib-#1 = {#2} } }
>
> }
> \addbib{author}{marchywka}
> \addbib{title}{my title}
> \addbib{omething}{foobar abstratct asdfasdfa }
>
> \begin{document}
> test
> a word that defines the problem, d e f i n e s
> \end{document}
>
>
> Compiling to PDF and converting back to text gives this:
>
> cat schumann.pdf | pdftotext - -
> test a word that de nes the problem, d e f i n e s
>
> 1
>
>
>
>
> >
> > This is the version,
> >
> > pdftotext -v
> > pdftotext version 0.41.0
> > Copyright 2005-2016 The Poppler Developers - http://poppler.freedesktop.org
> > Copyright 1996-2011 Glyph & Cog, LLC
> >
> > and basic info on the pdf file,
> > exifutil -list vitaprop.pdf
> > ExifTool Version Number : 11.75
> > File Name : vitaprop.pdf
> > Directory : .
> > File Size : 287 kB
> > File Modification Date/Time : 2019:11:23 06:17:53-05:00
> > File Access Date/Time : 2019:11:23 06:17:53-05:00
> > File Inode Change Date/Time : 2019:11:23 06:17:53-05:00
> > File Permissions : rw-rw-r--
> > File Type : PDF
> > File Type Extension : pdf
> > MIME Type : application/pdf
> > PDF Version : 1.5
> > Linearized : No
> > Page Count : 12
> > Page Mode : UseOutlines
> > Author :
> > Title :
> > Subject :
> > Creator : LaTeX with hyperref package
> > Producer : pdfTeX-1.40.16
> > Create Date : 2019:11:23 06:17:52-05:00
> > Modify Date : 2019:11:23 06:17:52-05:00
> > Trapped : False
> > PTEX Fullbanner : This is pdfTeX, Version 3.14159265-2.6-1.40.16 (TeX Live 2015/Debian) kpathsea version 6.2.1
> >
> >
> > --
> >
> > mike marchywka
> > 306 charles cox
> > canton GA 30115
> > USA, Earth
> > marchywka at hotmail.com
> > 404-788-1216
> > ORCID: 0000-0001-9237-455X
> >
>
>
--
-- http://cube20.org/ -- http://golly.sf.net/ --
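
A possible LaTeX-side workaround, offered as a sketch and not as something confirmed in this thread: when the "fi" ligature glyph has no Unicode mapping in the PDF, text extractors have nothing to translate it to. With pdfTeX (the Producer field quoted above reports pdfTeX-1.40.16), loading the glyph-to-Unicode table and enabling \pdfgentounicode asks the engine to write those mappings for Type 1 fonts. The use of lmodern here is an assumption, chosen only because it supplies Type 1 fonts in T1 encoding; whether this changes the output of pdftotext 0.41.0 for the file discussed above has not been tested.

```latex
\documentclass{article}
% Assumption: the document is compiled with pdfLaTeX (pdfTeX engine).
\input{glyphtounicode}   % glyph-name to Unicode table distributed with pdfTeX
\pdfgentounicode=1       % have pdfTeX emit ToUnicode maps for Type 1 fonts
\usepackage[T1]{fontenc}
% Assumption: Latin Modern is an acceptable font; it avoids bitmap (Type 3)
% fonts, which can appear when no Type 1 version of the T1 font is installed.
\usepackage{lmodern}

\begin{document}
test
a word that defines the problem, d e f i n e s
\end{document}
```

If the workaround applies, running pdftotext on the result should show "defines" intact where the earlier output shows "de nes".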
More information about the texhax
mailing list