pdftotext removing "fi" from a recent pdf I made with latex,

Mike Marchywka marchywka at hotmail.com
Sun Nov 24 11:47:21 CET 2019


On Sun, Nov 24, 2019 at 12:11:07AM +0000, Mike Marchywka wrote:
> 
> I have never seen this before, but it looks like a stupid font problem
> that is likely to be common with many PDFs now. If I just run
> "pdftotext" on my output, I get weird boxes where each "fi"
> is. If I use "-enc ASCII7", each "fi" is deleted entirely.
> 
> I could probably create a minimal working example but thought someone
> may know offhand. Thanks. 

Never mind, I figured it out :) I had added this stupid thing

\usepackage[T1]{fontenc}

to fix another problem, and that appears to be what breaks the "fi"
ligature in the pdftotext output. So if you are finding that pdftotext
output is jumbled, or if you want to use the pdf (and maybe dvi) format
to obscure information that would be readable in a normal text file,
this seems to do it:



\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{hyperref}
\hypersetup{
  pdfinfo={
    x-bib-author  = {A. Writer},
    x-bib-journal = {Test},
    x-bib-buy-url = {https://buyexpensivejunk}
  }
}

% Helper: pass one x-bib-<key> = <value> pair to hyperref's pdfinfo.
\newcommand{\addbib}[2]{%
  \hypersetup{pdfinfo={x-bib-#1 = {#2}}}%
}
\addbib{author}{marchywka}
\addbib{title}{my title}
\addbib{something}{foobar abstract asdfasdfa}

\begin{document}
test
a word that defines the problem, d e f i n e s
\end{document}


Compiling to pdf and converting back to text gives this (note "denes"
where "defines" should be):

cat schumann.pdf | pdftotext - - 
test a word that denes the problem, d e f i n e s

1
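
If you need the text back rather than obscured, something like the
following might repair the extracted output. This is only a sketch under
an assumption I have not verified against this pdf: that the "weird
boxes" are the raw T1 (Cork) ligature slots 0x1B-0x1F coming through as
control characters. The function name restore_ligatures is made up for
illustration. It also maps the Unicode ligature characters U+FB00-U+FB04,
which pdftotext emits for some other fonts.

```python
# Sketch: put f-ligatures back into text extracted from a PDF.
# ASSUMPTION: the boxes in the pdftotext output are the T1 (Cork)
# encoding slots for the ligatures, emitted as raw control characters.

# T1 (Cork) font encoding slots for the f-ligatures.
T1_LIGATURE_SLOTS = {
    0x1B: "ff",
    0x1C: "fi",
    0x1D: "fl",
    0x1E: "ffi",
    0x1F: "ffl",
}

# Unicode "Alphabetic Presentation Forms" ligature characters.
UNICODE_LIGATURES = {
    0xFB00: "ff",
    0xFB01: "fi",
    0xFB02: "fl",
    0xFB03: "ffi",
    0xFB04: "ffl",
}

# One translation table covering both cases.
LIGATURE_TABLE = {**T1_LIGATURE_SLOTS, **UNICODE_LIGATURES}


def restore_ligatures(text: str) -> str:
    """Replace ligature code points in `text` with plain ASCII letters."""
    return text.translate(LIGATURE_TABLE)
```

This has to be run on the plain pdftotext output, not the "-enc ASCII7"
output, since by then the character is already gone. If the boxes turn
out to be some other code point, the table needs adjusting.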




> 
> This is the version,
> 
> pdftotext -v
> pdftotext version 0.41.0
> Copyright 2005-2016 The Poppler Developers - http://poppler.freedesktop.org
> Copyright 1996-2011 Glyph & Cog, LLC
> 
> and basic info on the pdf file,
> exifutil -list vitaprop.pdf
> ExifTool Version Number         : 11.75
> File Name                       : vitaprop.pdf
> Directory                       : .
> File Size                       : 287 kB
> File Modification Date/Time     : 2019:11:23 06:17:53-05:00
> File Access Date/Time           : 2019:11:23 06:17:53-05:00
> File Inode Change Date/Time     : 2019:11:23 06:17:53-05:00
> File Permissions                : rw-rw-r--
> File Type                       : PDF
> File Type Extension             : pdf
> MIME Type                       : application/pdf
> PDF Version                     : 1.5
> Linearized                      : No
> Page Count                      : 12
> Page Mode                       : UseOutlines
> Author                          : 
> Title                           : 
> Subject                         : 
> Creator                         : LaTeX with hyperref package
> Producer                        : pdfTeX-1.40.16
> Create Date                     : 2019:11:23 06:17:52-05:00
> Modify Date                     : 2019:11:23 06:17:52-05:00
> Trapped                         : False
> PTEX Fullbanner                 : This is pdfTeX, Version 3.14159265-2.6-1.40.16 (TeX Live 2015/Debian) kpathsea version 6.2.1
> 
> 
> -- 
> 
> mike marchywka
> 306 charles cox
> canton GA 30115
> USA, Earth 
> marchywka at hotmail.com
> 404-788-1216
> ORCID: 0000-0001-9237-455X
> 



