Writing a de-psfagger

Mike Marchywka marchywka at hotmail.com
Thu Jan 23 19:53:42 CET 2020


On Thu, Jan 23, 2020 at 06:21:27PM +0000, Jonathan Fine wrote:
>    Hi Paulo
>    I suggested use 'file' because it looks at the content, and ignores the extension. Users sometimes give their files odd
>    extensions.

I picked up exiftool for things like scanning for non-standard bibtex that
someone may have chosen to put in a pdf file :) This seems to produce more structured
output than "file", although I have not looked at all of its options.

https://exiftool.org/
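
For anyone who wants to try the structured output from a script, here is a rough
sketch in Python. It assumes exiftool is on the PATH and uses its -json option,
which prints a JSON array with one object per input file. The function names
(parse_exiftool_json, exiftool_metadata) are made up for illustration and are not
part of exiftool itself.

```python
import json
import shutil
import subprocess


def parse_exiftool_json(text):
    """Index exiftool -json output (a JSON array, one object per file) by SourceFile."""
    return {entry.get("SourceFile"): entry for entry in json.loads(text)}


def exiftool_metadata(paths):
    """Run exiftool on the given paths.

    Returns None when there is nothing to do or exiftool is not installed
    (an assumption of this sketch: the caller treats None as "no answer").
    """
    if not paths or shutil.which("exiftool") is None:
        return None
    result = subprocess.run(
        ["exiftool", "-json", *paths],
        capture_output=True,
        text=True,
        check=False,  # exiftool may exit non-zero yet still report on some files
    )
    if not result.stdout.strip():
        return {}
    return parse_exiftool_json(result.stdout)
```

Each per-file dictionary can then be searched for whatever tags are of interest.
Which tags show up depends on the file and on the exiftool version, so look
fields up with .get() rather than assuming they are present.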

I'm curious, though, whether anyone wants to compare the two. There are also
specialized tools: for images I used to use ImageMagick's "identify".




>    For example, is chapter1.exe an exe file, or a collection of exercises.
>    I think what you want first is a collection of quick tools that never (or almost never) can be trusted.
>    A tool that never gives a false negative can be trusted when it gives a negative.
>    Similarly, false positive and positive.
>    Once you've quickly winnowed out most of the files to be excluded, the problem becomes smaller.
>    I hope your code goes up on github (and that you choose Python over Perl, smile).
>    best wishes
>    Jonathan
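
The winnowing idea quoted above could be sketched along these lines in Python.
This is only an illustration, not Jonathan's or Paulo's code: it looks at the
first few bytes of a file and ignores the extension, and the signature table and
the names SIGNATURES, sniff and sniff_file are assumptions made up for the example.

```python
# Leading byte signatures for a handful of formats; far from exhaustive.
SIGNATURES = {
    "pdf": b"%PDF-",
    "png": b"\x89PNG\r\n\x1a\n",
    "zip": b"PK\x03\x04",
    "windows-exe": b"MZ",
}


def sniff(head):
    """Return a format name based on the leading bytes, or None if unrecognized."""
    for name, magic in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return None


def sniff_file(path, nbytes=16):
    """Read the first nbytes of a file and classify it by content, not extension."""
    with open(path, "rb") as handle:
        return sniff(handle.read(nbytes))
```

With this, a chapter1.exe that begins with "\documentclass" is not reported as an
executable. The "MZ" check is short, so a text file that happens to start with
those two letters would be a false positive. A file that does not start with
"MZ" should not be a Windows executable, though, so the negative is the answer
to trust here.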

-- 

mike marchywka
306 charles cox
canton GA 30115
USA, Earth 
marchywka at hotmail.com
404-788-1216
ORCID: 0000-0001-9237-455X



More information about the texhax mailing list