Writing a de-psfagger
Mike Marchywka
marchywka at hotmail.com
Thu Jan 23 19:53:42 CET 2020
On Thu, Jan 23, 2020 at 06:21:27PM +0000, Jonathan Fine wrote:
> Hi Paulo
> I suggested using 'file' because it looks at the content and ignores the extension. Users sometimes give their files odd
> extensions.
I picked up exiftool for things like scanning for non-standard BibTeX that
someone may have chosen to put in a PDF file :) It seems to produce more structured
output than "file", although I have not looked at all of its options.
https://exiftool.org/
I am curious, though, whether anyone wants to compare the two. There are also
more specialized tools; for images I used to use ImageMagick's "identify".
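
For anyone who wants to compare, a minimal Python sketch along these lines
might be a starting point. It assumes "file" and exiftool are both on the PATH
and returns None for a tool that is missing. The function names are made up
for illustration; the only external pieces relied on are "file --brief
--mime-type" and "exiftool -json", which prints a JSON array with one object
per input file.

```python
import json
import shutil
import subprocess
from typing import Optional


def file_mime(path: str) -> Optional[str]:
    """Return the MIME type reported by `file`, or None if `file` is not installed."""
    if shutil.which("file") is None:
        return None
    result = subprocess.run(
        ["file", "--brief", "--mime-type", path],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.strip() or None


def parse_exiftool_json(text: str) -> dict:
    """Return the first record of `exiftool -json` output, or {} if there is none."""
    records = json.loads(text)
    return records[0] if records else {}


def exiftool_tags(path: str) -> Optional[dict]:
    """Return exiftool's tags for one file, or None if exiftool is missing or fails."""
    if shutil.which("exiftool") is None:
        return None
    result = subprocess.run(
        ["exiftool", "-json", path],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0 or not result.stdout.strip():
        return None
    return parse_exiftool_json(result.stdout)


def compare(path: str) -> dict:
    """Collect both tools' opinions about one file, side by side."""
    tags = exiftool_tags(path) or {}
    return {
        "file": file_mime(path),
        "exiftool_type": tags.get("FileType"),
        "exiftool_mime": tags.get("MIMEType"),
    }
```

Calling compare("chapter1.exe") would then show at a glance whether the two
tools agree, whatever the extension says.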
> For example, is chapter1.exe an exe file, or a collection of exercises?
> I think what you want first is a collection of quick tools that never (or almost never) can be trusted.
> A tool that never gives a false negative can be trusted when it gives a negative.
> Similarly, a tool that never gives a false positive can be trusted when it gives a positive.
> Once you've quickly winnowed out most of the files to be excluded, the problem becomes smaller.
> I hope your code goes up on github (and that you choose Python over Perl, smile).
> best wishes
> Jonathan
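
The chapter1.exe example can be sketched as one such quick tool in Python. This
is only an illustration, with made-up function names: it relies on the fact
that DOS/Windows executables begin with the two bytes "MZ", so a file without
that prefix is certainly not one, while a file with it is only a "maybe" that
a slower tool should look at.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def might_be_exe(path: Path) -> bool:
    """Quick check with no false negatives for DOS/Windows executables.

    False means "definitely not such an executable" (no "MZ" signature).
    True only means "maybe": a text file could also happen to start with "MZ".
    """
    with path.open("rb") as handle:
        return handle.read(2) == b"MZ"


def winnow(paths: Iterable[Path], may_match: Callable[[Path], bool]) -> List[Path]:
    """Keep only the files the quick check cannot rule out."""
    return [path for path in paths if may_match(path)]
```

Whatever survives winnow() is the smaller problem left for the slower, more
careful tools.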
--
mike marchywka
306 charles cox
canton GA 30115
USA, Earth
marchywka at hotmail.com
404-788-1216
ORCID: 0000-0001-9237-455X
More information about the texhax
mailing list