anyone used headless browsers for scraping bibtex from webpages ?
John Scott
jscott at posteo.net
Wed May 20 23:51:00 CEST 2020
I don't know about BibTeX specifically, but for web scripting or filling in
basic forms, cURL is pretty handy. For activating elements on a web page,
you'll probably want to look at saving and reusing cookies with --cookie-jar
and --cookie, and at how to send POST requests.
For example, I recently wrote a script that lets me fill in a form and complete
a CAPTCHA entirely from the CLI. First I ran
curl --cookie-jar jar.txt http://foo.com/do.php
to get it to save the cookie for my session. Then I'd recycle this cookie to
get my CAPTCHA:
curl --cookie jar.txt -o image.png http://foo.com/captcha.php
and lastly, after reading it, I sent the request (you can figure out the field
names with Inspect Element in your browser):
curl --cookie jar.txt -X POST -F 'captcha_code=FfFfFf' http://foo.com/do.php
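If it helps, the three steps above could be wrapped into a small script along
these lines. This is only a sketch: the function names and the CURL, BASE_URL
and JAR variables are made up for illustration, and the URLs and the
captcha_code field are the same placeholders as in the commands above, so
adjust them for the real site.

```shell
#!/bin/sh
# Hypothetical wrapper around the three curl calls; nothing runs until
# one of the functions is called.
CURL=curl                  # assumed variable, so the command can be swapped for testing
BASE_URL=http://foo.com    # placeholder site from the example above
JAR=jar.txt                # file that holds the session cookie

# Step 1: load the form page and save the session cookie to $JAR.
start_session() {
    $CURL --silent --cookie-jar "$JAR" --output /dev/null "$BASE_URL/do.php"
}

# Step 2: reuse the cookie and download the CAPTCHA image for this session.
fetch_captcha() {
    $CURL --silent --cookie "$JAR" --output image.png "$BASE_URL/captcha.php"
}

# Step 3: submit the form with the CAPTCHA text given as the first argument.
# --form sends a multipart POST on its own, so -X POST is not needed here.
submit_form() {
    $CURL --silent --cookie "$JAR" --form "captcha_code=$1" "$BASE_URL/do.php"
}
```

You would call start_session, then fetch_captcha, look at image.png, and
finally run submit_form with the text you read from the image.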
For help with particular sites, please feel free to share details on or
off-list.