bibtex from webpage [was Re: Tex utilities - related to ampersand cr error thread]
Mike Marchywka
marchywka at hotmail.com
Mon Jul 8 11:04:51 CEST 2019
On Mon, Jul 08, 2019 at 04:15:15AM +0000, Schneider, Thomas (NIH/NCI) [E] wrote:
> Mike:
>
> > just trying to extract bibtex from webpages is a huge task in
> > itself...
>
> I've pretty much solved this for biology. See my yvp page:
>
> https://alum.mit.edu/www/toms/yvp.html
>
> It turns out that the year, volume and page are often sufficient to
> identify a unique paper in PubMed. (If not, one can tack on an extra
> key word or select from the list provided by PubMed) The yvp script
> does this.
For this case, I guess I would just use eutils instead of opening a browser,
https://www.ncbi.nlm.nih.gov/books/NBK25500/
and create bibtex from that. However, usually the situation is that I'm browsing
and find an interesting article at a publisher's site. Normally you then
have to find a link to the bibtex and do some stuff to obtain it.
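
As a rough illustration of the eutils route for a year/volume/page lookup,
the sketch below only builds the two request URLs (ESearch to find a PMID,
EFetch to get the record in medline format). The function names are made up
for this example, and the [dp]/[vi]/[pg] field tags are assumed to be the
right ones for year, volume and first page; nothing is fetched here.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(year, volume, page):
    """Build an ESearch URL for a year/volume/first-page triple.

    Assumes the PubMed field tags [dp] (publication date), [vi] (volume)
    and [pg] (pagination); the function name is hypothetical.
    """
    term = f"{year}[dp] AND {volume}[vi] AND {page}[pg]"
    query = urlencode({"db": "pubmed", "term": term, "retmode": "json"})
    return f"{EUTILS}/esearch.fcgi?{query}"


def efetch_medline_url(pmid):
    """Build an EFetch URL that asks for one record in medline text format."""
    query = urlencode(
        {"db": "pubmed", "id": str(pmid), "rettype": "medline", "retmode": "text"}
    )
    return f"{EUTILS}/efetch.fcgi?{query}"


if __name__ == "__main__":
    print(esearch_url(2019, 12, 345))
    print(efetch_medline_url(12345678))
```

If ESearch returns more than one PMID, an extra keyword in the term or a
manual pick is still needed, as described in the quoted message.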
I ended up writing a script that takes a URL from the clipboard and tries to
find bibtex for the article the link describes, by way of
a site-specific derived link, scraping the page for bibtex, or extracting
a DOI and using crossref, etc. For pubmed, I extract the PMC or PMID and
get a result in medline format, which can be parsed into bibtex.
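
The medline-to-bibtex step can be sketched roughly as follows. This is not
the script described above, just a minimal parser under some assumptions:
tags are 2 to 4 capital letters followed by "- ", continuation lines are
indented by six spaces, and the FAU lines ("Last, First") are preferred for
authors. The sample record and all function names are invented for the
example, and no LaTeX escaping (ampersands, percent signs) is attempted.

```python
import re

# A made-up record in medline layout, used only to exercise the parser.
SAMPLE = """PMID- 12345678
TI  - A title that
      continues here.
FAU - Smith, John
AU  - Smith J
FAU - Doe, Alice
AU  - Doe A
JT  - Journal of Examples
DP  - 2019 Jul
VI  - 12
PG  - 345-56
"""

TAG_LINE = re.compile(r"^([A-Z]{2,4})\s*- (.*)$")


def parse_medline(text):
    """Return a dict mapping each tag to the list of its values."""
    record = {}
    tag = None
    for line in text.splitlines():
        match = TAG_LINE.match(line)
        if match:
            tag = match.group(1)
            record.setdefault(tag, []).append(match.group(2).strip())
        elif tag and line.startswith("      "):
            # Continuation line: append to the most recent value.
            record[tag][-1] += " " + line.strip()
    return record


def medline_to_bibtex(record):
    """Format a parsed record as an @article entry keyed by PMID."""

    def first(tag):
        return record.get(tag, [""])[0]

    # FAU is "Last, First", which BibTeX reads correctly; AU ("Smith J")
    # is only a fallback and may be misread as first/last by BibTeX.
    authors = record.get("FAU") or record.get("AU", [])
    fields = {
        "author": " and ".join(authors),
        "title": first("TI"),
        "journal": first("JT"),
        "year": first("DP")[:4],
        "volume": first("VI"),
        "pages": first("PG"),
    }
    lines = [f"@article{{pmid{first('PMID')},"]
    lines += [f"  {name} = {{{value}}}," for name, value in fields.items() if value]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(medline_to_bibtex(parse_medline(SAMPLE)))
```

Page ranges are passed through in the abbreviated form medline uses
("345-56"), so they may still need expanding by hand.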
Many publishers appear to use a few canned solutions, so it is easy
to find the link on their pages, but sometimes I have to scrape pdf files
for a doi and can end up with the wrong article. The script
I now have seems pretty good at getting bibtex, but I do
need to check results and it seems awfully complicated: I probably
have 100 domains with special handlers and no assurance the webpages
are stable.
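
One way to organise per-domain handlers with a generic DOI fallback is a
small dispatch table, sketched below. Everything named here is an
assumption for illustration: the host journal.example.org and its URL
layout are invented, the DOI pattern is a common heuristic rather than a
full grammar, and the last function relies on doi.org content negotiation
with an "application/x-bibtex" Accept header. The request is only
constructed, not sent.

```python
import re
import urllib.request
from urllib.parse import urlparse

# Heuristic DOI pattern: "10.", a 4-9 digit prefix, "/", then a suffix.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")

# Maps a hostname to a function taking a URL and returning a DOI or None.
HANDLERS = {}


def handler(host):
    """Decorator that registers a site-specific handler for one hostname."""
    def register(func):
        HANDLERS[host] = func
        return func
    return register


def find_doi(text):
    """Return the first DOI-looking string in text, or None."""
    match = DOI_RE.search(text)
    # Trailing punctuation usually belongs to the sentence, not the DOI.
    return match.group(0).rstrip(".,;)") if match else None


@handler("journal.example.org")
def example_handler(url):
    # Hypothetical site: pretend the DOI suffix is the last path component.
    suffix = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
    return "10.5555/" + suffix if suffix else None


def doi_from_url(url, page_text=""):
    """Try the site handler first, then the URL itself, then the page text."""
    func = HANDLERS.get(urlparse(url).hostname or "")
    doi = func(url) if func else None
    return doi or find_doi(url) or find_doi(page_text)


def bibtex_request(doi):
    """Build (but do not send) a request asking doi.org for bibtex."""
    return urllib.request.Request(
        "https://doi.org/" + doi, headers={"Accept": "application/x-bibtex"}
    )
```

The fallback takes the first match it sees, so on a scraped pdf it can
just as easily pick up the DOI of a cited paper; the result still needs
checking against the title.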
I guess there are browser plugins or citation managers for this,
but a script that moves away from the browser integrates better.
--
mike marchywka
306 charles cox
canton GA 30115
USA, Earth
marchywka at hotmail.com
404-788-1216
ORCID: 0000-0001-9237-455X