problem with foreign letters in names apparently from crossref,
Mike Marchywka
marchywka at hotmail.com
Tue Feb 8 09:59:25 CET 2022
On Mon, Feb 07, 2022 at 10:38:55PM +0000, David Carlisle wrote:
> the c is c cedilla Unicode U+00E7 but it seems to have mangled all the names.
> Can you use the jats xml which has pretty much all the information you need in machine readable form
> https://d197for5662m48.cloudfront.net/documents/publicationstatus/38127/preprint_jats/ec977818c5c0ae0420e3501ce604b48e.xml
It does not appear to be a problem for my existing HTML parser, so I can get
it into the internal common format and ignore the XSLT stuff, I guess.
I'm generally trying to find ways to include miscellaneous stuff in the
bibtex so that unexpected additions are included and an interested user can
modify them as needed. If I happen to find them I can modify the code
as cases come up.
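
A minimal sketch of that idea, assuming the scraped metadata is already in a flat key/value mapping. The function name `to_bibtex`, the citation key, and the sample record are hypothetical and only illustrate passing unrecognised fields through untouched; this is not how TooBib does it.

```python
def to_bibtex(entry_type, key, fields):
    """Render a BibTeX entry, keeping every field that was scraped.

    Keys that are not standard BibTeX fields are emitted as-is rather than
    dropped, so an interested user can edit or delete them later.
    """
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sample record; "group-title" is not a standard BibTeX field.
record = {
    "year": "2020",
    "publisher": "Authorea, Inc.",
    "group-title": "Preprints",
}
bibtex_text = to_bibtex("article", "example2020", record)
print(bibtex_text)
```

Most BibTeX styles skip fields they do not recognise, so carrying the extras along should be harmless for typesetting.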
I don't know if the c with the descender will come through in the email OK, but the parser
had no problem:
1 html 3 body 4 article 15 front 22 article-meta 30 contrib-group 46 contrib 51 name 52 surname 53 text = Gonçalves
Now I just need to scrape jats links as with bibtex or doi info.
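
For comparison, a rough sketch of pulling author names out of a JATS file with Python's standard library. The sample XML is a hypothetical fragment modelled on the element path shown above (contrib-group, contrib, name, surname), not the actual Authorea file, and `author_names` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal JATS fragment; a real file has far more metadata.
JATS_SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<article>
  <front>
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Gonçalves</surname><given-names>Fernanda</given-names></name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
</article>
"""


def author_names(jats_bytes):
    """Return 'Surname, Given' strings for each contrib that has a name."""
    root = ET.fromstring(jats_bytes)
    names = []
    for contrib in root.iter("contrib"):
        name = contrib.find("name")
        if name is None:
            continue  # e.g. a collaboration entry without a personal name
        surname = name.findtext("surname", default="")
        given = name.findtext("given-names", default="")
        names.append(f"{surname}, {given}")
    return names


# Pass bytes so the parser honours the encoding in the XML declaration.
authors = author_names(JATS_SAMPLE.encode("utf-8"))
print(authors)
```

Iterating over contrib rather than every name element keeps names from a reference list, if one is present, out of the author list.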
Thanks.
> google suggests several existing jats to bibtex convertors eg
> https://github.com/PeerJ/jats-conversion/blob/master/src/data/xsl/jats-to-bibtex.xsl
> David
>
> On Mon, 7 Feb 2022 at 22:13, Mike Marchywka <marchywka at hotmail.com> wrote:
>
> I thought this was a simple URL to cite but it failed on Zotero webform,
> https://www.authorea.com/doi/full/10.22541/au.159103680.00306295
> Outbreak of abomasal bloat in goat kids due to Clostridium ventriculi and Clostridium perfringens type A in Brazil
> BACTERIAL PATHOGENS, DIAGNOSTICS, DISEASE CONTROL, THERAPEUTIC
> +8
> Mário Felipe Balaro,Fernanda Gonçalves,Felipe Seabra Leal,Isabel Cosentino,Júlia Vignoli,Nathalia Silva,Felipe
> Brandão,Alessandra Figueiredo Nassar,Simone Miyashiro,Nathalie Cunha,Claudia Del Fava
> Further, the names are mangled: the one with a "c" with a weird descender
> is hacked up even in the crossref output. I would blame my handling
> of non-ASCII except that the JSON seems to split the "Gon" name at the offending
> character,
> {"status":"ok","message-type":"work","message-version":"1.0.0","message":{"institution":[{"name":"Authorea, Inc."}],
> "indexed":{"date-parts":[[2021,12,17]],"date-time":"2021-12-17T19:18:05Z","timestamp":1639768685631},
> "posted":{"date-parts":[[2020,6,1]]},"group-title":"Preprints","reference-count":0,"publisher":"Authorea, Inc.",
> "content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],
> "accepted":{"date-parts":[[2020,6,1]]},"DOI":"10.22541\/au.159103680.00306295","type":"posted-content",
> "created":{"date-parts":[[2020,6,1]],"date-time":"2020-06-01T18:40:05Z","timestamp":1591036805000},
> "source":"Crossref","is-referenced-by-count":0,
> "title":["Outbreak of abomasal bloat in goat kids due to Clostridium ventriculi and Clostridium perfringens type A in Brazil"],
> "prefix":"10.22541",
> "author":[{"given":"M rio Felipe","family":"Balaro","sequence":"first","affiliation":[{"name":"Universidade Federal Fluminense"}]},
> {"given":"Fernanda Gon","family":"alves","sequence":"additional","affiliation":[{"name":"Universidade Federal Fluminense"}]},
> Has anyone had a problem with crossref results and foreign chars?
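
One way to separate a local decoding problem from damage that is already in the returned data is to decode the response bytes explicitly as UTF-8 and look at the names. A minimal sketch: the `raw` body below is a hypothetical, intact record (not actual Crossref output), and `family_names` is an illustrative helper.

```python
import json

# Hypothetical response body: what an intact record would look like as UTF-8 bytes.
raw = '{"message":{"author":[{"given":"Fernanda","family":"Gonçalves"}]}}'.encode("utf-8")


def family_names(body):
    """Decode a Crossref-style JSON body as UTF-8 and list the family names."""
    record = json.loads(body.decode("utf-8"))
    return [a.get("family", "") for a in record["message"]["author"]]


names = family_names(raw)
# Names that still contain non-ASCII characters survived the decoding step.
non_ascii = [n for n in names if not n.isascii()]
print(names, non_ascii)
```

If a body decoded this way still shows "Gon" and "alves" in separate fields, the split was presumably present in the data before it reached the local parser.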
> This is what I came up with. You can see the author_orig and "author" entries
> after I tried to concatenate "Del Fava" into "DelFava", as I thought it should be. Is that not
> right? Normally I would ignore stuff like this, as most users would, but since
> I was cleaning it up I thought I would try to make it more conventional :)
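
On the "Del Fava" question: in BibTeX's "Last, First" form everything before the comma is read as the family name, so a multi-word surname can stay as separate words. A small sketch, assuming Crossref-style given/family dictionaries; `bibtex_author_field` and the sample list are illustrative only.

```python
def bibtex_author_field(authors):
    """Join Crossref-style author dicts into a BibTeX author string."""
    parts = []
    for a in authors:
        family = a.get("family", "").strip()
        given = a.get("given", "").strip()
        # "Family, Given" keeps multi-word family names unambiguous.
        parts.append(f"{family}, {given}" if given else family)
    return " and ".join(parts)


sample = [
    {"given": "Felipe Seabra", "family": "Leal"},
    {"given": "Claudia", "family": "Del Fava"},
]
author_field = bibtex_author_field(sample)
print(author_field)
```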
> toobib handledoilink
> % date 2022-02-07:17:04:51 Mon Feb 7 17:04:51 EST 2022
> % srcurl:
> https://www.authorea.com/doi/full/10.22541/au.159103680.00306295
> % citeurl:
> http://api.crossref.org/works/10.22541/au.159103680.00306295
> @article{2020_rio_Felipe_Balaro_Fernanda_Gon_alves,
> X_TooBib = {year: ReWriteKvp dn=year sn=date flags=4},
> X_TooBib = {month: ReWriteKvp dn=month sn=date flags=7},
> X_TooBib = {day: ReWriteKvp dn=day sn=date flags=7},
> X_TooBib = {journal: ReWriteParse be.get(s)=Authorea, Inc. be.get(dest)=},
> X_TooBib = {urldate: FixBeKvp s= cmd=date "+%Y-%m-%d" d=2022-02-07 dn=urldate},
> X_TooBib = {author: FeliperioBalaro , M and Gonalves , Fernanda and Leal , Felipe Seabra and Cosentino , Isabel and
> liaVignoli , J and Silva , Nathalia and Felipe Brand and Nassar , Alessandra Figueiredo and Miyashiro , Simone and Cunha
> , Nathalie and DelFava , Claudia},
> affiliation = {Universidade Federal Fluminense and Universidade Federal Fluminense and Universidade Federal Fluminense
> and Universidade Federal Fluminense and Universidade Federal Fluminense and Universidade Federal Fluminense and
> Universidade Federal Fluminense and Instituto Biologico and Instituto Biologico and Universidade Federal Fluminense and
> Instituto Biologico},
> author = {FeliperioBalaro , M and Gonalves , Fernanda and Leal , Felipe Seabra and Cosentino , Isabel and liaVignoli , J
> and Silva , Nathalia and Felipe Brand and Nassar , Alessandra Figueiredo and Miyashiro , Simone and Cunha , Nathalie and
> DelFava , Claudia},
> author_orig = {M rio Felipe Balaro and Fernanda Gon alves and Felipe Seabra Leal and Isabel Cosentino and J lia Vignoli
> and Nathalia Silva and Felipe Brand o and Alessandra Figueiredo Nassar and Simone Miyashiro and Nathalie Cunha and
> Claudia Del Fava},
> bib-source = {Crossref},
> content-domain = {false},
> date = {2020-06-01},
> date-accepted = {2020-06-01},
> date-created = {2020-06-01T18:40:05Z},
> date-deposited = {2020-06-012020-06-01T18:40:05Z},
> date-indexed = {2021-12-17T19:18:05Z},
> date-issued = {2020-06-01},
> date-posted = {2020-06-01},
> day = {01},
> deposited = {1591036805000},
> doi = {10.22541/au.159103680.00306295},
> group-title = {Preprints},
> institution = {Authorea, Inc.},
> is-referenced-by-count = {0},
> journal = {Authorea, Inc.},
> member = {9829},
> month = {06},
> prefix = {10.22541},
> publisher = {Authorea, Inc.},
> reference-count = {0},
> references-count = {0},
> score = {1},
> subtype = {preprint},
> title = {Outbreak of abomasal bloat in goat kids due to Clostridium ventriculi and Clostridium perfringens type A in
> Brazil},
> type = {posted-content},
> url = {http://dx.doi.org/10.22541/au.159103680.00306295},
> urldate = {2022-02-07},
> year = {2020},
> srcurl={https://www.authorea.com/doi/full/10.22541/au.159103680.00306295},
> xsrcurl={https://www.authorea.com/doi/full/10.22541/au.159103680.00306295},
> citeurl={http://api.crossref.org/works/10.22541/au.159103680.00306295}
> }
> --
> mike marchywka
> 306 charles cox
> canton GA 30115
> USA, Earth
> marchywka at hotmail.com
> 404-788-1216
> ORCID: 0000-0001-9237-455X