This plugin provides a set of instructions for processing documents from the Conditor API (aligning affiliations with the RNSR) and for querying that API.
npm install @ezs/core
npm install @ezs/conditor
$ ./bin/affAlign.js < data/1000-notices-conditor-hal.json | ./bin/compareRnsr.js
recall: 0.7104885057471264
correct: 989
total: 1392
Warning: to use the scripts, you need to install `@ezs/basics` too.
The strict rules used by affAlign, applied to the address of the affiliation to align, are the following:

- the `code_postal` or the `ville_postale` of the structure must be present,
- and for at least one of the supervising institutions (`etabAssoc.*.etab`, where `etabAssoc.*.etab.natTutEtab` equals `TUTE`):
    - either `etabAssoc.*.etab.sigle` or `etabAssoc.*.etab.libelle` is present,
    - or `etabAssoc.*.etab.libelle` starts with `Université` and `etabAssoc.*.etab.libelle` is present (but not `etabAssoc.*.etab.sigle`),
- and the right structure is found:
    - either `etabAssoc.*.label` and `etabAssoc.*.numero` are present, close together and in sequence (e.g. `GDR2945`, `GDR 2945` or `GDR mot 2945`),
    - or `sigle` is present,
    - or `intitule` is present,
- and the structure existed at publication time: one of the `xPublicationDate` values lies between `annee_creation` and `an_fermeture` (if any).

All fields are simplified before comparison (case, accents, hyphens, apostrophes).
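The field simplification and two of the rules above can be sketched in plain JavaScript. This is only an illustration under stated assumptions: `simplify`, `hasLabelAndNumber` and `existedAt` are hypothetical names, not functions exported by `@ezs/conditor`. The regular expression is one possible reading of "close together and in sequence" (at most one word between label and number), and the year bounds are assumed to be inclusive.

```javascript
// Hypothetical helpers: an illustration of the rules, not the plugin's code.

// Simplify a string: lower case, no accents, hyphens and apostrophes as spaces.
function simplify(text) {
    return text
        .normalize('NFD')                 // split letters from their diacritics
        .replace(/[\u0300-\u036f]/g, '')  // drop the diacritics
        .toLowerCase()
        .replace(/[-'\u2019]/g, ' ')      // hyphens and apostrophes become spaces
        .replace(/\s+/g, ' ')             // collapse runs of whitespace
        .trim();
}

// Escape regular expression metacharacters in a literal string.
function escapeRegExp(literal) {
    return literal.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// True when label and number appear in sequence, with at most one word between.
function hasLabelAndNumber(address, label, numero) {
    const l = escapeRegExp(simplify(label));
    const n = escapeRegExp(simplify(numero));
    const pattern = new RegExp(`\\b${l}\\s*(?:\\w+\\s+)?${n}\\b`);
    return pattern.test(simplify(address));
}

// True when the publication year lies between creation and optional closure
// (assumption: both bounds are inclusive).
function existedAt(structure, publicationDate) {
    const year = Number(publicationDate.slice(0, 4));
    const created = Number(structure.annee_creation);
    const closed = structure.an_fermeture ? Number(structure.an_fermeture) : Infinity;
    return created <= year && year <= closed;
}
```

With these helpers, `simplify("U PARIS Cité")` gives `"u paris cite"`, which agrees with the `sigleAppauvri` value shown in the RNSR information example of this document.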
Find the RNSR identifiers in the authors' affiliation addresses.
Input file:
[{
"xPublicationDate": ["2012-01-01", "2012-01-01"],
"authors": [{
"affiliations": [{
"address": "GDR 2989 Université Versailles Saint-Quentin-en-Yvelines, 63009"
}]
}]
}]
Script:
[use]
plugin = basics
plugin = conditor
[JSONParse]
[affAlign]
[JSONString]
indent = true
Output:
[{
"xPublicationDate": ["2012-01-01", "2012-01-01"],
"authors": [{
"affiliations": [{
"address": "GDR 2989 Université Versailles Saint-Quentin-en-Yvelines, 63009",
"conditorRnsr": ["200619958X"]
}]
}]
}]
- `year` (number): year of the RNSR to use instead of the last one (optional, default `2023`)
Take Conditor JSON documents and compute the recall of `authors.affiliations.conditorRnsr` in relation to `authors.affiliations.rnsr`.
Input
[{
"authors": [{
"affiliations": [{
"address": "GDR 2989 Université Versailles Saint-Quentin-en-Yvelines, 63009",
"rnsr": ["200619958X"],
"conditorRnsr": ["200619958X"]
}]
}]
}]
Output
{
"correct": 1,
"total": 1,
"recall": 1
}
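The computation can be sketched as follows. This is an illustration, not the plugin's code: `computeRecall` is a hypothetical name, and the sketch assumes that recall is the share of reference identifiers in `rnsr` that also appear in `conditorRnsr`, counted per affiliation. The plugin's exact counting rules may differ.

```javascript
// Illustrative only: assumes recall = correct / total, where `total` counts the
// reference identifiers (rnsr) and `correct` those also found in conditorRnsr.
function computeRecall(documents) {
    let correct = 0;
    let total = 0;
    for (const doc of documents) {
        for (const author of doc.authors ?? []) {
            for (const affiliation of author.affiliations ?? []) {
                const found = affiliation.conditorRnsr ?? [];
                for (const id of affiliation.rnsr ?? []) {
                    total += 1;
                    if (found.includes(id)) correct += 1;
                }
            }
        }
    }
    // Avoid dividing by zero when no reference identifier is available.
    return { correct, total, recall: total === 0 ? 0 : correct / total };
}
```

Under this reading, the sample run shown earlier is consistent: 989 correct out of 1392 gives a recall of about 0.7105.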
Use scroll to return all results from the Conditor API.

⚠️ You have to put a valid token into a `.env` file, under the `CONDITOR_TOKEN` variable:
CONDITOR_TOKEN=eyJhbG...
- `q` (string): query (optional, default `""`)
- `scroll` (string): duration of the scroll (optional, default `"5m"`)
- `page_size` (number): size of the pages (optional, default `1000`)
- `max_page` (number): maximum number of pages (optional, default `1000000`)
- `includes` (string): fields to get in the response
- `excludes` (string): fields to exclude from the response
- `sid` (string): User-agent identifier (optional, default `"ezs-conditor"`)
- `progress` (boolean): display a progress bar in stderr (optional, default `false`)
Input
{
"q": "Test",
"page_size": 1,
"max_page": 1,
"includes": "sourceUid"
}
Output
[[
{
"sourceUid": "hal$hal-01412764",
"_score": 5.634469,
"_sort": [
0
]
}
]]
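The way `page_size` and `max_page` bound a scroll can be illustrated with a small, self-contained sketch. It makes no HTTP call and does not reproduce the plugin's code or the Conditor API: `scrollPages`, `makeFakeApi` and the `{ hits, nextCursor }` shape are hypothetical, and the calls are synchronous for brevity.

```javascript
// Hypothetical sketch of scroll-style pagination; no real HTTP call is made.
// fetchPage(cursor) must return { hits: [...], nextCursor: string | null }.
function* scrollPages(fetchPage, { maxPage = 1000000 } = {}) {
    let cursor = null;
    for (let page = 0; page < maxPage; page += 1) {
        const { hits, nextCursor } = fetchPage(cursor);
        if (hits.length === 0) return;   // nothing left to read
        yield hits;                      // one array per page
        if (nextCursor === null) return; // the server has no further page
        cursor = nextCursor;
    }
}

// In-memory stand-in for the remote API, serving `pageSize` records per call.
function makeFakeApi(records, pageSize) {
    return (cursor) => {
        const start = cursor === null ? 0 : Number(cursor);
        const end = start + pageSize;
        return {
            hits: records.slice(start, end),
            nextCursor: end < records.length ? String(end) : null,
        };
    };
}

const records = [{ sourceUid: 'a' }, { sourceUid: 'b' }, { sourceUid: 'c' }];
// Two records per page, no page limit: two pages.
const allPages = [...scrollPages(makeFakeApi(records, 2))];
// One record per page, one page at most: a single page with a single record.
const firstPageOnly = [...scrollPages(makeFakeApi(records, 1), { maxPage: 1 })];
```

The second call mirrors the example above (`page_size` 1, `max_page` 1), where the output is an array holding one page, itself an array holding one record.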
Take `String` as URL, throw each chunk from the result
Input:
[
{ q: "toto" },
]
Script:
[CORHALFetch]
url = https://corhal-api.inist.fr
Output:
[{...}, {"a": "b"}, {"a": "c" }]
- `url` (String?): Corhal API URL
- `timeout` (Number): timeout in milliseconds (optional, default `1000`)
- `retries` (Number): the maximum number of times to retry the connection (optional, default `5`)
Returns Object
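The `retries` parameter describes a bounded retry policy. A minimal sketch of such a policy is given below. It is not the plugin's implementation: `withRetries` is a hypothetical name, the operation is synchronous for brevity, and the sketch assumes that `retries` counts the attempts made after the first one.

```javascript
// Hypothetical sketch: retry a failing operation a bounded number of times.
// Assumption: `retries` is the number of extra attempts after the first one.
function withRetries(operation, retries = 5) {
    let lastError;
    for (let attempt = 0; attempt <= retries; attempt += 1) {
        try {
            return operation(attempt); // success: stop retrying
        } catch (error) {
            lastError = error;         // remember the failure and try again
        }
    }
    throw lastError;                   // every attempt failed
}

// Usage: an operation that fails twice before succeeding.
let calls = 0;
const result = withRetries(() => {
    calls += 1;
    if (calls < 3) throw new Error('temporary failure');
    return 'ok';
});
```

A real fetcher would be asynchronous and would also apply the `timeout` to each attempt.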
Find the RNSR identifier(s) matching the `address` and the publication `year` of an article.

Get objects with an `id` field and a `value` field. The `value` field is an object containing `address` and `year`.

Returns an object with `id` and `value` fields. The `value` is an array of RNSR identifiers (if any).
Input:
[{
"id": 1,
"value": {
"address": "GDR 2989 Université Versailles Saint-Quentin-en-Yvelines, 63009",
"year": "2012"
}
}]
Output:
[{ "id": 1, "value": ["200619958X"] }]
- `year` (number): year of the RNSR to use instead of the last one (optional, default `2023`)
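When starting from a Conditor notice rather than from ready-made `id`/`value` objects, the expected input shape can be built with a small helper. The sketch below is hypothetical and not part of `@ezs/conditor`: `toRnsrQueries` is an illustrative name, the year is assumed to come from the first `xPublicationDate`, and the identifiers are simple sequence numbers.

```javascript
// Hypothetical helper, not part of @ezs/conditor: turn one notice into
// { id, value: { address, year } } objects, one per affiliation.
function toRnsrQueries(notice) {
    // Assumption: the first publication date gives the publication year.
    const year = (notice.xPublicationDate?.[0] ?? '').slice(0, 4);
    const queries = [];
    for (const author of notice.authors ?? []) {
        for (const affiliation of author.affiliations ?? []) {
            queries.push({
                id: queries.length + 1, // sequence number, starting at 1
                value: { address: affiliation.address, year },
            });
        }
    }
    return queries;
}
```

Applied to the notice used in the affiliation alignment example, this yields the input shown above: one object with `id` 1, the GDR 2989 address and the year `"2012"`.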
Find the RNSR information matching the `address` and the publication `year` of an article.

Get objects with an `id` field and a `value` field. The `value` field is an object containing `address` and `year`.

Returns an object with `id` and `value` fields. The `value` is an array of RNSR information objects (if any).
Input:
[{
"id": 1,
"value": {
"address": "Laboratoire des Sciences du Climat et de l'Environnement (LSCE), IPSL, CEA/CNRS/UVSQ Gif sur Yvette France",
"year": "2019"
}
}]
Output:
[{
"an_fermeture": "",
"annee_creation": "2014",
"code_postal": "75015",
"etabAssoc": [{
"etab": {
"libelle": "Centre national de la recherche scientifique",
"libelleAppauvri": "centre national de la recherche scientifique",
"sigle": "CNRS",
"sigleAppauvri": "cnrs"
},
"label": "UMR",
"labelAppauvri": "umr",
"numero": "8253"
}, {
"etab": {
"libelle": "Institut national de la sante et de la recherche medicale",
"libelleAppauvri": "institut national de la sante et de la recherche medicale",
"sigle": "INSERM",
"sigleAppauvri": "inserm"
},
"label": "U",
"labelAppauvri": "u",
"numero": "1151"
}, {
"etab": {
"libelle": "Université Paris Cité",
"libelleAppauvri": "universite paris cite",
"sigle": "U PARIS Cité",
"sigleAppauvri": "u paris cite"
},
"label": "UM",
"labelAppauvri": "um",
"numero": "111"
}],
"intitule": "Institut Necker Enfants Malades - Centre de médecine moléculaire",
"intituleAppauvri": "institut necker enfants malades centre de medecine moleculaire",
"num_nat_struct": "201420755D",
"sigle": "INEM",
"sigleAppauvri": "inem",
"ville_postale": "PARIS",
"ville_postale_appauvrie": "paris"
}]
- `year` (number): year of the RNSR to use instead of the last one (optional, default `2023`)
Take `Object` with OpenAlex API parameters, throw each chunk from the result
Input:
[
{ filter: "authorships.author.id:a5000387389" },
]
Script:
[OAFetch]
Output:
```json
[{...}, {"a": "b"}, {"a": "c" }]
```
- `timeout` (Number): timeout in milliseconds (optional, default `1000`)
- `retries` (Number): the maximum number of times to retry the connection (optional, default `5`)
Returns Object
Take `String` as URL, throw each chunk from the result
Input:
[
{ q: "toto" },
]
Script:
[WOSFetch]
token = SDQedaeaazedsqsd
Output:
[{...}, {"a": "b"}, {"a": "c" }]
- `url` (String): WOS API URL (optional, default `https://wos-api.clarivate.com/api/wos`)
- `token` (String?): WOS API token
- `timeout` (Number): timeout in milliseconds (optional, default `1000`)
- `retries` (Number): the maximum number of times to retry the connection (optional, default `5`)
Returns Object