istex

This plugin provides a set of statements for working with the ISTEX API.

installation

npm install @ezs/istex

usage

ISTEX

Take an array and return the matching documents for every value in the array.

Parameters

  • query (string | Array<string>) ISTEX query (or queries) (optional, default data.query||[])
  • id (string | Array<string>) ISTEX id (or ids) (optional, default data.id||[])
  • maxPage number maximum number of pages to get
  • size number size of each page of results
  • duration string maximum duration between two requests (ex: "30s")
  • field Array<Object> fields to output

Examples

.pipe(ezs('ISTEX', {
  query: 'this is a test',
  size: 3,
  maxPage: 1,
  sid: 'test'
}))

Returns Array<Object>

ISTEXFacet

Take an object containing a query string field and a facet, and output aggregations from the ISTEX API.

Parameters

  • query string ISTEX query (optional, default "*")
  • facet string ISTEX facet (optional, default "corpusName")
  • sid string User-agent identifier (optional, default "ezs-istex")

Examples

from([{ query: 'ezs', facet: 'corpusName' }])
  .pipe(ezs('ISTEXFacet', { sid: 'test', }))
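
To make the roles of query, facet and sid more concrete, here is a minimal sketch of how such a request URL could be assembled. It is not the plugin's code: `buildFacetUrl` is a hypothetical helper, and the endpoint `https://api.istex.fr/document/` and the query-string names `q`, `facet` and `sid` are assumptions about the ISTEX API.

```javascript
// Hypothetical helper, not part of @ezs/istex.
// Assumptions: the search endpoint is https://api.istex.fr/document/ and
// it accepts q, facet and sid as query-string parameters.
function buildFacetUrl(input, sid = 'ezs-istex') {
    // Defaults mirror the ones listed in the parameters above.
    const { query = '*', facet = 'corpusName' } = input;
    const url = new URL('https://api.istex.fr/document/');
    url.searchParams.set('q', query);
    url.searchParams.set('facet', facet);
    url.searchParams.set('sid', sid);
    return url.toString();
}

console.log(buildFacetUrl({ query: 'ezs', facet: 'corpusName' }, 'test'));
```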

Returns Array<Object>

ISTEXFetch

Take an Object with an id and return the document's metadata.

Parameters

  • source string Field to use to fetch documents (optional, default "id")
  • target string
  • id string ISTEX Identifier of a document (optional, default data.id)
  • sid string User-agent identifier (optional, default "ezs-istex")

Examples

Input:

[{
  id: '87699D0C20258C18259DED2A5E63B9A50F3B3363',
}, {
  id: 'ark:/67375/QHD-T00H6VNF-0',
}]

will produce two JSON records.

.pipe(ezs('ISTEXFetch', { source: 'id' }))
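
The input above mixes two identifier forms: a 40-character ISTEX id and an ARK. The sketch below shows one way the two could be mapped to a document URL. `buildFetchUrl` is a hypothetical helper, and the URL layout (`/document/<id>/` for ISTEX ids, the ARK used directly as the path, `sid` as a query parameter) is an assumption, not something this README specifies.

```javascript
// Hypothetical helper, not part of @ezs/istex.
// Assumption: documents are reachable at <base>/document/<id>/ and
// ARKs at <base>/<ark>/, with the sid passed as a query parameter.
const API_BASE = 'https://api.istex.fr';

function buildFetchUrl(id, sid = 'ezs-istex') {
    // An ARK starts with "ark:"; anything else is treated as an ISTEX id.
    const path = id.startsWith('ark:') ? id : `document/${id}`;
    const url = new URL(`${API_BASE}/${path}/`);
    url.searchParams.set('sid', sid);
    return url.toString();
}

console.log(buildFetchUrl('87699D0C20258C18259DED2A5E63B9A50F3B3363'));
console.log(buildFetchUrl('ark:/67375/QHD-T00H6VNF-0'));
```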

Returns Array<Object>

ISTEXFiles

Take an Object with an ISTEX id and generate an object for each file.

See ISTEXScroll

Parameters

  • fulltext string typology of the document to save (optional, default pdf)
  • metadata string format of the files to save (optional, default json)
  • enrichment string? enrichment of the document to save
  • sid string User-agent identifier (optional, default "ezs-istex")

Returns Array

ISTEXFilesContent

Take an Object with an ISTEX source and check the document's file. Warning: to access the full text, you have to provide a token parameter. ISTEXFetch produces the stream you need to save the file.

See ISTEXFiles

Parameters

Returns Object

ISTEXFilesWrap

Take an Object with an ISTEX stream and wrap it into a single zip.

See ISTEXFiles

Returns Buffer

ISTEXParseDotCorpus

Parse a .corpus file content, and execute the action contained in the .corpus file.

1query.corpus
[ISTEX]
query = language.raw:rum
field = doi
field = author
field = title
field = language
field = publicationDate
field = keywords
field = host
field = fulltext

1notice.corpus
[ISTEX]
id 2FF3F5B1477986B9C617BB75CA3333DBEE99EB05
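
To illustrate the structure of the two files above, here is a minimal sketch that reads such content into a plain object. It only covers the parsing half and does not execute anything. `parseDotCorpus` is a hypothetical function, and the rules it applies (a `[NAME]` line opens a section, `key = value` lines set values with repeated `field` lines accumulating, `id <value>` lines add identifiers) are assumptions inferred from these two samples.

```javascript
// Hypothetical parser, not the plugin's implementation.
function parseDotCorpus(content) {
    const result = { section: null, query: null, field: [], id: [] };
    for (const rawLine of content.split(/\r?\n/)) {
        const line = rawLine.trim();
        if (line === '') continue;
        // "[ISTEX]" style section header.
        const section = line.match(/^\[(.+)\]$/);
        if (section) {
            result.section = section[1];
            continue;
        }
        // "key = value" lines; repeated "field" lines are collected.
        const assignment = line.match(/^(\w+)\s*=\s*(.*)$/);
        if (assignment) {
            const [, key, value] = assignment;
            if (key === 'field') result.field.push(value);
            else result[key] = value;
            continue;
        }
        // "id <identifier>" lines, as in 1notice.corpus.
        const identifier = line.match(/^id\s+(\S+)/);
        if (identifier) result.id.push(identifier[1]);
    }
    return result;
}

const parsed = parseDotCorpus('[ISTEX]\nquery = language.raw:rum\nfield = doi\nfield = author\n');
console.log(parsed);
```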

Returns Object

ISTEXResult

Take an Object containing results from the ISTEX API and return the hits value (documents).

This should be placed after ISTEXScroll.

See ISTEXScroll

Parameters

  • source string (optional, default data)
  • target string (optional, default feed)
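
The idea of pulling the documents out of an API result can be sketched as follows. The response shape used here (a `total` count next to a `hits` array) is an assumption, `extractHits` is a hypothetical name, and the sample documents are made up for illustration.

```javascript
// Hypothetical helper, not part of @ezs/istex.
// Assumed response shape: { total: <number>, hits: [ ...documents ] }.
function extractHits(response) {
    // Fall back to an empty list when the response carries no hits array.
    return Array.isArray(response.hits) ? response.hits : [];
}

// Made-up response, for illustration only.
const response = {
    total: 2,
    hits: [{ id: 'A' }, { id: 'B' }],
};
const documents = extractHits(response);
console.log(documents.length);
```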

Returns Array<Object>

ISTEXSave

Take an Object with an ISTEX id and save the document's file.

Warning: to access fulltext, you have to give a token parameter.

ISTEXFetch produces the stream you need to save the file.

See ISTEXFetch

Parameters

  • directory string path for the PDFs (optional, default current working directory)
  • typology string typology of the document to save (optional, default "fulltext")
  • format string format of the files to save (optional, default "pdf")
  • sid string User-agent identifier (optional, default "ezs-istex")
  • token string? authentication token (see documentation)

Returns Array

ISTEXScroll

Take an object containing a query string field and output records from the ISTEX API. Every output record is merged with the input object.

Parameters

  • query string ISTEX query (optional, default input)
  • sid string User-agent identifier (optional, default "ezs-istex")
  • maxPage number Maximum number of pages to get
  • size number size of each page of results (optional, default 2000)
  • duration string maximum duration between two requests (optional, default "5m")
  • field Array<string> fields to get (optional, default ["doi"])

Examples

from([{ query: 'this is a test' }])
  .pipe(ezs('ISTEXScroll', {
      maxPage: 2,
      size: 1,
      sid: 'test',
  }))
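
The combination of maxPage, size and the merge with the input object can be sketched without any network access. This is an illustration only: `scrollSketch` and `fetchPage` are hypothetical, the fake data source stands in for the API, each output record is assumed to be one page of results (which is why ISTEXResult is placed afterwards), and treating a missing maxPage as "no limit" is an assumption.

```javascript
// Hypothetical sketch, not the plugin's implementation.
function* scrollSketch(input, options, fetchPage) {
    // Assumption: no maxPage means "no limit"; size defaults to 2000 as documented.
    const { maxPage = Infinity, size = 2000 } = options;
    for (let page = 0; page < maxPage; page += 1) {
        const response = fetchPage(input.query, page, size);
        if (response.hits.length === 0) return;
        // Every output record is merged with the input object.
        yield { ...input, ...response };
    }
}

// Fake data source holding three made-up documents.
const allDocs = [{ id: 'A' }, { id: 'B' }, { id: 'C' }];
const fetchPage = (query, page, size) => ({
    total: allDocs.length,
    hits: allDocs.slice(page * size, (page + 1) * size),
});

const pages = [...scrollSketch({ query: 'this is a test' }, { maxPage: 2, size: 1 }, fetchPage)];
console.log(pages.length);
```

With maxPage set to 2 and size set to 1, the third document is never requested, which matches the cap the two parameters are meant to impose.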

Returns Array<Object>

ISTEXTriplify

Take an Object containing flattened hits from ISTEXResult and output them as triples.

If the environment variable DEBUG is set, some errors could appear on stderr.

data:
{
  'author/0/name': 'Geoffrey Strickland',
  'author/0/affiliations/0': 'University of Reading',
  'host/issn/0': '0047-2441',
  'host/eissn/0': '1740-2379',
  'title': 'Maupassant, Zola, Jules Vallès and the Paris Commune of 1871',
  'publicationDate': '1983',
  'doi/0': '10.1177/004724418301305203',
  'id': 'F6CB7249E90BD96D5F7E3C4E80CC1C3FEE4FF483',
  'score': 1
}
javascript:
.pipe(ezs('ISTEXTriplify', {
   property: [
     'doi/0 -> http://purl.org/ontology/bibo/doi',
     'language -> http://purl.org/dc/terms/language',
     'author/\\d+/name -> http://purl.org/dc/terms/creator',
     'author/\\d+/affiliations -> https://data.istex.fr/ontology/istex#affiliation',
   ],
 }));
output:
 <https://data.istex.fr/document/F6CB7249E90BD96D5F7E3C4E80CC1C3FEE4FF483>
    a <http://purl.org/ontology/bibo/Document> ;
    <http://purl.org/ontology/bibo/doi> "10.1177/004724418301305203" ;
    <https://data.istex.fr/ontology/istex#idIstex> "F6CB7249E90BD96D5F7E3C4E80CC1C3FEE4FF483" ;
    <http://purl.org/dc/terms/creator> "Geoffrey Strickland" ;
    <https://data.istex.fr/ontology/istex#affiliation> "University of Reading" ;

Parameters

  • property Object path to uri for the properties to output (property and uri separated by ->) (optional, default [])
  • source string the root of the keys (ex: istex/) (optional, default "")
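
The `path -> uri` mapping described above can be sketched as a small matcher over flattened keys. This is not the plugin's implementation: `parseProperty` and `matchProperties` are hypothetical names, and treating each path as a regular expression anchored at the start of the key is an assumption, chosen because `author/\\d+/affiliations` has to match the key `author/0/affiliations/0` in the example.

```javascript
// Hypothetical helpers, not part of @ezs/istex.
function parseProperty(mapping) {
    // "path -> uri": the left side is used as a regular expression.
    const [pattern, uri] = mapping.split('->').map((part) => part.trim());
    // Assumption: the pattern is anchored at the start of the key only.
    return { regex: new RegExp(`^${pattern}`), uri };
}

function matchProperties(data, property) {
    const rules = property.map(parseProperty);
    const pairs = [];
    for (const [key, value] of Object.entries(data)) {
        for (const { regex, uri } of rules) {
            if (regex.test(key)) pairs.push([uri, value]);
        }
    }
    return pairs;
}

const pairs = matchProperties(
    {
        'author/0/name': 'Geoffrey Strickland',
        'author/0/affiliations/0': 'University of Reading',
        'doi/0': '10.1177/004724418301305203',
        title: 'Maupassant, Zola, Jules Vallès and the Paris Commune of 1871',
    },
    [
        'doi/0 -> http://purl.org/ontology/bibo/doi',
        'language -> http://purl.org/dc/terms/language',
        'author/\\d+/name -> http://purl.org/dc/terms/creator',
        'author/\\d+/affiliations -> https://data.istex.fr/ontology/istex#affiliation',
    ]
);
console.log(pairs);
```

Keys with no matching rule (here `title`) produce nothing, and rules with no matching key (here `language`) are silently skipped.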

Returns string

ISTEXUniq

Remove duplicate triples within a single document's set of triples (same subject).

Assume that every triple of a document (except the first one) follows another triple of the same document.

Input:
<https://api.istex.fr/ark:/67375/NVC-JMPZTKTT-R> <http://purl.org/dc/terms/creator> "S Corbett" .
<https://api.istex.fr/ark:/67375/NVC-JMPZTKTT-R> <https://data.istex.fr/ontology/istex#affiliation> "Department of Public Health, University of Sydney, Australia." .
<https://api.istex.fr/ark:/67375/NVC-JMPZTKTT-R> <https://data.istex.fr/ontology/istex#affiliation> "Department of Public Health, University of Sydney, Australia." .
<https://api.istex.fr/ark:/67375/NVC-JMPZTKTT-R> <https://data.istex.fr/ontology/istex#affiliation> "Department of Public Health, University of Sydney, Australia." .
Action in a `.ezs` script:
[ISTEXUniq]
Output:
<https://api.istex.fr/ark:/67375/NVC-JMPZTKTT-R> <http://purl.org/dc/terms/creator> "S Corbett" .
<https://api.istex.fr/ark:/67375/NVC-JMPZTKTT-R> <https://data.istex.fr/ontology/istex#affiliation> "Department of Public Health, University of Sydney, Australia." .
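
The behaviour shown above, together with the assumption that all triples of a document are contiguous, can be sketched with a set that is reset whenever the subject changes. `uniqTriples` is a hypothetical function working on an array of lines, not the plugin's code, and the sample triples are shortened placeholders.

```javascript
// Hypothetical sketch, not the plugin's implementation.
function uniqTriples(lines) {
    const output = [];
    let currentSubject = null;
    let seen = new Set();
    for (const line of lines) {
        // The subject is the first whitespace-separated token of the triple.
        const subject = line.split(' ', 1)[0];
        if (subject !== currentSubject) {
            // New document: forget the triples of the previous one.
            currentSubject = subject;
            seen = new Set();
        }
        if (!seen.has(line)) {
            seen.add(line);
            output.push(line);
        }
    }
    return output;
}

const deduplicated = uniqTriples([
    '<s1> <creator> "S Corbett" .',
    '<s1> <affiliation> "University of Sydney" .',
    '<s1> <affiliation> "University of Sydney" .',
    '<s2> <affiliation> "University of Sydney" .',
]);
console.log(deduplicated.length);
```

Because the set only has to hold the triples of one document at a time, memory use stays bounded by the largest document rather than by the whole stream.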

ISTEXUnzip

Take the content of a zip file, extract JSON files, and yield JSON objects.

The zip file comes from dl.istex.fr, and the manifest.json is not extracted.

Returns Array