EUR-Lex Parser

An EUR-Lex parser for Python.

Usage

You can install this package as follows:

pip install -U eurlex

After installing this package, you can download and parse any document from EUR-Lex. For example, the 32019R0947 regulation:

from eurlex import get_html_by_celex_id, parse_html

# Retrieve and parse the document with CELEX ID "32019R0947" into a Pandas DataFrame
celex_id = "32019R0947"
html = get_html_by_celex_id(celex_id)
df = parse_html(html)

# Get the first line of Article 1
df_article_1 = df[df.article == "1"]
df_article_1_line_1 = df_article_1.iloc[0]

# Display the subtitle of Article 1
print(df_article_1_line_1.article_subtitle)
>>> "Subject matter"

# Display the corresponding text
print(df_article_1_line_1.text)
>>> "This Regulation lays down detailed provisions for the operation of unmanned aircraft systems as well as for personnel, including remote pilots and organisations involved in those operations."

Every document on EUR-Lex displays a CELEX number at the top of the page. More information on CELEX numbers can be found on the EUR-Lex website.

For more information about the methods in this package, see the unit tests and doctests.

Data Structure

The following columns are available in the parsed dataframe:

text: The text
type: The type of the data
document: The document in which the text is found
article: The article in which the text is found
article_subtitle: The subtitle of the article (when available)
ref: The indentation level of the text within the article (e.g. ["(1)", "(a)"] when the text is found under paragraph (1), subparagraph (a))

In some cases, additional fields are available. For example, the group field which contains the bold text under which a text is found.

Code Contribution

Feel free to send any issues, ideas or pull requests.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

EUR-Lex Parser

Usage

Data Structure

Code Contribution

Files

README.md

Latest commit

History

README.md

File metadata and controls

EUR-Lex Parser

Usage

Data Structure

Code Contribution