BiblioFractal 📚🌀

Tools for parsing the DBLP database 🛠️📑

Welcome to the DBLP Data Processing Toolkit. This suite of scripts converts data from the DBLP computer science bibliography into JSON, focusing on the records relevant to an individual author. The toolkit removes entities from the DBLP XML file, extracts a given author's publications, optionally includes the publications of that author's collaborators, and converts the result to JSON.

Contents 📂

entities_remover.py: A utility to sanitize XML files by eliminating entities 🧹.
xml_author_extractor.py: A specialized extractor to sift through the XML file for data associated with a specified author 🔍.
xml_to_json_converter.py: A converter that transitions author-specific XML records into a structured JSON file 🔄.
main_dblp_script.py: The orchestrator script, which runs the whole pipeline from dblp.xml to the final output files.
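The pipeline described by this list can be sketched as three commands run in sequence. This is a minimal sketch, assuming main_dblp_script.py simply chains the three scripts through the command lines documented under Scripts Breakdown; the function names and the file names input/dblp.xml and input/dblp_clean.xml are hypothetical, and the real orchestrator may call the modules directly instead.

```python
from __future__ import annotations

import subprocess
import sys

# Hypothetical locations: the README only says dblp.xml lives in the input directory.
RAW_XML = "input/dblp.xml"
CLEAN_XML = "input/dblp_clean.xml"


def build_pipeline(author_name: str, collaborators: bool = False) -> list[list[str]]:
    """Return the three commands, in order, following each script's documented usage."""
    extra = ["--collaborators"] if collaborators else []
    return [
        [sys.executable, "entities_remover.py", RAW_XML, CLEAN_XML],
        [sys.executable, "xml_author_extractor.py", author_name, *extra],
        [sys.executable, "xml_to_json_converter.py", author_name, *extra],
    ]


def run_pipeline(author_name: str, collaborators: bool = False) -> None:
    """Run each step; check=True raises CalledProcessError on the first failing step."""
    for command in build_pipeline(author_name, collaborators):
        subprocess.run(command, check=True)
```

For example, run_pipeline("John Doe", collaborators=True) would mirror the example command shown under Usage. Stopping at the first failure keeps a broken step from silently feeding a missing or partial file to the next one.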

Getting Started 🚀

Prerequisites

Python 3.x 🐍
Terminal or Command-line interface 💻

Setup

Clone the repository:

git clone https://github.com/lorenzovarese/biblio-fractal.git
cd biblio-fractal

Ensure you have a copy of the dblp.xml in the input directory before proceeding.

Prerequisites and Setup ⚙️

XML Database Acquisition: Download the compressed dblp.xml from DBLP's official repository.
Decompression: Extract dblp.xml and place it in the toolkit's input folder.
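Decompression can also be scripted. The following is a minimal sketch, assuming the download is the gzip-compressed archive dblp.xml.gz; the function name and paths are illustrative. It streams the archive in chunks so the multi-gigabyte XML never has to fit in memory.

```python
import gzip
import shutil
from pathlib import Path


def decompress_dblp(archive: Path, target: Path) -> Path:
    """Stream-decompress a .gz archive into target without loading it into memory."""
    target.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst, length=1024 * 1024)  # copy in 1 MiB chunks
    return target
```

Usage: decompress_dblp(Path("dblp.xml.gz"), Path("input/dblp.xml")).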

Usage 📘

main_dblp_script.py is the toolkit's entry point and can be invoked with the following command:

python main_dblp_script.py "Author Name" [--collaborators]

Arguments

author_name (required): The name of the author whose data should be processed, as it appears in DBLP. Quote it if it contains spaces.
--collaborators (optional): A flag that also includes the full data of every author who has co-authored at least one publication with author_name.
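These two arguments map naturally onto Python's argparse. The following sketches how such a command line could be declared; it is not the script's actual parser, and the description and help strings are illustrative.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """One required positional author name plus an optional boolean flag."""
    parser = argparse.ArgumentParser(
        description="Process DBLP data for a single author."
    )
    parser.add_argument(
        "author_name",
        help='Author name as it appears in DBLP, quoted if it contains spaces, e.g. "John Doe".',
    )
    parser.add_argument(
        "--collaborators",
        action="store_true",  # False when the flag is absent
        help="Also include the records of every co-author of author_name.",
    )
    return parser.parse_args(argv)
```

Passing argv explicitly, as in parse_args(["John Doe", "--collaborators"]), makes the parser easy to exercise without a shell; with argv=None it reads sys.argv as usual.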

Example

To run the toolkit for the author "John Doe", including collaborators:

python main_dblp_script.py "John Doe" --collaborators

Scripts Breakdown

`entities_remover.py`

Removes entities from the DBLP XML so that standard XML parsers can read the file.

Usage:

python entities_remover.py <input_xml_path> <output_clean_xml_path>
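Why this step matters: dblp.xml uses character entities such as &uuml; that are declared in the external dblp.dtd, and parsers that do not load the DTD, such as Python's xml.etree.ElementTree, stop with an "undefined entity" error. A minimal sketch of the idea follows, with hypothetical function names. It assumes ISO-8859-1 input (check the XML declaration in your copy), keeps the five entities predefined by XML, replaces other named entities with the matching character from Python's html.entities table, and drops any it does not recognize. Whether entities_remover.py substitutes or simply deletes entities is an assumption here.

```python
from __future__ import annotations

import re
from html.entities import name2codepoint

# Entities every XML parser understands without a DTD; leave these untouched.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
# Named references only; numeric ones such as &#233; are already valid XML.
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def _replace(match: re.Match) -> str:
    name = match.group(1)
    if name in XML_PREDEFINED:
        return match.group(0)
    codepoint = name2codepoint.get(name)
    # Substitute the character if known, otherwise drop the reference entirely.
    return chr(codepoint) if codepoint is not None else ""


def remove_entities(text: str) -> str:
    return ENTITY_RE.sub(_replace, text)


def clean_file(input_path: str, output_path: str, encoding: str = "iso-8859-1") -> None:
    """Process the file line by line so the multi-GB dump never sits in memory."""
    # xmlcharrefreplace writes characters the encoding cannot hold as &#NNNN; references.
    with open(input_path, encoding=encoding) as src, open(
        output_path, "w", encoding=encoding, errors="xmlcharrefreplace"
    ) as dst:
        for line in src:
            dst.write(remove_entities(line))
```

For example, clean_file("input/dblp.xml", "input/dblp_clean.xml") (hypothetical paths) turns M&uuml;ller into Müller. Keeping the file's original encoding means the existing XML declaration stays accurate, and characters outside Latin-1, such as the one behind &mdash;, survive as numeric character references that any parser accepts.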

`xml_author_extractor.py`

Extracts from the DBLP XML the records associated with a given author.

Usage:

python xml_author_extractor.py <author_name> [--collaborators]
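A minimal sketch of the extraction, with hypothetical function names. It assumes records are the direct children of the dblp root element and that an author "appears" in a record when the name exactly matches one of its author or editor elements. ElementTree.iterparse streams the cleaned file, and the root is cleared after each record so memory stays bounded. With collaborators enabled the sketch makes two passes: the first collects everyone who shares a record with the author, the second keeps every record that lists any of those names.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from typing import Iterator

# DBLP's top-level record types (direct children of the <dblp> root element).
RECORD_TAGS = {
    "article", "inproceedings", "proceedings", "book",
    "incollection", "phdthesis", "mastersthesis", "www",
}


def iter_records(path: str) -> Iterator[ET.Element]:
    """Stream complete records, clearing the root after each so memory stays bounded."""
    context = ET.iterparse(path, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the <dblp> root
    for event, elem in context:
        if event == "end" and elem.tag in RECORD_TAGS:
            yield elem
            root.clear()  # discard records that have already been processed


def people(record: ET.Element) -> set[str]:
    """Names listed as <author> or <editor> of a record."""
    return {
        child.text.strip()
        for child in record
        if child.tag in ("author", "editor") and child.text
    }


def extract_author_records(path: str, author: str, collaborators: bool = False) -> list[str]:
    """Return the matching records serialized as XML strings."""
    targets = {author}
    if collaborators:
        # First pass: collect everyone who shares at least one record with the author.
        for record in iter_records(path):
            names = people(record)
            if author in names:
                targets |= names
    # Final pass: keep every record that lists at least one target name.
    return [
        ET.tostring(record, encoding="unicode")
        for record in iter_records(path)
        if people(record) & targets
    ]
```

Each record is serialized before the generator resumes and clears the root, which is why the results are strings rather than live elements. Wrapping them in a single root element would give an author-centric XML file like the one listed under Output.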

`xml_to_json_converter.py`

Converts the author's XML records into JSON.

Usage:

python xml_to_json_converter.py <author_name> [--collaborators]
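A sketch of the conversion, with hypothetical function names, assuming the author-specific XML keeps DBLP's layout (one element per record under a single root) and is small enough to load with ElementTree.parse. Each record becomes a JSON object holding its type (the tag name), its attributes such as key, and its fields; fields that may repeat, such as author, are stored as lists. This JSON layout is illustrative, not necessarily the one xml_to_json_converter.py produces.

```python
import json
import xml.etree.ElementTree as ET

# Fields that may repeat within a DBLP record, so they are always stored as lists.
MULTI_VALUED = {"author", "editor", "ee", "url", "cite"}


def record_to_dict(record: ET.Element) -> dict:
    entry = {"type": record.tag, **record.attrib}
    for child in record:
        text = "".join(child.itertext()).strip()  # flattens inline markup such as <i>
        if child.tag in MULTI_VALUED:
            entry.setdefault(child.tag, []).append(text)
        else:
            entry[child.tag] = text
    return entry


def convert(xml_path: str, json_path: str) -> int:
    """Convert every record under the root element and return how many were written."""
    root = ET.parse(xml_path).getroot()
    entries = [record_to_dict(record) for record in root]
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(entries, f, ensure_ascii=False, indent=2)
    return len(entries)
```

Storing repeatable fields as lists even when a record has a single author keeps the JSON shape predictable for consumers, and ensure_ascii=False keeps names such as Müller readable in the output file.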

Output 📦

Running the scripts produces:

An XML file with all entities removed.
An author-centric XML file listing the author's publications.
A JSON file with the author's bibliography.

Contributing 🤝

Contributions are welcome:

Fork the repository.
Create a feature branch (git checkout -b feature/YourFeature).
Commit your changes (git commit -am 'Add some feature').
Push to the branch (git push origin feature/YourFeature).
Open a pull request.

Repository: lorenzovarese/biblio-fractal
