A RESTful API to access the entire Project Gutenberg catalogue
An instance should be running here. Please host your own instance if you are planning on using the API extensively.
Just a single command and you are ready to go
docker compose up
The application is listening at localhost:80
On a production machine make sure to setup .env file, among other things. Refer to .env.template for more information.
Skip this section if you are using the database from the releases page.
First clone the repository.
git clone https://github.com/GnikDroy/gutenberg_api
You will need to generate the SQLite Database from the Project Gutenberg catalogue. The catalogue is updated daily and is not present in the repository.
Get a copy of the Project Gutenberg catalog here. We use the format where each book gets its own RDF file. The current implementation of RDF in python has poor performance so parsing the single file catalogue is not feasible.
Once you have the catalogue,
you can use scripts/rdf_parser.py
or scripts/books_db.py
to generate the JSON catalogue or an SQLite database.
Since RDFLib has poor performance, it is often worthwhile to convert the RDF files to JSON with rdf_parser.py
,
and use books_db.py
to finally generate the SQLite database.
If you do not need to do this regularly, books_db.py
will also directly create an SQLite database from the RDF files.
pip install -r scripts/requirements.txt
[python] rdf_parser.py -h
[python] books_db.py -h
You should be able to generate a SQLite database after reading the help messages. Additionally, you have the ability to generate JSON files if need be.
This will take some time! Get yourself a cup of coffee.
After generating a SQLite database, you will need to load this into Django.
There is a custom load_db
command inside api
app for this.
Before that, let's setup the django project.
From the project root
pip install -r requirements.txt
python3 manage.py migrate
Finally, let's load the generated/downloaded SQLite DB
python3 manage.py load_db [--clear] path/to/generated_sqlite.db
# For more info: python3 manage.py load_db -h
Setup your environment,
cp .env.template .env
You should populate .env
with your production settings (SECRET_KEYS, ALLOWED_HOSTS, etc).
If you are not in production, the default settings will set you up for development. You do not need to do anything.
Additionally, if you are in production you might have to setup some stuff in wsgi.py
or asgi.py
.
Please refer to the django docs for more details.
After setting everything up, you should be ready to roll!
python3 manage.py runserver <PORT>
I am unable to test installation steps for every single environment. If you have had to perform some additional steps to reach this stage, and would like to inform others, please create a PR.
This section is NOT for you if you want to quickly setup and deploy the app.
The generated SQLite database can be used differently.
You should be able to see the tables from some sort of a DB management software.
Please refer to scripts/books_db.py
for more details on the schema.
If you are aiming to make a lot of requests to the Resource
obtained from the API,
it might be worthwhile to mirror Project Gutenberg..
You would then need to update the Resource URIs generated from the RDF files to point to your domain.
Refer to books_db.py
and rdf_parser.py
. A small SQLite command will also do the job.
It is highly recommended to use the Browseable API to interact with the API instead.
The following endpoints are currently supported.
Endpoint | Details |
---|---|
/api/book | Details of a Book |
/api/bookshelf | Bookshelf information |
/api/agent_type | The type of agent. (Artist, Illustrator, etc) |
/api/person | Details of a person (name, birth date, webpage, etc) |
/api/resource | Details of a resource (url, size, type, etc) |
/api/agent | Agent is a unique (Person, AgentType) pair |
/api/language | Language of the book (en, fr, etc) |
/api/subject | The list of genres/subjects. |
The endpoints can be used to list and to view details of an item.
Pagination applies for viewing the list of items. For instance, to view the list of Books.
GET /api/book/
{
"count": 65777,
"next": "<next_page_url>",
"previous": null,
"results": [...]
}
# count = Total number of books
# next = URL for the next page
# previous = URL for the previous page
# results = Array of Book instances.
Provide the id to view the details of a model. This might be an integer or a string.
For instance, the following GET request would return the details of a book.
GET /api/book/21558/
{
"id": 21558,
"type": "Text",
"title": "The Children of the New Forest",
"description": null,
"downloads": 164,
"license": "http://www.gutenberg.org/license",
"subjects": [
"Foresters -- Juvenile fiction",
"New Forest (England : Forest) -- Juvenile fiction",
"Great Britain -- History -- Civil War, 1642-1649 -- Juvenile fiction",
"Soldiers -- Juvenile fiction",
"Orphans -- Juvenile fiction",
"PZ"
],
"bookshelves": [],
"languages": [ "en" ],
"agents": [
{
"id": 3570,
"person": "Marryat, Frederick",
"type": "Author"
}
],
"resources": [
{
"id": 182146,
"uri": "https://www.gutenberg.org/files/21558/21558.zip",
"type": "application/zip"
},
{
"id": 182145,
"uri": "https://www.gutenberg.org/ebooks/21558.kindle.noimages",
"type": "application/x-mobipocket-ebook"
},
]
}
Note that this applies to all endpoints. So /api/person/1
would show details of a particular person.
For search you can use the ?search=
query parameter to search for items.
For instance,
GET /api/book/?search=Jane
{
"count": 288,
"next": "...",
"previous": null,
"results": [...]
}
Reasonable filters are available at most endpoints. For instance, ?languages=
available at the Book
endpoint.
GET /api/book/?languages=en
{
"count": 53265,
"next": "/api/book/?languages=en&page=2",
"previous": null,
"results": [
{
"id": 1342,
"format": "Text",
"title": "Pride and Prejudice",
...
},
...
]
}
Here are some of the filters available:
Model | Filters |
---|---|
Book |
type, languages, title_contains, description_contains, downloads_range, has_bookshelf, has_resource_type, has_agent_type, agent_name_contains, agent_alias_contains, agent_webpage_contains, agent_birth_date_range, agent_death_date_range |
Person |
name_contains, alias_contains, webpage_contains, birth_date_range, death_date_range |
Agent |
type, name_contains, birth_date, death_date |
Resource |
size_gt, size_lt, size_range, modified_gt, modified_lt, modified_range, type |
The suffixes have intuitive meanings:
Suffix | Meaning |
---|---|
gt |
Greater than (Number, Date, String etc) |
lt |
Less than (Number, Date, String etc) |
contains |
Case insensitive search on the field |
range |
Creates max and min query pairs for range based filtering. For example size_range will create size_range_min and size_range_max |
NO SUFFIX |
Usually an exact match filter. Might have different meaning depending on the context. |
Please refer to the Browseable API for a comprehensive list of filters available at each endpoint.
Similar to filters, reasonable ordering mechanisms are available at most endpoints.
GET /api/book/?ordering=-downloads
{
"count": 65777,
"next": "/api/book/?ordering=-downloads&page=2",
"previous": null,
"results": [
{
"id": 1342,
"format": "Text",
"title": "Pride and Prejudice",
"description": "https://en.wikipedia.org/wiki/Pride_and_Prejudice",
"downloads": 43379,
...
},
...
]
}
Notice how ?ordering=-downloads
orders results in a descending order whereas
?ordering=downloads
in an ascending order. You can also order them by multiple fields.
For instance, ?ordering=-downloads,title
will work as well.
Please refer the Browseable API for a more comprehensive list of orderings available at each endpoint.
As expected, Search, Filters and Ordering queries can be combined.
Feel free to raise a new issue or send a pull request.
Any contribution is welcome. That being said, please raise an issue before you start working on a new feature.
We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, caste, color, religion, or sexual identity and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.
Examples of behavior that contributes to a positive environment for our community include:
- Demonstrating empathy and kindness toward other people
- Being respectful of differing opinions, viewpoints, and experiences
- Giving and gracefully accepting constructive feedback
- Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience
- Focusing on what is best not just for us as individuals, but for the overall community
Examples of unacceptable behavior include:
- The use of sexualized language or imagery, and sexual attention or advances of any kind
- Trolling, insulting or derogatory comments, and personal or political attacks
- Public or private harassment
- Publishing others' private information, such as a physical or email address, without their explicit permission
- Other conduct which could reasonably be considered inappropriate in a professional setting
Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful.
Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate.
This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. Examples of representing our community include using an official e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event.
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement.
All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the reporter of any incident.
Community leaders will follow Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct.
This Code of Conduct is adapted from the Contributor Covenant, version 2.0, available at Code of Conduct.
Community Impact Guidelines were inspired by Mozilla's code of conduct enforcement ladder.
For answers to common questions about this code of conduct, see the FAQ at FAQ. Translations are available here.
The Gutenberg API is one of the largest source of free ebooks, and the catalogue provides a lot of metadata. The existing projects did not fully expose all the information provided in the metadata (this project aims to provide most of the relevant ones). Additionally, we also provide a mechanism to parse the RDF files into JSON or an SQLite database for easier analysis, as well as a comprehensive browsable API view.
You can find the license at Github. Please refer to Project Gutenberg's policy on the use of resources made available to you.