Merge pull request #25 from tb1337/chore/update_project_files
Add shiny new documentation
tb1337 authored Dec 19, 2023
2 parents 8aaacac + 0ad2374 commit 0f3b1d5
Showing 4 changed files with 293 additions and 95 deletions.
110 changes: 15 additions & 95 deletions README.md
@@ -1,110 +1,30 @@
<p align="right">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://docs.paperless-ngx.com/assets/logo_full_white.svg#only-dark">
<source media="(prefers-color-scheme: light)" srcset="https://docs.paperless-ngx.com/assets/logo_full_black.svg#only-light">
<img width="200" alt="Shows an illustrated sun in light mode and a moon with stars in dark mode." src="https://docs.paperless-ngx.com/assets/logo_full_black.svg#only-light">
</picture>
</p>

<!-- omit in toc -->

# PyPaperless

![Test Badge](https://github.com/tb1337/paperless-api/actions/workflows/test.yml/badge.svg)

Little api client for [Paperless-ngx](https://github.com/paperless-ngx/paperless-ngx)! Find out more here:
Little api client for [Paperless-ngx](https://github.com/paperless-ngx/paperless-ngx)!

Find out more here:

* Project: https://docs.paperless-ngx.com
* REST API: https://docs.paperless-ngx.com/api/

## Features

- Depends on aiohttp.
- Token authentication, _note that credentials aren't supported anymore_.
- list requests all object ids of resources.
- get methods for each resources. Accepts page parameters and Django filters and is thus very powerful.
- iterate for each paginated endpoint, you may want to apply some Django filters here, as well.
- create, update, delete methods for documents and their master data endpoints.
- Paperless makes use of pagination. We use that too. You have the full control over how much data to fetch.
- pypaperless is meant to be only the transportation layer. Store and reduce/aggregate data on your own.

## Examples

### Handling a session.

```python
import asyncio

from pypaperless import Paperless

paperless = Paperless("localhost:8000", "your-secret-token")

async def main():
    paperless.initialize()
    # do something
    paperless.close()

    # or just use it in a context
    async with paperless:
        # do something
        pass

asyncio.run(main())
```

### Actually request something

```python
# requests one page
documents = await paperless.documents.get(page=1)
for item in documents.items:
    print(f"document #{item.id} has the following content: {item.content}")
```

### Request all items of specific document types and iterate over them
```python
doc_types = [
    "3",  # salary
    "8",  # contract
    "11",  # bank account
]

# iterates over all pages
async for item in paperless.documents.iterate(document_type__id__in=",".join(doc_types)):
    print(f"document #{item.id} has the following content: {item.content}")
```

### Request a specific item
```python
correspondent = await paperless.correspondents.one(23)
```

### Create a new correspondent
```python
from pypaperless.models import CorrespondentPost
from pypaperless.models.shared import MatchingAlgorithm

new_correspondent = CorrespondentPost(
    name="Salty Correspondent",
    match="Give me all your money",
    matching_algorithm=MatchingAlgorithm.ALL,
)
# watch out, the result is a Correspondent object...
created_correspondent = await paperless.correspondents.create(new_correspondent)
print(created_correspondent.id)
# >> 1337
```
- Depends on aiohttp, works in async environments.
- Token authentication only. **No credentials anymore.**
- `list()` requests all object ids of resources.
- `get()` for each resource. Accepts Django filters.
- `iterate()` for each endpoint. Accepts Django filters.
- `create()`, `update()`, `delete()` methods for many resources.
- Paperless makes use of pagination. We use that too. You have full control.
- *PyPaperless* only transports data. Your code organizes it.

### And delete that salty guy again, including all of his god damn documents!
## Documentation

> [!CAUTION]
> That code actually requests Paperless to physically delete that data! There is no way back!
> ```python
> # ...
> async for item in paperless.documents.iterate(correspondent__id=1337):
>     await paperless.documents.delete(item)
>
> await paperless.correspondents.delete(created_correspondent)
> ```
* [Handling a session](docs/SESSION.md)
* [Request data](docs/REQUEST.md)
* [Create, update, delete data](docs/CRUD.md)

## Thanks to

122 changes: 122 additions & 0 deletions docs/CRUD.md
@@ -0,0 +1,122 @@
# Create, Update, Delete

If you plan to manipulate your Paperless data, continue reading. Manipulation is the process of inserting, updating and deleting data from and to your Paperless database.

In the following examples, we assume you already have initialized the `Paperless` object in your code.

- [Supported resources](#supported-resources)
- [Update items](#update-items)
- [Create items](#create-items)
- [Delete items](#delete-items)

## Supported resources

*PyPaperless* enables create/update/delete wherever it makes sense:

* correspondents
* custom_fields
* document_types
* documents
  * custom_fields
  * *metadata* is not supported
  * *notes* are currently not supported ([#23](https://github.com/tb1337/paperless-api/issues/23))
* share_links
* storage_paths
* tags

## Update items

The Paperless API lets us change almost everything via REST. Personally, I use that to validate document titles, as I have defined naming conventions for my document types. I try to apply the correct title to the document, and if that fails for some reason, it gets a _TODO_ tag applied so I can edit it manually later on.

> [!TIP]
> You may have other use-cases. Feel free to share them with me by opening an [issue](https://github.com/tb1337/paperless-api/issues).

Updating is as easy as requesting items. Gather any resource item object, update its attributes and call the `update()` method of the endpoint.

**Example 1**

```python
document = await paperless.documents.one(42)
document.title = "42 - The Answer"
document.content = """
The Answer to the Ultimate Question of Life,
the Universe, and Everything.
"""
document = await paperless.documents.update(document)
#>>> Document(id=42, title="42 - The Answer", content="...", ...)
```

**Example 2**

```python
filters = {
"title__istartswith": "invoice",
}
async for item in paperless.documents.iterate(**filters):
    item.title = item.title.replace("invoice", "bill")
    await paperless.documents.update(item)
```

Every `update()` call sends a `PUT` HTTP request to Paperless, containing the full serialized item. That behaviour will be refactored in the future so that only changed attributes are sent via `PATCH` HTTP requests. ([#24](https://github.com/tb1337/paperless-api/issues/24))
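
To illustrate the difference between the two request styles, here is a minimal sketch that compares an item before and after modification and keeps only the changed attributes. The `Doc` class and the `changed_fields()` helper are hypothetical stand-ins made up for this example. They are not part of *PyPaperless* and say nothing about how the planned refactoring will actually be implemented.

```python
from dataclasses import asdict, dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a resource item, not a PyPaperless model."""

    id: int
    title: str
    content: str


def changed_fields(original: Doc, modified: Doc) -> dict:
    """Return only the attributes whose values differ between both items."""
    before, after = asdict(original), asdict(modified)
    return {key: value for key, value in after.items() if before[key] != value}


original = Doc(id=42, title="Invoice 2023", content="...")
modified = Doc(id=42, title="Bill 2023", content="...")

# A PUT-style payload carries the full serialized item.
put_payload = asdict(modified)
# A PATCH-style payload carries only what changed.
patch_payload = changed_fields(original, modified)

print(patch_payload)
#>>> {'title': 'Bill 2023'}
```

The smaller payload is the main appeal of `PATCH`: attributes you never touched are not sent back to the server.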

## Create items

It absolutely makes sense to create new data in the Paperless database, especially documents. Therefore, item creation is implemented for many resources. It differs slightly from `update()` and `delete()`. *PyPaperless* doesn't validate data; it's meant to be only the transportation layer between your code and Paperless. To reduce common mistakes, it provides special classes for creating new items. Use them.

A *Resource*Post class exists for every creatable resource. Instantiate that class with some data and call the `create()` method of your endpoint. There you go.

**Example for documents**

```python
from pypaperless.models import DocumentPost

# or read the contents of a file, whatever you want
content = b"..."

# there are more attributes available, check type hints
new_document = DocumentPost(document=content)
task_id = await paperless.documents.create(new_document)
#>>> abcdefabcd-efab-cdef-abcd-efabcdefabcd
```

> [!TIP]
> You can access the current OCR status of your new document when requesting the `tasks` endpoint with that id.

**Example for other resources**

```python
from pypaperless.models import CorrespondentPost
from pypaperless.models.shared import MatchingAlgorithm

new_correspondent = CorrespondentPost(
    name="Salty correspondent",
    match="Give me all your money",
    matching_algorithm=MatchingAlgorithm.ALL,
)
# watch out, the result is a Correspondent object...
created_correspondent = await paperless.correspondents.create(new_correspondent)
print(created_correspondent.id)
# >> 1337
```

Every `create()` call sends a `POST` HTTP request to Paperless, containing the full serialized item.

## Delete items

In some cases, you want to delete items. It's almost the same as updating, just call the `delete()` method. Let's delete that salty guy again, including all of his documents!

> [!CAUTION]
> This will permanently delete data from your Paperless database. There is no way back.

```python
# ...
filters = {
    # use the created item; the CorrespondentPost object has no id
    "correspondent__id": created_correspondent.id
}
async for item in paperless.documents.iterate(**filters):
    await paperless.documents.delete(item)

await paperless.correspondents.delete(created_correspondent)
```

Every `delete()` call sends a `DELETE` HTTP request to Paperless without any payload.
117 changes: 117 additions & 0 deletions docs/REQUEST.md
@@ -0,0 +1,117 @@
# Requesting data

It's all about accessing data. That's obviously the reason you downloaded *PyPaperless*.

In the following examples, we assume you already have initialized the `Paperless` object in your code.

- [Basic requests](#basic-requests)
  - [Requesting a list of pk's](#requesting-a-list-of-pks)
  - [Requesting an item](#requesting-an-item)
  - [Requesting paginated items of a resource](#requesting-paginated-items-of-a-resource)
  - [Iterating over all resource items](#iterating-over-all-resource-items)
- [Filtered requests](#filtered-requests)
  - [Requesting filtered pages of a resource](#requesting-filtered-pages-of-a-resource)
  - [Iterating over filtered resource items](#iterating-over-filtered-resource-items)
- [Further information](#further-information)

## Basic requests

The following examples should cover most use-cases.

### Requesting a list of pk's

Paperless returns a JSON key *all* on every paginated request, which represents a list of all pk's matching our filtered request. The `list()` method requests page one unfiltered, resulting in getting a complete list.

```python
correspondent_pks = await paperless.correspondents.list()
#>>> [1, 2, 3, ...]
```

It's the same for each resource. Let's try with documents:

```python
document_pks = await paperless.documents.list()
#>>> [5, 23, 42, 1337, ...]
```

### Requesting an item

You may want to actually access the data of Paperless resources. Let's do it!

```python
# request document with pk 23
document = await paperless.documents.one(23)
#>>> Document(id=23, ...)
```

### Requesting paginated items of a resource

Accessing single resources by pk would result in too many requests, so you can access the paginated results, too.

```python
# request page 1
documents = await paperless.documents.get()
#>>> PaginatedResult(current_page=1, next_page=2, items=[Document(...), ...])

# request page 2
documents = await paperless.documents.get(page=2)
#>>> PaginatedResult(current_page=2, next_page=3, items=[Document(...), ...])
```

If you request the last page, the `next_page` property is `None`.

### Iterating over all resource items

Sometimes, dealing with pages makes no sense for you, so you may want to iterate over all items at once.

> [!NOTE]
> Iterating over all documents could take some time, depending on how many items you have stored in your database.

```python
async for item in paperless.documents.iterate():
    print(item.title)
    #>>> 'New Kitchen Invoice'
```

## Filtered requests

Sometimes, you want to filter results in order to access them faster, or to apply context to your requests, or both. In the case of documents, iterating over them can be very time-consuming. The Paperless API provides filter query parameters for each resource. There are **many** filters, so I cannot list them all. The easiest way to discover them is to open the API root of your local Paperless installation by appending `/api/` to the URL.

For example: `http://localhost:8000/api/`

Once the list of all API endpoints is displayed, choose your resource by clicking the link next to its name. If a **Filter** button is displayed at the top of the next page, the resource supports filtering. Click the button to see all available filters, fill in a dummy filter and click the **Apply** button.

The website now displays something like this under the resource name heading:

`GET /api/documents/?id__in=&id=&title__istartswith=&title__iendswith=&title__icontains=...`

The names of the query parameters are available as keywords in the `get()` and `iterate()` methods.
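
As a rough illustration of that mapping, the sketch below turns a filter dict into the kind of query string shown above, using only the standard library. It is an assumption-level example. The `base_url` value is a placeholder, and the way *PyPaperless* actually builds its requests is not shown here.

```python
from urllib.parse import urlencode

# Filter names are the query parameter names offered by the Paperless API.
filters = {
    "title__istartswith": "invoice",
    "content__icontains": "cheeseburger",
}

# Placeholder URL of a local installation.
base_url = "http://localhost:8000/api/documents/"
url = f"{base_url}?{urlencode(filters)}"

print(url)
#>>> http://localhost:8000/api/documents/?title__istartswith=invoice&content__icontains=cheeseburger
```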

### Requesting filtered pages of a resource

Filters are passed as keywords to the `get()` method.

```python
filters = {
"title__istartswith": "invoice",
"content__icontains": "cheeseburger",
}
filtered_documents = await paperless.documents.get(**filters)
#>>> PaginatedResult(current_page=1, next_page=None, items=[Document(...), ...])
```

### Iterating over filtered resource items

Iterating is also possible with filters and works the same way as requesting filtered pages.

```python
# we assume you have declared the same filter dict as above
async for item in paperless.documents.iterate(**filters):
    print(item.title)
    #>>> 'Invoice for yummy cheeseburgers'
```

> [!NOTE]
> Paperless simply ignores filters which don't exist. You could end up iterating over all of your documents, which will take time in the worst case. Use filters carefully and check twice.

## Further information

Each `list()` and `get()` call results in a single `GET` HTTP request. When using `iterate()`, one to n `GET` HTTP requests are sent, one per page, until all pages have been requested.
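
The request pattern described above can be sketched with a small simulation: a fake `get()` coroutine returns one page per call, and an `iterate()` generator keeps requesting pages until `next_page` is `None`. Everything in this block (`Page`, `PAGES`, `request_count` and both functions) is a hypothetical stand-in written for this example. It mimics the documented behaviour and is not the *PyPaperless* implementation.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    """Hypothetical stand-in for a paginated result."""

    current_page: int
    next_page: Optional[int]
    items: list


# Fake data source: three pages of item ids.
PAGES = {1: [1, 2], 2: [3, 4], 3: [5]}
request_count = 0


async def get(page: int = 1) -> Page:
    """Pretend to send one GET request and return one page."""
    global request_count
    request_count += 1
    next_page = page + 1 if page + 1 in PAGES else None
    return Page(current_page=page, next_page=next_page, items=PAGES[page])


async def iterate():
    """Yield every item, requesting pages until next_page is None."""
    page: Optional[int] = 1
    while page is not None:
        result = await get(page=page)
        for item in result.items:
            yield item
        page = result.next_page


async def collect() -> list:
    return [item async for item in iterate()]


items = asyncio.run(collect())
print(items, request_count)
#>>> [1, 2, 3, 4, 5] 3
```

Three pages mean three simulated requests, so the cost of iterating grows with the number of pages rather than with the number of items.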