diff --git a/README.md b/README.md
index 1d609ca..5931c9e 100644
--- a/README.md
+++ b/README.md
@@ -1,110 +1,30 @@
-
-
-
-
-
-
 # PyPaperless
 
 ![Test Badge](https://github.com/tb1337/paperless-api/actions/workflows/test.yml/badge.svg)
 
-Little api client for [Paperless-ngx](https://github.com/paperless-ngx/paperless-ngx)! Find out more here:
+Little api client for [Paperless-ngx](https://github.com/paperless-ngx/paperless-ngx)!
+
+Find out more here:
 
 * Project: https://docs.paperless-ngx.com
 * REST API: https://docs.paperless-ngx.com/api/
 
 ## Features
 
-- Depends on aiohttp.
-- Token authentication, _note that credentials aren't supported anymore_.
-- list requests all object ids of resources.
-- get methods for each resources. Accepts page parameters and Django filters and is thus very powerful.
-- iterate for each paginated endpoint, you may want to apply some Django filters here, as well.
-- create, update, delete methods for documents and their master data endpoints.
-- Paperless makes use of pagination. We use that too. You have the full control over how much data to fetch.
-- pypaperless is meant to be only the transportation layer. Store and reduce/aggregate data on your own.
-
-## Examples
-
-### Handling a session.
-
-```python
-import asyncio
-
-from pypaperless import Paperless
-
-paperless = Paperless("localhost:8000", "your-secret-token")
-
-async def main():
-    paperless.initialize()
-    # do something
-    paperless.close()
-
-    # or just use it in a context
-    async with paperless:
-        # do something
-
-asyncio.run(main())
-```
-
-### Actually request something
-
-```python
-# requests one page
-documents = await paperless.documents.get(page=1)
-for item in documents.items:
-    print(f"document #{item.id} has the following content: {item.content}")
-```
-
-### Request all items of specific document types and iterate over them
-```python
-doc_types = [
-    "3", # salary
-    "8", # contract
-    "11", # bank account
-]
-
-# iterates over all pages
-async for item in paperless.documents.iterate(document_type__id__in=",".join(doc_types)):
-    print(f"document #{item.id} has the following content: {item.content}")
-```
-
-### Request a specific item
-```python
-correspondent = await paperless.correspondents.one(23)
-```
-
-### Create a new correspondent
-```python
-from pypaperless.models import CorrespondentPost
-from pypaperless.models.shared import MatchingAlgorithm
-
-new_correspondent = CorrespondentPost(
-    name="Salty Correspondent",
-    match="Give me all your money",
-    matching_algorithm=MatchingAlgorithm.ALL,
-)
-# watch out, the result is a Correspondent object...
-created_correspondent = paperless.correspondents.create(new_correspondent)
-print(created_correspondent.id)
-# >> 1337
-```
+- Depends on aiohttp, works in async environments.
+- Token authentication only. **No credentials anymore.**
+- `list()` requests all object ids of a resource.
+- `get()` for each resource. Accepts Django filters.
+- `iterate()` for each endpoint. Accepts Django filters.
+- `create()`, `update()`, `delete()` methods for many resources.
+- Paperless makes use of pagination. We use that too. You have full control.
+- *PyPaperless* only transports data. Your code organizes it.
-### And delete that salty guy again, including all of his god damn documents!
+## Documentation
 
-> [!CAUTION]
-> That code actually requests Paperless to physically delete that data! There is no point of return!
-> ```python
-> # ...
-> async for item in paperless.documents.iterate(correspondent__id=1337):
->     await paperless.documents.delete(item)
->
-> await paperless.correspondents.delete(created_correspondent)
-> ```
+* [Handling a session](docs/SESSION.md)
+* [Request data](docs/REQUEST.md)
+* [Create, update, delete data](docs/CRUD.md)
 
 ## Thanks to
 
diff --git a/docs/CRUD.md b/docs/CRUD.md
new file mode 100644
index 0000000..baab4ab
--- /dev/null
+++ b/docs/CRUD.md
@@ -0,0 +1,122 @@
+# Create, Update, Delete
+
+If you plan to manipulate your Paperless data, continue reading. Manipulation is the process of inserting, updating and deleting data in your Paperless database.
+
+In the following examples, we assume you have already initialized the `Paperless` object in your code.
+
+- [Supported resources](#supported-resources)
+- [Update items](#update-items)
+- [Create items](#create-items)
+- [Delete items](#delete-items)
+
+## Supported resources
+
+*PyPaperless* enables create/update/delete wherever it makes sense:
+
+* correspondents
+* custom_fields
+* document_types
+* documents
+  * custom_fields
+  * *metadata* is not supported
+  * *notes* are currently not supported ([#23](https://github.com/tb1337/paperless-api/issues/23))
+* share_links
+* storage_paths
+* tags
+
+## Update items
+
+The Paperless api enables us to change almost everything via REST. Personally, I use that to validate document titles, as I have declared naming conventions for document types. I try to apply the correct title to the document, and if that fails for some reason, it gets a _TODO_ tag applied, so I can edit it manually later on.
+
+> [!TIP]
+> You may have other use cases. Feel free to share them with me by opening an [issue](https://github.com/tb1337/paperless-api/issues).
+
+Updating is as easy as requesting items. Fetch any resource item object, update its attributes and call the `update()` method of the endpoint.
+
+**Example 1**
+
+```python
+document = await paperless.documents.one(42)
+document.title = "42 - The Answer"
+document.content = """
+The Answer to the Ultimate Question of Life,
+the Universe, and Everything.
+"""
+document = await paperless.documents.update(document)
+#>>> Document(id=42, title="42 - The Answer", content="...", ...)
+```
+
+**Example 2**
+
+```python
+filters = {
+    "title__istartswith": "invoice",
+}
+async for item in paperless.documents.iterate(**filters):
+    item.title = item.title.replace("invoice", "bill")
+    await paperless.documents.update(item)
+```
+
+Every `update()` call will send a `PUT` http request to Paperless, containing the full serialized item. That behaviour will be refactored in the future, so that only changed attributes are sent via `PATCH` http requests. ([#24](https://github.com/tb1337/paperless-api/issues/24))
+
+## Create items
+
+It absolutely makes sense to create new data in the Paperless database, especially documents. Therefore, item creation is implemented for many resources. It differs slightly from `update()` and `delete()`. *PyPaperless* doesn't validate data; it's meant to be only the transportation layer between your code and Paperless. To reduce common mistakes, it provides special classes for creating new items. Use them.
+
+For every creatable resource, a *Resource*Post class exists. Instantiate that class with some data and call the `create()` method of your endpoint. There you go.
+
+**Example for documents**
+
+```python
+from pypaperless.models import DocumentPost
+
+# or read the contents of a file, whatever you want
+content = b"..."
+
+# there are more attributes available, check type hints
+new_document = DocumentPost(document=content)
+task_id = await paperless.documents.create(new_document)
+#>>> abcdefabcd-efab-cdef-abcd-efabcdefabcd
+```
+
+> [!TIP]
+> You can access the current OCR status of your new document when requesting the `tasks` endpoint with that id.
+
+**Example for other resources**
+
+```python
+from pypaperless.models import CorrespondentPost
+from pypaperless.models.shared import MatchingAlgorithm
+
+new_correspondent = CorrespondentPost(
+    name="Salty correspondent",
+    match="Give me all your money",
+    matching_algorithm=MatchingAlgorithm.ALL,
+)
+# watch out, the result is a Correspondent object...
+created_correspondent = await paperless.correspondents.create(new_correspondent)
+print(created_correspondent.id)
+# >> 1337
+```
+
+Every `create()` call will send a `POST` http request to Paperless, containing the full serialized item.
+
+## Delete items
+
+In some cases, you want to delete items. It's almost the same as updating, just call the `delete()` method. Let's delete that salty guy again, including all of his documents!
+
+> [!CAUTION]
+> This will permanently delete data from your Paperless database. There is no way back.
+
+```python
+# ...
+filters = {
+    "correspondent__id": created_correspondent.id
+}
+async for item in paperless.documents.iterate(**filters):
+    await paperless.documents.delete(item)
+
+await paperless.correspondents.delete(created_correspondent)
+```
+
+Every `delete()` call will send a `DELETE` http request to Paperless without any payload.
diff --git a/docs/REQUEST.md b/docs/REQUEST.md
new file mode 100644
index 0000000..6c55a42
--- /dev/null
+++ b/docs/REQUEST.md
@@ -0,0 +1,117 @@
+# Requesting data
+
+It's all about accessing data; that's obviously the reason you downloaded *PyPaperless*.
+
+In the following examples, we assume you have already initialized the `Paperless` object in your code.
+
+- [Basic requests](#basic-requests)
+  - [Requesting a list of pk's](#requesting-a-list-of-pks)
+  - [Requesting an item](#requesting-an-item)
+  - [Requesting paginated items of a resource](#requesting-paginated-items-of-a-resource)
+  - [Iterating over all resource items](#iterating-over-all-resource-items)
+- [Filtered requests](#filtered-requests)
+  - [Requesting filtered pages of a resource](#requesting-filtered-pages-of-a-resource)
+  - [Iterating over filtered resource items](#iterating-over-filtered-resource-items)
+- [Further information](#further-information)
+
+## Basic requests
+
+The following examples should cover most use cases.
+
+### Requesting a list of pk's
+
+Paperless returns a JSON key *all* on every paginated request, which represents a list of all pk's matching our filtered request. The `list()` method requests page one unfiltered, which yields a complete list.
+
+```python
+correspondent_pks = await paperless.correspondents.list()
+#>>> [1, 2, 3, ...]
+```
+
+It's the same for each resource. Let's try with documents:
+
+```python
+document_pks = await paperless.documents.list()
+#>>> [5, 23, 42, 1337, ...]
+```
+
+### Requesting an item
+
+You may want to actually access the data of Paperless resources. Let's do it!
+
+```python
+# request document with pk 23
+document = await paperless.documents.one(23)
+#>>> Document(id=23, ...)
+```
+
+### Requesting paginated items of a resource
+
+Accessing single resources by pk would result in too many requests, so you can access the paginated results, too.
+
+```python
+# request page 1
+documents = await paperless.documents.get()
+#>>> PaginatedResult(current_page=1, next_page=2, items=[Document(...), ...])
+
+# request page 2
+documents = await paperless.documents.get(page=2)
+#>>> PaginatedResult(current_page=2, next_page=3, items=[Document(...), ...])
+```
+
+When you request the last page, the `next_page` property is `None`.
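
The `next_page` behaviour described above can drive a manual paging loop. The following is only a minimal sketch: `FakePage` and `FakeDocuments` are hypothetical stand-ins written for this example (they are not part of *PyPaperless*), mimicking just the `get(page=...)` call and the `items` / `next_page` attributes shown above so the snippet runs without a Paperless server.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakePage:
    """Hypothetical stand-in for the paginated result shown above."""

    current_page: int
    next_page: Optional[int]
    items: list


class FakeDocuments:
    """Hypothetical stub mimicking `paperless.documents.get(page=...)`."""

    def __init__(self, items, page_size=2):
        self._items = items
        self._page_size = page_size

    async def get(self, page=1, **filters):
        # filters are accepted but ignored by this stub
        start = (page - 1) * self._page_size
        chunk = self._items[start:start + self._page_size]
        has_more = start + self._page_size < len(self._items)
        return FakePage(page, page + 1 if has_more else None, chunk)


async def collect_all(endpoint, **filters):
    """Request page after page until `next_page` is None."""
    collected = []
    page = 1
    while page is not None:
        result = await endpoint.get(page=page, **filters)
        collected.extend(result.items)
        page = result.next_page
    return collected


titles = asyncio.run(collect_all(FakeDocuments(["a", "b", "c", "d", "e"])))
print(titles)
```

With a real, initialized `Paperless` object you would presumably pass `paperless.documents` as the endpoint instead of the stub. A manual loop like this is mainly useful when you want to stop early or throttle requests between pages.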
+
+### Iterating over all resource items
+
+Sometimes, dealing with pages makes no sense for you, so you may want to iterate over all items at once.
+
+> [!NOTE]
+> Iterating over all documents could take some time, depending on how many items you have stored in your database.
+
+```python
+async for item in paperless.documents.iterate():
+    print(item.title)
+    #>>> 'New Kitchen Invoice'
+```
+
+## Filtered requests
+
+Sometimes, you want to filter results in order to access them faster, or to apply context to your requests, or both. In case of documents, iterating over them can be very time-consuming. The Paperless api provides filter query attributes for each resource. There are **many** filters, so I cannot list them all. The easiest way to find them is to open the Api Root of your local Paperless installation, by adding `/api/` to the url.
+
+For example: `http://localhost:8000/api/`
+
+Once the list of all api endpoints is available, choose your resource by clicking the link next to the name. If a **Filter** button is displayed at the top of the next page, filtering is supported by the resource. Click the button to access all available filters, apply a dummy filter and click on the **Apply** button.
+The website now displays something like this below the resource name heading:
+`GET /api/documents/?id__in=&id=&title__istartswith=&title__iendswith=&title__icontains=...`
+
+The names of the query parameters are available as keywords in the `get()` and `iterate()` methods.
+
+### Requesting filtered pages of a resource
+
+Filters are passed as keywords to the `get()` method.
+
+```python
+filters = {
+    "title__istartswith": "invoice",
+    "content__icontains": "cheeseburger",
+}
+filtered_documents = await paperless.documents.get(**filters)
+#>>> PaginatedResult(current_page=1, next_page=None, items=[Document(...), ...])
+```
+
+### Iterating over filtered resource items
+
+Iterating is also possible with filters and works the same way as requesting filtered pages.
+
+```python
+# we assume you have declared the same filter dict as above
+async for item in paperless.documents.iterate(**filters):
+    print(item.title)
+    #>>> 'Invoice for yummy cheeseburgers'
+```
+
+> [!NOTE]
+> Paperless simply ignores filters which don't exist. You could end up iterating over all of your documents, which will take time in the worst case. Use filters carefully and check twice.
+
+## Further information
+
+Each `list()` and `get()` call results in a single `GET` http request. When using `iterate()`, 1-n `GET` http requests will be sent until all pages are requested.
diff --git a/docs/SESSION.md b/docs/SESSION.md
new file mode 100644
index 0000000..82a9cd0
--- /dev/null
+++ b/docs/SESSION.md
@@ -0,0 +1,39 @@
+# Handling a session
+
+Just import the module and go on.
+
+```python
+import asyncio
+
+from pypaperless import Paperless
+
+paperless = Paperless("localhost:8000", "your-secret-token")
+
+# your main function here
+
+asyncio.run(main())
+```
+
+Now you can utilize the Paperless object.
+
+**Example 1**
+
+```python
+async def main():
+    paperless.initialize()
+    # do something
+    paperless.close()
+```
+
+**Example 2**
+
+```python
+async def main():
+    async with paperless:
+        # do something
+```
+
+You may want to request or manipulate data. Read more about that here:
+
+* [Request data](REQUEST.md)
+* [Create, update, delete data](CRUD.md)