Skip to content

Batch upload and update documents with Transkribus' REST API.

License

Notifications You must be signed in to change notification settings

cconzen/TranskribusBatchUtils

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Transkribus Batch Utils

This project provides Python scripts to interact with the Transkribus REST API.

Features

  • Batch Upload: Upload multiple documents and their PageXMLs to a specified Transkribus collection.
  • Batch Update: Update PageXMLs/transcriptions of documents already in a Transkribus collection.

Prerequisites

  • Python 3.7+
  • An active Transkribus account

Installation

  1. Clone the repository:

    git clone https://github.com/cconzen/TranskribusBatchUtils.git
    cd TranskribusBatchUtils
    
  2. Install required dependencies:

    pip install -r requirements.txt
    
  3. Set up your Transkribus Log in credentials in an .env file:

    TRANSKRIBUS_USER=<[email protected]>
    TRANSKRIBUS_PASSWORD=<password>
    

Usage

Batch Upload Documents

To upload a directory of documents to a Transkribus collection, use the following command:

python main.py upload <base_directory> <collection_id>

<base_directory>: The base directory that contains the documents.

<collection_id>: The Transkribus collection ID to which you want to upload the documents.

Batch Update Documents

To update the PageXMLs of all documents in a Transkribus collection, use:

python main.py update <base_directory> <collection_id>

<base_directory>: The base directory that contains the documents.

<collection_id>: The Transkribus collection ID to which you want to upload the documents.

Example Command

python main.py upload documents 12345

This will upload the contents of the documents directory to the collection with ID 12345.

Example Directory Structure

Make sure your base directory follows this structure for the script to work properly:

(metadata.xml is only required for updating existing Transkribus documents; it should be automatically created when exporting from Transkribus)

```
/BASE_DIR/
    /Document_1/
        metadata.xml
        /page/
            Page_1.xml
            Page_2.xml
    /Document_2/
        metadata.xml
        /page/
            Page_1.xml
            Page_2.xml
    ...
```

License

This project is licensed under the MIT License.

About

Batch upload and update documents with Transkribus' REST API.

Topics

Resources

License

Stars

Watchers

Forks

Languages