dt-political-debates

Digital Twin on Political Debates

graph TB
    ExternalUNDatabase[External UN Database] --> UNScrapper[ODTP UN Scrapper]

    subgraph ODTP
    direction TB
        UNScrapper --> ODTPPyannoteWhisper[Diarization & Transcription]
    end
    ODTPPyannoteWhisper --> UNS3[Political Debates S3]

    subgraph Webplatform
    direction TB
        UNS3 --> UNMongoDB

        UNMongoDB <--> Backend["Backend (API and Tools)"]
        UNMongoDB --> UNSolr[UN Solr]
        UNSolr --> Backend["Backend (API and Tools)"]
        Backend <--> Frontend[Political Debates GUI]
    end

List of components

The following parts of the project are not ODTP components.

How to run this pipeline?

Tutorial to run the pipeline in ODTP

This pipeline can be run in ODTP with the dt-political-debates.sh script:

  1. Clone this repository
  2. Edit dt-political-debates.sh to set your ODTP user email
  3. Edit the secrets and other parameters.
  4. Run the bash script: sh dt-political-debates.sh

Alternatively, you can add the components to the GUI and manually run the execution.

How to use this tool with a custom file?

To run the odtp-pyannote-whisper module, you need to start with a valid metadata JSON file. You can obtain one by fetching the data with the scraper or create a synthetic one manually. The following synthetic example can serve as a template for your own. It should validate against schemas/unogDigitalRecordingMetadataMinimalSchema.json.

{
    "$schema": "https://raw.githubusercontent.com/sdsc-ordes/dt-political-debates/refs/heads/main/schemas/unogDigitalRecordingMetadataMinimalSchema.json",
    "version": "1.0",
    "metadata": {
        "title": "HRC_20220929T0000",
        "date": "2022-09-29",
        "time": "00:00",
        "url": "http://example.com",
        "tags": ["HRC_20220929T0000"],
        "summary": "",
        "labels": {}
    },
    "channels": [
        {
            "id": "video",
            "type": "video",
            "name": "Main Video Channel",
            "data": "HRC_20220929T0000.mp4",
            "tags": ["main", "video"]
        },
        {
            "id": "original",
            "type": "audio",
            "name": "Original Audio Channel",
            "data": "HRC_20220929T0000-original.wav",
            "tags": ["original", "audio"]
        }
    ],
    "annotations": []
}

Place this metadata file in the odtp-input folder of the odtp-pyannote-whisper component. If you are using the output of the scraper, you do not need to create this initial metadata file manually.
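If you need synthetic metadata for several recordings, generating the file can be less error-prone than editing it by hand. The following is a minimal sketch that reproduces the structure of the example above for a given recording ID. The function names `build_metadata` and `write_metadata` and the output file name (`<recording ID>.json`) are assumptions made for illustration and are not part of this repository. Check which file name the odtp-pyannote-whisper component expects. The sketch does not validate the result against the schema.

```python
import json
from pathlib import Path

# Schema URL copied from the example metadata above.
SCHEMA_URL = (
    "https://raw.githubusercontent.com/sdsc-ordes/dt-political-debates/"
    "refs/heads/main/schemas/unogDigitalRecordingMetadataMinimalSchema.json"
)


def build_metadata(recording_id: str, date: str, time: str = "00:00",
                   url: str = "http://example.com") -> dict:
    """Return a synthetic metadata document shaped like the example above.

    Hypothetical helper: media file names are derived from the recording ID
    in the same way as in the example (<id>.mp4 and <id>-original.wav).
    """
    return {
        "$schema": SCHEMA_URL,
        "version": "1.0",
        "metadata": {
            "title": recording_id,
            "date": date,
            "time": time,
            "url": url,
            "tags": [recording_id],
            "summary": "",
            "labels": {},
        },
        "channels": [
            {
                "id": "video",
                "type": "video",
                "name": "Main Video Channel",
                "data": f"{recording_id}.mp4",
                "tags": ["main", "video"],
            },
            {
                "id": "original",
                "type": "audio",
                "name": "Original Audio Channel",
                "data": f"{recording_id}-original.wav",
                "tags": ["original", "audio"],
            },
        ],
        "annotations": [],
    }


def write_metadata(metadata: dict, input_dir: Path) -> Path:
    """Write the metadata as JSON into input_dir and return the file path.

    Assumption: the file is named after the recording title; adjust this if
    the component expects a different name.
    """
    input_dir.mkdir(parents=True, exist_ok=True)
    target = input_dir / f"{metadata['metadata']['title']}.json"
    target.write_text(json.dumps(metadata, indent=4), encoding="utf-8")
    return target
```

For example, `write_metadata(build_metadata("HRC_20220929T0000", "2022-09-29"), Path("odtp-input"))` would write a file with the same content as the example above into a local odtp-input folder.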

Changelog

  • v1.0.0
    • Basic project structure, schemas, and scripts.

Next steps

  • odtp-transcription2pdf component
  • data-downloader component
  • data-uploader component
  • face identifier component
  • docker-compose
  • odtp compatibility
  • documentation
