# Digital Twin on Political Debates
```mermaid
graph TB
    ExternalUNDatabase[External UN Database] --> UNScrapper[ODTP UN Scrapper]
    subgraph ODTP
        direction TB
        UNScrapper --> ODTPPyannoteWhisper[Diarization & Transcription]
    end
    ODTPPyannoteWhisper --> UNS3[Political Debates S3]
    subgraph Webplatform
        direction TB
        UNS3 --> UNMongoDB
        UNMongoDB <--> Backend["Backend (API and Tools)"]
        UNMongoDB --> UNSolr[UN Solr]
        UNSolr --> Backend
        Backend <--> Frontend[Political Debates GUI]
    end
```
- `odtp-unog-digitalrecordings-scrapper`: component to scrape and download metadata from the UNOG Digital Recordings platform.
- `odtp-pyannote-whisper`: component to diarize and transcribe audio and video files (a conceptual sketch of this step follows this list).
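For orientation, the sketch below shows in outline how speaker diarization (pyannote) and transcription (Whisper) can be combined. It is a minimal illustration using the public `pyannote.audio` and `openai-whisper` APIs, not the actual implementation of the `odtp-pyannote-whisper` component; the model names, token handling, and file path are assumptions.

```python
# Minimal sketch: speaker diarization (pyannote) + transcription (Whisper).
# NOT the odtp-pyannote-whisper implementation; model names, the Hugging Face
# token, and the file path are assumptions for illustration only.
import whisper
from pyannote.audio import Pipeline

AUDIO_PATH = "HRC_20220929T0000-original.wav"  # file name from the metadata example later in this README

# Diarization: who speaks when.
diarization_pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",  # assumed model; requires a Hugging Face token
    use_auth_token="YOUR_HF_TOKEN",
)
diarization = diarization_pipeline(AUDIO_PATH)

# Transcription: what is said, with segment timestamps.
model = whisper.load_model("base")  # assumed model size
transcription = model.transcribe(AUDIO_PATH)

# Naive alignment: attach each transcribed segment to the speaker whose
# diarization turn contains the segment midpoint.
def speaker_at(time_s):
    for turn, _, speaker in diarization.itertracks(yield_label=True):
        if turn.start <= time_s <= turn.end:
            return speaker
    return "UNKNOWN"

for segment in transcription["segments"]:
    midpoint = (segment["start"] + segment["end"]) / 2
    print(f"[{speaker_at(midpoint)}] {segment['text'].strip()}")
```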
The following parts of the project are not ODTP components:
- `debates-dataloader`: tool to upload files to the S3 bucket, MongoDB, and Solr (a rough sketch of this flow is shown after this list).
- `debates-solr`: Solr setup for the debates UI.
- `debates-ui`: frontend for the Political Debates project.
- `political-debates-ui`: full deployment of all GUI components.
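As a rough illustration of what the data-loading step involves, the sketch below uploads a media file to S3, stores the metadata document in MongoDB, and indexes a flat summary record in Solr. It is a hypothetical example using `boto3`, `pymongo`, and `pysolr`; the bucket, database, collection, and Solr core names are assumptions and do not necessarily match the actual `debates-dataloader` tool.

```python
# Rough sketch of the data-loading flow (S3 + MongoDB + Solr).
# This does NOT reproduce debates-dataloader; all names below (bucket,
# database, collection, Solr core, URIs) are hypothetical.
import json

import boto3
import pymongo
import pysolr

# 1. Upload the media file to the S3 bucket.
s3 = boto3.client("s3")
s3.upload_file(
    "HRC_20220929T0000-original.wav",
    "political-debates",                 # hypothetical bucket name
    "HRC_20220929T0000/original.wav",    # hypothetical object key
)

# 2. Store the metadata document in MongoDB.
with open("HRC_20220929T0000.json") as f:
    metadata = json.load(f)
mongo = pymongo.MongoClient("mongodb://localhost:27017")  # hypothetical URI
mongo["debates"]["recordings"].insert_one(metadata)

# 3. Index a flat summary record in Solr for search.
solr = pysolr.Solr("http://localhost:8983/solr/debates", always_commit=True)
solr.add([{
    "id": metadata["metadata"]["title"],
    "title": metadata["metadata"]["title"],
    "date": metadata["metadata"]["date"],
    "tags": metadata["metadata"]["tags"],
}])
```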
This pipeline can be executed in ODTP by running `dt-political-debates.sh`:

- Clone this repository.
- Edit `dt-political-debates.sh` with the ODTP user email.
- Edit the secrets and other parameters.
- Run the bash script: `sh dt-political-debates.sh`

Alternatively, you can add the components to the GUI and run the execution manually.
In order to run the `odtp-pyannote-whisper` component, it is necessary to start with a valid metadata JSON file. You can obtain one by fetching the data with the scrapper, or you can create a synthetic one manually. Below is an example of a synthetic file you can use as a template; it should validate against `schemas/unogDigitalRecordingMetadataMinimalSchema.json` (a validation sketch follows below).
```json
{
  "$schema": "https://raw.githubusercontent.com/sdsc-ordes/dt-political-debates/refs/heads/main/schemas/unogDigitalRecordingMetadataMinimalSchema.json",
  "version": "1.0",
  "metadata": {
    "title": "HRC_20220929T0000",
    "date": "2022-09-29",
    "time": "00:00",
    "url": "http://example.com",
    "tags": ["HRC_20220929T0000"],
    "summary": "",
    "labels": {}
  },
  "channels": [
    {
      "id": "video",
      "type": "video",
      "name": "Main Video Channel",
      "data": "HRC_20220929T0000.mp4",
      "tags": ["main", "video"]
    },
    {
      "id": "original",
      "type": "audio",
      "name": "Original Audio Channel",
      "data": "HRC_20220929T0000-original.wav",
      "tags": ["original", "audio"]
    }
  ],
  "annotations": []
}
```
This metadata file should be placed in the `odtp-input` folder of the `odtp-pyannote-whisper` component. If you are using the output of the scrapper, you do not need to create this initial metadata file manually.
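To check that a hand-written metadata file matches the schema before feeding it to the component, you could validate it with the `jsonschema` package, as in the sketch below. This helper is not part of the repository, and the file paths are assumptions.

```python
# Hypothetical helper: validate a metadata file against the minimal schema.
# Not part of the repository; file paths are assumptions.
import json

from jsonschema import ValidationError, validate

with open("schemas/unogDigitalRecordingMetadataMinimalSchema.json") as f:
    schema = json.load(f)

with open("odtp-input/HRC_20220929T0000.json") as f:
    metadata = json.load(f)

try:
    validate(instance=metadata, schema=schema)
    print("Metadata is valid.")
except ValidationError as err:
    print(f"Metadata is invalid: {err.message}")
```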
- v1.0.0
- Basic project structure, schemas, and scripts.
- odtp-trascription2pdf component
- data-downloader component
- datauploader component
- faces identifier component
- docker-compose
- odtp compatibility
- documentation