Transcript Buddy is a tool for processing Microsoft Teams meeting transcripts. It uses language models to clean and format raw transcripts into a coherent, readable document. It was originally developed for onboarding lesson transcripts, but the prompts can be adapted to other use cases.
Key features:

- Processes Microsoft Teams meeting transcripts (`.docx` files)
- Cleans and formats transcripts using AI language models (OpenAI/Groq)
- Handles multiple files in a single batch run
- Preserves the original content while improving readability
- Generates clean, structured output files
- Validates token limits to ensure model compatibility (see the sketch after this list)
- Supports both OpenAI and Groq language models
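As a rough illustration of the token limit check (not the repository's actual code; names are hypothetical), a transcript can be counted and compared against the configured limit with `tiktoken`:

```python
# Hypothetical sketch of token-limit validation, assuming tiktoken is installed.
import tiktoken

MAX_TOKENS = 128_000  # assumed default, mirrors the MAX_TOKENS setting described below

def fits_in_context(text: str, model: str = "gpt-4o-2024-11-20") -> bool:
    """Return True if `text` fits within MAX_TOKENS tokens for the given model."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Unknown model names fall back to a general-purpose encoding.
        encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text)) <= MAX_TOKENS
```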
To install:

- Clone the repository:

  ```bash
  git clone <repository_url>
  cd transcript_buddy
  ```

- Create and activate a virtual environment (recommended):

  ```bash
  python -m venv .venv
  .venv\Scripts\activate  # On Windows
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Create a `.env` file in the root directory and add your API keys (or ask your colleague Alessio Piraccini 😉):

  ```
  OPENAI_API_KEY=your_openai_api_key
  GROQ_API_KEY=your_groq_api_key
  ```
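At runtime the keys are picked up from the environment. A minimal sketch of how this is typically done with `python-dotenv` (the exact loading code in the repository may differ):

```python
# Minimal sketch of reading the API keys, assuming python-dotenv is installed.
import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file from the project root / current directory

openai_key = os.getenv("OPENAI_API_KEY")
groq_key = os.getenv("GROQ_API_KEY")

if not (openai_key or groq_key):
    raise RuntimeError("No API key found: add OPENAI_API_KEY or GROQ_API_KEY to .env")
```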
The application can be configured through the `config.py` file:

- `USE_OPENAI`: Toggle between OpenAI (`True`) and Groq (`False`) models
- `MAX_TOKENS`: Maximum token limit for text processing (default: 128k)
- `OPENAI_MODEL`: OpenAI model to use (default: `"gpt-4o-2024-11-20"`)
- `GROQ_MODEL`: Groq model to use (default: `"llama-3.1-70b-versatile"`)
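For orientation, a `config.py` exposing these settings could look roughly like the sketch below (illustrative, not copied from the repository):

```python
# Illustrative shape of config.py; the real file may differ.
USE_OPENAI = True                       # True -> OpenAI, False -> Groq
MAX_TOKENS = 128_000                    # maximum tokens per transcript
OPENAI_MODEL = "gpt-4o-2024-11-20"      # default OpenAI model
GROQ_MODEL = "llama-3.1-70b-versatile"  # default Groq model
```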
To run the tool:

- Create the necessary directories:

  ```bash
  mkdir -p data/input
  ```

- Place your `.docx` files in the `data/input` directory (see the reading sketch after this list).
- Run the main script to process the files:

  ```bash
  python src/main.py
  ```

- The processed transcripts will be saved in the `data/output` directory.
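Input files are Word documents, which `python-docx` reads paragraph by paragraph. A hypothetical sketch of loading a transcript (the helper name and file name are placeholders):

```python
# Hypothetical helper for loading a Teams transcript with python-docx.
from pathlib import Path
from docx import Document  # provided by the python-docx package

def read_transcript(path: Path) -> str:
    """Concatenate the non-empty paragraphs of a .docx transcript into one string."""
    doc = Document(str(path))
    lines = [p.text.strip() for p in doc.paragraphs if p.text.strip()]
    return "\n".join(lines)

# Example call; the file name is just a placeholder.
raw_text = read_transcript(Path("data/input/example_lesson.docx"))
```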
The processed transcripts are saved as HTML files with:
- A header indicating the source (Amplifon CoE)
- The cleaned and formatted transcript content
- A disclaimer about AI-generated content
- Markdown formatting for better readability
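One plausible way such output could be assembled from the model's markdown, using the `markdown` package (the header and disclaimer wording here are placeholders, not the tool's exact strings):

```python
# Illustrative assembly of the HTML output, assuming the `markdown` package.
import markdown

def build_html(cleaned_markdown: str) -> str:
    """Wrap the cleaned transcript (markdown) in a header and a disclaimer."""
    header = "<p><em>Amplifon CoE</em></p>"  # placeholder header wording
    disclaimer = ("<p><em>This transcript was cleaned with an AI language model "
                  "and may contain errors.</em></p>")  # placeholder disclaimer wording
    body = markdown.markdown(cleaned_markdown)  # convert markdown to HTML
    return f"{header}\n{body}\n{disclaimer}"
```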
Project structure:

```
transcript_buddy/
├── data/
│   ├── input/                  # Place input .docx files here
│   └── output/                 # Processed transcripts are saved here
├── src/
│   ├── utils/
│   │   ├── file_processing.py  # File handling functions
│   │   └── llm.py              # Language model interactions
│   ├── config.py               # Configuration settings
│   └── main.py                 # Main processing script
├── .env                        # Environment variables (API keys)
├── .gitignore
├── README.md
└── requirements.txt
```
Key dependencies:

- `python-docx`: For reading and writing Word documents
- `tiktoken`: Token counting for OpenAI models
- `markdown`: Markdown processing
- `python-dotenv`: Environment variable management
- `openai`: OpenAI API client
- `groq`: Groq API client
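The OpenAI and Groq Python clients expose a very similar chat-completions interface, so the model toggle can be as simple as this hedged sketch (a simplified illustration, not the repository's actual `llm.py`; the prompt is a placeholder):

```python
# Simplified sketch of the OpenAI/Groq toggle; prompt and parameters are illustrative.
import os
from openai import OpenAI
from groq import Groq

USE_OPENAI = True  # mirrors the USE_OPENAI toggle in config.py

def clean_transcript(raw_text: str) -> str:
    """Send the raw transcript to the selected model and return the cleaned text."""
    prompt = "Clean and format this meeting transcript."  # placeholder system prompt
    if USE_OPENAI:
        client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
        model = "gpt-4o-2024-11-20"
    else:
        client = Groq(api_key=os.getenv("GROQ_API_KEY"))
        model = "llama-3.1-70b-versatile"
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": raw_text},
        ],
    )
    return response.choices[0].message.content
```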
The application includes comprehensive error handling:
- Logs all operations with timestamps
- Validates API keys before processing
- Checks token limits to prevent model errors
- Provides clear error messages for common issues
- Continues processing remaining files if one file fails
- Creates detailed processing summaries
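In spirit, the batch loop behaves like the sketch below: each file is processed independently, failures are logged with a timestamp, and the run continues. The `process_file` callable stands in for the per-file pipeline and is hypothetical:

```python
# Hedged sketch of fault-tolerant batch processing with timestamped logging.
import logging
from pathlib import Path
from typing import Callable

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("transcript_buddy")

def process_all(process_file: Callable[[Path], None],
                input_dir: Path = Path("data/input")) -> None:
    """Process every .docx file, logging failures without stopping the batch."""
    succeeded, failed = [], []
    for path in sorted(input_dir.glob("*.docx")):
        try:
            process_file(path)  # per-file pipeline (read, clean, write output)
            succeeded.append(path.name)
            logger.info("Processed %s", path.name)
        except Exception:
            failed.append(path.name)
            logger.exception("Failed to process %s", path.name)
    logger.info("Summary: %d succeeded, %d failed", len(succeeded), len(failed))
```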
Common issues and solutions:

- **Missing API Keys**
  - Ensure your `.env` file exists and contains the required API keys
  - Check that the API keys are valid and not expired
- **Token Limit Exceeded**
  - The file content exceeds the maximum token limit (128k)
  - Consider splitting the file into smaller parts (see the sketch after this list)
- **File Format Issues**
  - Ensure input files are in `.docx` format
  - Check that files are not corrupted or password-protected
- **Processing Errors**
  - Check the logs for specific error messages
  - Verify that the input file is a valid Teams transcript
  - Ensure you have an active internet connection
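If a transcript exceeds the limit, one simple approach is to split it by paragraph while tracking a token budget. A hypothetical sketch using `tiktoken` (not part of the tool itself):

```python
# Hypothetical paragraph-based splitter for transcripts that exceed the token limit.
import tiktoken

def split_by_tokens(text: str, max_tokens: int = 100_000,
                    encoding_name: str = "o200k_base") -> list[str]:
    """Greedily pack paragraphs into chunks of at most `max_tokens` tokens each."""
    enc = tiktoken.get_encoding(encoding_name)
    chunks, current, current_tokens = [], [], 0
    for paragraph in text.split("\n"):
        n = len(enc.encode(paragraph)) + 1  # +1 for the newline re-added on join
        if current and current_tokens + n > max_tokens:
            chunks.append("\n".join(current))
            current, current_tokens = [], 0
        current.append(paragraph)
        current_tokens += n
    if current:
        chunks.append("\n".join(current))
    return chunks
```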
For any other issues, please check the logs or contact the maintainers.