This repository hosts a collection of notebooks designed to investigate the influence of length-defining keywords on the responses generated by the ChatGPT model ("gpt-3.5-turbo").
By analyzing the variability and average length of these responses, the research aims to provide insights into how prompts can be tailored to generate outputs of desired lengths.
The repository consists of four main notebooks:
The first notebook describes the collection of the control group data using utils/generate_responses.py. The collection consists of 500 responses (5 sets of 100) from the ChatGPT model.
Prompts were generated using templates from utils/prompt_templates.py without specifying any length-defining keywords, creating a baseline of response length variability.
This data will serve as a reference point for comparing and measuring the effects of different keywords on response lengths.
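For orientation, here is a minimal sketch of what such a collection loop might look like; it assumes the openai Python client and an OPENAI_API_KEY in the environment, and is not necessarily how utils/generate_responses.py is implemented:

```python
# Minimal sketch of a baseline-collection loop (assumed structure; the
# repository's utils/generate_responses.py may differ).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_responses(prompt: str, n: int = 100) -> list[str]:
    """Request n independent completions of the same prompt."""
    responses = []
    for _ in range(n):
        completion = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        responses.append(completion.choices[0].message.content)
    return responses
```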
The second notebook analyzes the data collected in the first notebook. It includes a deep dive into:
- normality of data distributions for each text type,
- visualization of character, word, and token distributions,
- comparison of distributions using the Kolmogorov-Smirnov test,
- and the determination of template-specific length-defining keywords.
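A rough illustration of these checks using scipy.stats (the file name, column names, and the choice of Shapiro-Wilk for the normality check are assumptions, not the notebook's exact code):

```python
# Illustrative distribution checks on the control-group data.
# "control_group.csv", "template", and "word_count" are hypothetical names.
import pandas as pd
from scipy import stats

df = pd.read_csv("control_group.csv")

# Normality check per text type (Shapiro-Wilk is one common choice)
for template, group in df.groupby("template"):
    stat, p = stats.shapiro(group["word_count"])
    print(f"{template}: Shapiro-Wilk p = {p:.3f}")

# Two-sample Kolmogorov-Smirnov test comparing two templates' length distributions
emails = df.loc[df["template"] == "email", "word_count"]
essays = df.loc[df["template"] == "essay", "word_count"]
print(stats.ks_2samp(emails, essays))
```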
The third notebook describes the collection of the experimental group data using the utils/generate_responses.py script, with prompt templates from utils/prompt_templates.py and length-defining keywords from utils/length_defining_keywords.py serving as variables.
The script automates data collection by making API requests and fetching 100 responses for each template-keyword combination.
The responses were quantified in terms of word count, character count, and token count, and stored in CSV files for further analysis.
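A hedged sketch of that collection step (the template, keyword, and file names below are hypothetical placeholders rather than the repository's actual values):

```python
# Sketch of the experimental collection loop; the templates and keywords
# below are hypothetical placeholders, not the repository's actual values.
import csv
import tiktoken
from openai import OpenAI

client = OpenAI()
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

templates = {"email": "Write an email about {topic}. Keep it {length}."}  # hypothetical
keywords = ["short", "long"]                                              # hypothetical

with open("experimental_group.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["template", "keyword", "chars", "words", "tokens"])
    for name, template in templates.items():
        for keyword in keywords:
            prompt = template.format(topic="remote work", length=keyword)
            for _ in range(100):  # 100 responses per template-keyword combination
                text = client.chat.completions.create(
                    model="gpt-3.5-turbo",
                    messages=[{"role": "user", "content": prompt}],
                ).choices[0].message.content
                writer.writerow([name, keyword, len(text), len(text.split()),
                                 len(encoding.encode(text))])
```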
The fourth notebook analyzes the data collected from the experimental group.
It involves a detailed statistical analysis, including:
- Coefficient of Variation calculations,
- dot plot visualizations,
- Shapiro-Wilk tests,
- Levene's test,
- Kruskal-Wallis tests,
- and Wilcoxon Signed-Rank tests.
The goal is to examine the influence of length-defining keywords on response length.
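All of these tests are available in scipy.stats; the sketch below shows how they could be applied to the collected data, assuming the column names from the previous sketch rather than the notebook's exact code:

```python
# Sketch of the experimental-group analysis. The CSV file and column
# names ("keyword", "words") are assumptions carried over from above.
import pandas as pd
from scipy import stats

df = pd.read_csv("experimental_group.csv")

# Coefficient of Variation: relative spread of response lengths per keyword
print(df.groupby("keyword")["words"].agg(lambda s: s.std() / s.mean()))

groups = [g["words"].to_numpy() for _, g in df.groupby("keyword")]
print(stats.shapiro(groups[0]))   # normality within one keyword group
print(stats.levene(*groups))      # homogeneity of variances across groups
print(stats.kruskal(*groups))     # do the keyword groups differ in length?

# Wilcoxon signed-rank expects paired samples of equal size,
# e.g. two keyword groups of 100 responses each
print(stats.wilcoxon(groups[0], groups[1]))
```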
The notebooks in this repository detail a study designed to answer the research question: "Does inserting length-defining keywords into a prompt have a significant impact on the response lengths produced by ChatGPT?"
Through a series of hypotheses and rigorous statistical testing, the study explores the effects of prompt templates and length-defining keywords on the length of responses generated by ChatGPT.
The research involves two categories of variables:
- Prompt templates (email, social media post, cover letter, essay, and explanation)
- Length-defining keywords (constant and template-specific)
These variables were used to design prompts for the ChatGPT model, and the responses were analyzed for length, variability, and how closely they aligned with the desired length indicated by the keywords.
If you're interested in AI text generation, and particularly in understanding how to control the length of the outputs generated by AI models like ChatGPT, this repository will be a valuable resource.
To get started with these notebooks, clone the repository, install the required dependencies listed in requirements.txt, and run the notebooks in a Jupyter environment.
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change. Please make sure to update tests as appropriate.
For any questions or clarifications, feel free to reach out.
This research is intended to contribute to the broader scientific discourse around AI text generation and is not officially associated with OpenAI.