How to get a complexity metric for a new dataset? #1

Open
msclar opened this issue Nov 14, 2024 · 1 comment
msclar commented Nov 14, 2024

Thank you for the great work and for releasing the code!

If we wanted to compute the complexity for a new dataset, what would be the steps to do so?

I see that data/<dataset>/splits.json already has the num_states and num_highlights. For the dataset I'm interested in, I only have the prompt, question, and answer. Once I populate this file correctly, what parameter choices would be best to report?

Would it be correct to say that, after making these modifications, bash script/gpt-3.5.sh should yield the results I need? Or am I missing anything?

Thanks in advance!

Flecart (Owner) commented Nov 15, 2024

Hi Melanie!
Thank you for reaching out!

As described in Section 4.2 of our work, we manually labelled 50 random samples from each dataset to retrieve the statefulness values.
We made a small applet to facilitate the labelling process, and I have just created a quick demo for you!
(Video attachment: Screencast from 2024-11-15 15-32-20.webm)

The applet is located in this directory: https://github.com/Flecart/complexity-tom-dwm/tree/main/statefulness/app.
Run python3 server.py and connect to localhost on port 8000 to see the interface shown in the video.
Then, to create a state, highlight a sentence or part of it; to remove a state, click the highlighted text.

I strongly suggest serializing the data for the applet into the schema described at this line; the applet might not work if the input JSON doesn't have that format, as I have not tested that scenario.

Having the prompt, question, and answer is all you need to create the labelled data!
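
For illustration only, a minimal serialization sketch in Python. The field names below are my assumptions for this example; the authoritative schema is the one linked above:

```python
import json

# Hypothetical field names -- check the schema linked above for the real ones.
samples = [
    {
        "prompt": "Sally puts the ball in the basket and leaves the room.",
        "question": "Where will Sally look for the ball?",
        "answer": "basket",
    },
]

# Write the samples as a JSON file the applet can load.
with open("applet_input.json", "w") as f:
    json.dump(samples, f, indent=2)
```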

I have looked at the parameters we used in our work: we used $\tau = 0.2$ for every dataset, so that's the suggested parameter choice. (The complexity of a sample is then its stateful value plus $\tau$ times its stateless value, as the snippet below shows.)

And bash script/gpt-3.5.sh is the script for the accuracy results of the prompting method we proposed, not for the complexity metric!
For the complexity metric we ran the script at https://github.com/Flecart/complexity-tom-dwm/blob/main/statefulness/copy_state_data.py, which prints the stateful and stateless values for each sample in the data.

Then, in the report we did something similar to the following:

```python
import numpy as np

# paste the output for the stateful values here
tomi = np.array([1, 1, 1, 4, 3, 1, 5, 5, 1, 3, 3, 1, 5, 4, 4, 1, 1, 4, 4, 1, 2, 6, 1, 2, 1, 1, 3, 5, 3, 1, 5, 6, 4, 1, 1, 5, 3, 5, 1, 1, 1, 5, 1, 1, 1, 3, 1, 3, 4, 3], dtype=float)

# add the stateless values, weighted by tau
tau = 0.2
tomi += tau * np.array([8, 5, 7, 5, 1, 3, 4, 2, 3, 6, 3, 2, 7, 2, 3, 1, 4, 5, 3, 3, 6, 6, 1, 8, 6, 7, 6, 6, 3, 2, 7, 2, 0, 4, 7, 4, 2, 5, 2, 5, 6, 5, 1, 5, 8, 5, 5, 7, 4, 4])
```

Then, we used boxplots to plot the results.
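
As a rough sketch of that step (assuming matplotlib; the exact plotting code isn't shown here), continuing from the snippet above:

```python
import matplotlib.pyplot as plt

# One boxplot per dataset; "tomi" is the combined
# stateful + tau * stateless array computed above.
plt.boxplot([tomi], labels=["ToMi"])
plt.ylabel("Complexity")
plt.show()
```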

If you need further assistance, feel free to reach out!
