How to get a complexity metric for a new dataset? #1

Open
msclar opened this issue Nov 14, 2024 · 1 comment
msclar commented Nov 14, 2024

Thank you for the great work and for releasing the code!

If we wanted to compute the complexity for a new dataset, what would be the steps to do so?

I see that data/<dataset>/splits.json already has the num_states and num_highlights. For the dataset I'm interested in, I only have the prompt, question, and answer. Once I populate this file correctly, what parameter choices would be best to report?

Would it be correct to say that, after making these modifications, bash script/gpt-3.5.sh should yield the results I need? Or am I missing anything?

Thanks in advance!

Flecart (Owner) commented Nov 15, 2024

Hi Melanie!
Thank you for reaching out!

As described in Section 4.2 of our work, we manually labelled 50 random samples from each dataset to retrieve the statefulness values.
We made a small applet to facilitate the labelling process, and I have just created a quick demo for you!
(Video attachment: Screencast from 2024-11-15 15-32-20.webm)

The applet is located in this directory: https://github.com/Flecart/complexity-tom-dwm/tree/main/statefulness/app.
Run python3 server.py and connect to localhost on port 8000 to see the interface shown in the video.
Then, to create a state, highlight a sentence or part of it; to remove a state, click the highlighted text.

I strongly suggest serializing the data for the applet into the schema described at this line; the applet might not work if the input JSON doesn't have that format, as I have not tested that scenario.

Having the prompt, question, and answer is all you need to create the labelled data!
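
For illustration only, a minimal serialization sketch in Python. The field names below are my assumptions for this example; the authoritative schema is the one linked above:

```python
import json

# Hypothetical field names -- check the schema linked above for the real ones.
samples = [
    {
        "prompt": "Sally puts the ball in the basket and leaves the room.",
        "question": "Where will Sally look for the ball?",
        "answer": "basket",
    },
]

# Write the samples as a JSON file the applet can load.
with open("applet_input.json", "w") as f:
    json.dump(samples, f, indent=2)
```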

I have looked at the parameters we used in our work: we used $\tau = 0.2$ for every dataset, so that's the suggested parameter choice. (The complexity of a sample is then its stateful value plus $\tau$ times its stateless value, as the snippet below shows.)

And bash script/gpt-3.5.sh is the script for the accuracy results of the prompting method we proposed, not for the complexity metric!
For the complexity metric we ran the script at https://github.com/Flecart/complexity-tom-dwm/blob/main/statefulness/copy_state_data.py, which prints the stateful and stateless values for each sample in the data.

Then, in the report we did something similar to the following:

```python
import numpy as np

# paste the output for the stateful values here
tomi = np.array([1, 1, 1, 4, 3, 1, 5, 5, 1, 3, 3, 1, 5, 4, 4, 1, 1, 4, 4, 1, 2, 6, 1, 2, 1, 1, 3, 5, 3, 1, 5, 6, 4, 1, 1, 5, 3, 5, 1, 1, 1, 5, 1, 1, 1, 3, 1, 3, 4, 3], dtype=float)

# add the stateless values, weighted by tau
tau = 0.2
tomi += tau * np.array([8, 5, 7, 5, 1, 3, 4, 2, 3, 6, 3, 2, 7, 2, 3, 1, 4, 5, 3, 3, 6, 6, 1, 8, 6, 7, 6, 6, 3, 2, 7, 2, 0, 4, 7, 4, 2, 5, 2, 5, 6, 5, 1, 5, 8, 5, 5, 7, 4, 4])
```

Then, we used boxplots to plot the results.
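
As a rough sketch of that step (assuming matplotlib; the exact plotting code isn't shown here), continuing from the snippet above:

```python
import matplotlib.pyplot as plt

# One boxplot per dataset; "tomi" is the combined
# stateful + tau * stateless array computed above.
plt.boxplot([tomi], labels=["ToMi"])
plt.ylabel("Complexity")
plt.show()
```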

If you need further assistance, feel free to reach out!
