How to get a complexity metric for a new dataset? #1
Comments
Hi Melanie!
As described in section 4.2 of our work, we manually labelled 50 random samples from the dataset to retrieve the statefulness values. The applet is located in this directory: https://github.com/Flecart/complexity-tom-dwm/tree/main/statefulness/app.
I strongly suggest serializing the data for the applet into the Schema described at this line; the applet might not work if the input JSON doesn't have that format, as I have not tested this scenario. Having prompt, question and answer is all you need to create the labelled data!
I have looked at the parameter we used in our work. We used … and then, in the report, we did something similar to the following:
import numpy as np
# paste the output result for stateful value
tomi = np.array([1, 1, 1, 4, 3, 1, 5, 5, 1, 3, 3, 1, 5, 4, 4, 1, 1, 4, 4, 1, 2, 6, 1, 2, 1, 1, 3, 5, 3, 1, 5, 6, 4, 1, 1, 5, 3, 5, 1, 1, 1, 5, 1, 1, 1, 3, 1, 3, 4, 3], dtype=float)
# paste the stateless value
tau = 0.2
tomi += tau*np.array([8, 5, 7, 5, 1, 3, 4, 2, 3, 6, 3, 2, 7, 2, 3, 1, 4, 5, 3, 3, 6, 6, 1, 8, 6, 7, 6, 6, 3, 2, 7, 2, 0, 4, 7, 4, 2, 5, 2, 5, 6, 5, 1, 5, 8, 5, 5, 7, 4, 4])
Then, we used boxplots to plot the results.
If you need further assistance, feel free to reach out!
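For readers who want to reuse the recipe above on their own labels, here is a minimal sketch of the same combination (stateful labels plus tau times stateless labels), followed by the five-number summary that a boxplot is built from. The function names `complexity_scores` and `five_number_summary` and the toy labels are hypothetical and not part of the repository; only `tau = 0.2` comes from the snippet above.

```python
from statistics import quantiles


def complexity_scores(stateful, stateless, tau=0.2):
    """Combine per-sample labels as stateful + tau * stateless."""
    if len(stateful) != len(stateless):
        raise ValueError("both label lists must cover the same samples")
    return [s + tau * u for s, u in zip(stateful, stateless)]


def five_number_summary(scores):
    """Return (min, Q1, median, Q3, max) of the scores."""
    q1, median, q3 = quantiles(scores, n=4, method="inclusive")
    return min(scores), q1, median, q3, max(scores)


# Toy labels for illustration only (not taken from any dataset).
stateful = [1, 4, 3, 5, 2]
stateless = [8, 5, 1, 4, 6]

scores = complexity_scores(stateful, stateless, tau=0.2)
print(five_number_summary(scores))
```

Computing one such summary per dataset gives the numbers to compare side by side; a plotting library's boxplot function can draw them, though its whisker convention may differ from the plain min and max used here.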
Thank you for the great work and for releasing the code!
If we wanted to compute the complexity for a new dataset, what would be the steps to do so?
I see that `data/<dataset>/splits.json` already has the `num_states` and `num_highlights`. For the dataset I'm interested in, I solely have the prompt, question, & answer. Once I populate this file correctly, what parameter choices would be best to report?
Would it be correct to say that after making these modifications, `bash script/gpt-3.5.sh` should yield the results I need or am I missing anything?
Thanks in advance!
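As a rough illustration of the preparation step discussed in this thread, the sketch below draws 50 random samples from a dataset that only has prompt, question and answer, and serializes them to JSON for manual labelling. The field names, the helper `sample_for_labelling` and the toy records are assumptions made for the example; the applet's actual schema is the one defined in the repository and should be checked before use.

```python
import json
import random


def sample_for_labelling(records, k=50, seed=0):
    """Pick up to k random records and keep only the three text fields."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    picked = rng.sample(records, min(k, len(records)))
    return [
        {
            "prompt": r["prompt"],
            "question": r["question"],
            "answer": r["answer"],
        }
        for r in picked
    ]


# Toy dataset for illustration only.
records = [
    {"prompt": f"story {i}", "question": f"question {i}", "answer": f"answer {i}"}
    for i in range(200)
]

subset = sample_for_labelling(records)
payload = json.dumps(subset, indent=2)  # write this string to a file for the applet
```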