Code not running as it is for st1 due to missing data file creation and wrong file paths #2

Open
ameliesc opened this issue Aug 16, 2024 · 1 comment

Comments

@ameliesc

Currently it is not possible to run this code as there are several bugs.

  1. train.py does not work in st1 for either adapters or transformers because it loads data files that do not exist and are not created anywhere:

train_val_df = load_df('../../data/st1_joined_data/training.tsv')

test_df = load_df('../../data/st1_joined_data/dev.tsv')

training_set=pd.read_csv("./data/cleaned_original_train.tsv", skiprows=1,

satire_en=pd.read_csv("./data/satire_external_en.tsv", skiprows=1,

...

  2. The README file path is wrong.

The README for st1 says to extract the files into ../data/articles/external_satire, but process_external_satire.py reads from ../data/ext_satire:

satire_files = glob.glob('../data/ext_satire/*.txt')

I suggest cloning this repo and trying to run train.py with the instructions given here.
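
A quick way to confirm which inputs are missing before running anything is a small pre-flight check. The sketch below is only an illustration: the helper name `find_missing` is hypothetical, and the list contains just the paths quoted above, each of which is relative to the directory of the script that loads it, so the check should be run from that directory (or with a matching `base_dir`).

```python
from pathlib import Path

# Paths quoted in this issue. Each one is relative to the directory of the
# script that loads it, so not all of them resolve from a single location.
EXPECTED_FILES = [
    "../../data/st1_joined_data/training.tsv",
    "../../data/st1_joined_data/dev.tsv",
    "./data/cleaned_original_train.tsv",
    "./data/satire_external_en.tsv",
]


def find_missing(paths, base_dir="."):
    """Return the entries of `paths` that are not existing files under `base_dir`."""
    base = Path(base_dir)
    return [p for p in paths if not (base / p).is_file()]


if __name__ == "__main__":
    missing = find_missing(EXPECTED_FILES)
    if missing:
        print("Missing data files:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected data files are present.")
```

Running this from a fresh clone should list every file that the training scripts expect but the repository does not create.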

@freddyheppell
Member

Hi, thanks for bringing these issues to our attention.

For now, as far as I remember:

  • The TSVs in the st1_joined_data dir are a data frame of all the article texts along with their ID, language and label.
  • The satire_external_en file can be produced by using the process_external_satire script to load the data and output a single file.
  • The cleaned_original_en file is the result of running the functions in clean_text over the files.

But we’ll confirm all that and add the missing code as soon as we can.
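
Until the missing code is added, the "load the data and output a single file" step could look roughly like the sketch below. This is not the repository's process_external_satire script: the function name `combine_text_files`, the `id` and `text` column names, and the choice to use the file name as the ID are all assumptions made for illustration.

```python
import csv
import glob
import os


def combine_text_files(input_glob, output_path):
    """Write one TSV row per text file matched by `input_glob`.

    Returns the number of files written. Column names are assumed.
    """
    paths = sorted(glob.glob(input_glob))
    with open(output_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(["id", "text"])  # assumed header row
        for path in paths:
            with open(path, encoding="utf-8") as f:
                # Collapse all whitespace so each article stays on one TSV line.
                text = " ".join(f.read().split())
            # Assumption: the file name without its extension serves as the ID.
            article_id = os.path.splitext(os.path.basename(path))[0]
            writer.writerow([article_id, text])
    return len(paths)
```

For example, `combine_text_files('../data/ext_satire/*.txt', './data/satire_external_en.tsv')` would use the paths quoted in this issue, although which input directory is the intended one is exactly what the issue leaves open.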
