Pretraining Datasets #3

Open
superctj opened this issue Apr 24, 2023 · 4 comments

Comments

@superctj

Thank you for open-sourcing the code! I couldn't find a description of the pretraining datasets in the paper. Was Starmie pretrained on the benchmark datasets?

@jw-megagon
Collaborator

Sorry for the late reply. We use the VizNet tables to pre-train the column encoder; they can be found on this page: https://github.com/megagonlabs/sato/tree/master/table_data

@IbraheemTaha

Thanks a lot for open-sourcing the code, and for your answer @jw-megagon. I was wondering whether you trained the model on all VizNet tables (80,000) or only on the multi-column sets. Also, could you please provide the hyperparameters (--batch_size, --lr, --lm, --n_epochs, --max_len, --size, --projector, --augment_op, --sample_meth, --table_order) you used for training?

Thanks in advance!

@Kirito-Aus

To obtain the VizNet training data, I copied all the tables from the viznet_tables/webtableX/KX_multi-col folders at https://github.com/megagonlabs/sato/tree/master/table_data into the project's /data/viznet/tables folder, and shortened their file names. Could you please confirm whether these steps are correct?
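
For reference, the copy step described above could be sketched roughly as follows. This is only an illustration, not the maintainers' procedure: the glob pattern `webtable*/K*_multi-col`, the `.csv` extension, the helper name `collect_multicol_tables`, and the `<webtable>_<K>_<file>` naming are all assumptions, since the exact shortened names are not specified in this thread.

```python
import shutil
from pathlib import Path


def collect_multicol_tables(src_root, dest_dir):
    """Copy every CSV under <src_root>/webtable*/K*_multi-col into dest_dir.

    The directory pattern and the flattened naming scheme are assumptions
    made for illustration, not a layout confirmed by the project.
    """
    src_root = Path(src_root)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for csv_path in sorted(src_root.glob("webtable*/K*_multi-col/*.csv")):
        webtable = csv_path.parent.parent.name      # the "webtableX" folder
        fold = csv_path.parent.name.split("_")[0]   # the "KX" prefix
        # Prefix the file name so tables from different folders cannot collide.
        target = dest_dir / f"{webtable}_{fold}_{csv_path.name}"
        shutil.copy2(csv_path, target)
        copied.append(target)
    return copied
```

Calling `collect_multicol_tables("viznet_tables", "data/viznet/tables")` from the project root would then fill the tables folder, assuming the Sato `table_data` download was unpacked into a local `viznet_tables` directory.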

@Kirito-Aus

I have obtained data/viznet/tables through the steps mentioned above. However, the pretraining process also requires the file data/viznet/test.csv (line 284 in pretrain.py). Could you please tell me where I can get this file?

Thanks in advance! : )


4 participants