Pretraining Datasets #3

superctj · 2023-04-24T23:51:11Z

Thank you for open-sourcing the code! I didn't find a description of the pretraining datasets in the paper. Was Starmie pretrained on the benchmark datasets?

jw-megagon · 2023-06-09T20:30:17Z

Sorry for the late reply. We use the VizNet tables for pre-training the column encoder; they can be found on this page: https://github.com/megagonlabs/sato/tree/master/table_data

IbraheemTaha · 2023-11-29T13:14:28Z

Thanks a lot for open-sourcing the code, and for your answer, @jw-megagon. Did you train the model on all of the VizNet tables (80000), or only on the multi-column sets? Also, could you please share the hyperparameters (--batch_size, --lr, --lm, --n_epochs, --max_len, --size, --projector, --augment_op, --sample_meth, --table_order) you used for training?

Thanks in advance!

Kirito-Aus · 2024-01-17T09:15:21Z

To obtain the VizNet training data, I saved all the tables from the viznet_tables/webtableX/KX_multi-col folders at https://github.com/megagonlabs/sato/tree/master/table_data into the project's /data/viznet/tables folder, and I shortened their file names to make them more concise. Could you please confirm whether this is correct?
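
For reference, the copy step described above can be sketched roughly as follows. This is only a sketch under assumptions, not the authors' procedure: the function name `collect_multicol_tables` is hypothetical, the tables are assumed to be `.csv` files, and the `<webtable folder>_<fold>_<file>` naming is just one way to keep shortened names unique.

```python
import shutil
from pathlib import Path


def collect_multicol_tables(src_root, dst_dir):
    """Copy every CSV found in */K*_multi-col/ folders into one flat directory.

    Hypothetical helper: the layout <src_root>/webtableX/KX_multi-col/*.csv is
    assumed from the description in this thread, not confirmed by the authors.
    """
    src_root, dst_dir = Path(src_root), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for csv_path in sorted(src_root.glob("webtable*/K*_multi-col/*.csv")):
        webtable = csv_path.parts[-3]            # e.g. the webtableX folder
        fold = csv_path.parts[-2].split("_")[0]  # e.g. "K0" from "K0_multi-col"
        # Prefix with the source folders so shortened names stay unique.
        target = dst_dir / f"{webtable}_{fold}_{csv_path.name}"
        shutil.copy2(csv_path, target)
        copied.append(target)
    return copied
```

With a local clone of the sato repository, this would be called as `collect_multicol_tables("sato/table_data/viznet_tables", "data/viznet/tables")`. Single-column folders are skipped because they do not match the `K*_multi-col` pattern.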

Kirito-Aus · 2024-01-17T09:17:46Z

I obtained data/viznet/tables through the steps described above. However, pretraining also requires the file data/viznet/test.csv (line 284 in pretrain.py). Could you please tell me where I can get this file?

Thanks in advance! : )
