In `/GoodJudgeOpen_crawler` there is a `readme.md` and a `pipeline.sh`. You can crawl the data incrementally with Playwright yourself, then update the current GJO file from the crawler output:
```bash
base_path="./"
current_gjo_file_path="${base_path}test/raw_question/gjo_transformed_questions_no_output.jsonl"
gjo_raw_crawl_file_path="${base_path}GoodJudgeOpen_crawler/gjo.json"
python update_from_crawl.py \
    --gjo_raw_crawl_file_path "$gjo_raw_crawl_file_path" \
    --current_gjo_file_path "$current_gjo_file_path"
```
This updates the GJO questions in `current_gjo_file_path` (by default `test/raw_question/gjo_transformed_questions_no_output.jsonl`).
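A minimal sketch of what this update step amounts to, assuming each question record carries a unique `id` field and that newly crawled records overwrite existing ones (a hypothetical schema; the real logic lives in `update_from_crawl.py`):

```python
import json

def update_from_crawl(crawl_path: str, current_path: str) -> None:
    """Merge freshly crawled questions into the current JSONL file.

    Assumes each record has an 'id' field (hypothetical; see
    update_from_crawl.py for the actual schema). Newly crawled
    records replace existing records with the same id.
    """
    # gjo.json: a JSON list of question dicts from the crawler
    with open(crawl_path) as f:
        crawled = json.load(f)

    # Current file: one JSON object per line (JSONL)
    questions = {}
    with open(current_path) as f:
        for line in f:
            q = json.loads(line)
            questions[q["id"]] = q

    # New crawl wins on conflict
    for q in crawled:
        questions[q["id"]] = q

    with open(current_path, "w") as f:
        for q in questions.values():
            f.write(json.dumps(q) + "\n")
```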
Fill in `/script/base-url.txt` and `/script/api-key.txt` with your API base URL and API key, fill in the model names in `pipeline_api.sh`, then run:

```bash
bash ./pipeline_api.sh
```

For local models, fill in the model names in `pipeline_local.sh` and run:

```bash
bash ./pipeline_local.sh
```
Specify the periods in `analysis/acc_p1p2.py` and `analysis/acc_p1p2_T2.py`; the two scripts test the two hypotheses, so make sure `time_intervals_dic` is identical in both. An entry such as `[(-80, -60), (-20, 0)]` means comparing the accuracy on questions from 80 to 60 months before model release against the accuracy on questions from the 20 months before release.
```python
time_intervals_dic = {
    '[(-80, -60), (-20, 0)]': [(-80, -60), (-20, 0)],
    '[(-60, -40), (-20, 0)]': [(-60, -40), (-20, 0)],
    '[(-40, -20), (-20, 0)]': [(-40, -20), (-20, 0)],
}
```
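The interval comparison can be sketched as below, assuming each question row carries hypothetical `months_before_release` (negative before release) and `correct` fields; the real scripts define their own data loading:

```python
import math

time_intervals_dic = {
    '[(-80, -60), (-20, 0)]': [(-80, -60), (-20, 0)],
    '[(-60, -40), (-20, 0)]': [(-60, -40), (-20, 0)],
    '[(-40, -20), (-20, 0)]': [(-40, -20), (-20, 0)],
}

def interval_accuracy(rows, interval):
    """Accuracy over questions whose time offset falls in the interval,
    in months relative to model release: (-80, -60) covers questions
    from 80 to 60 months before release."""
    lo, hi = interval
    hits = [r["correct"] for r in rows
            if lo <= r["months_before_release"] < hi]
    return sum(hits) / len(hits) if hits else math.nan

def compare(rows):
    # For each configured pair, report (early-window accuracy,
    # pre-release-window accuracy); a large gap in favor of the early
    # window would support the "Nostalgia" pattern.
    return {
        name: (interval_accuracy(rows, early), interval_accuracy(rows, late))
        for name, (early, late) in time_intervals_dic.items()
    }
```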
```bash
python analysis/acc_p1p2.py
python analysis/acc_p1p2_T2.py
python analysis/classify.py
```
Example classification output (LaTeX table rows):

```
gpt4_1106 & Nostalgia *** & Nostalgia *** & Nostalgia *** \\
gpt_3.5_turbo_0613 & Balanced & Nostalgia *** & Nostalgia *** \\
```