
benchmark_results_11_3.md


Benchmark Evaluation Report

Date: 2024-11-03 12:07:57

Overall Results

Total Questions: 68

| System | Correct Answers | Accuracy | Average Time |
|---|---|---|---|
| CSIL | 17 | 25.0% | 81.75s |
| Baseline | 13 | 19.1% | 9.72s |
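
The overall accuracy figures follow directly from the correct-answer counts and the 68-question total. The following is a minimal sketch of that arithmetic, not the script that produced this report; the helper name `accuracy_pct` and the dictionary layout are illustrative assumptions, and only the numbers come from the report.

```python
# Overall figures copied from the report; names below are hypothetical.
TOTAL_QUESTIONS = 68
OVERALL = {
    "CSIL": {"correct": 17, "avg_time_s": 81.75},
    "Baseline": {"correct": 13, "avg_time_s": 9.72},
}


def accuracy_pct(correct: int, total: int) -> float:
    """Return accuracy as a percentage rounded to one decimal place."""
    return round(100.0 * correct / total, 1)


for name, stats in OVERALL.items():
    print(f"{name}: {accuracy_pct(stats['correct'], TOTAL_QUESTIONS)}%")

# Ratio of average per-question time, CSIL relative to Baseline.
slowdown = OVERALL["CSIL"]["avg_time_s"] / OVERALL["Baseline"]["avg_time_s"]
print(f"CSIL takes about {slowdown:.1f}x as long per question")
```

On these numbers, the 5.9-point accuracy gain for CSIL comes with roughly an eightfold increase in average time per question.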

Results by Category

| Category | Questions | CSIL Accuracy | CSIL Average Time | Baseline Accuracy | Baseline Average Time |
|---|---|---|---|---|---|
| Creative and Abstract Thinking | 7 | 0.0% | 113.38s | 0.0% | 12.61s |
| Linguistic | 3 | 33.3% | 25.79s | 0.0% | 2.11s |
| Layered Understanding | 8 | 12.5% | 122.28s | 0.0% | 20.07s |
| Context Integration | 8 | 0.0% | 92.90s | 0.0% | 10.75s |
| Reasoning and Inference | 7 | 57.1% | 73.33s | 42.9% | 10.64s |
| Puzzle | 12 | 58.3% | 75.52s | 50.0% | 7.60s |
| Relational | 3 | 33.3% | 51.56s | 33.3% | 5.66s |
| Popular science | 3 | 0.0% | 67.71s | 33.3% | 11.10s |
| Spatial | 7 | 14.3% | 59.14s | 28.6% | 6.57s |
| Emotional and Social Intelligence | 8 | 0.0% | 86.23s | 0.0% | 6.65s |
| Counting | 2 | 100.0% | 42.57s | 0.0% | 2.40s |
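
The per-category results report accuracy but not the number of correct answers. As a rough consistency check, the sketch below recovers the implied count for each category by rounding `questions * accuracy` and compares the sums with the overall totals (68 questions, 17 correct for CSIL, 13 for Baseline). The function name `implied_correct` and the tuple layout are assumptions made for illustration; the figures are copied from the report.

```python
# (category, questions, CSIL accuracy %, Baseline accuracy %) from the report.
CATEGORIES = [
    ("Creative and Abstract Thinking", 7, 0.0, 0.0),
    ("Linguistic", 3, 33.3, 0.0),
    ("Layered Understanding", 8, 12.5, 0.0),
    ("Context Integration", 8, 0.0, 0.0),
    ("Reasoning and Inference", 7, 57.1, 42.9),
    ("Puzzle", 12, 58.3, 50.0),
    ("Relational", 3, 33.3, 33.3),
    ("Popular science", 3, 0.0, 33.3),
    ("Spatial", 7, 14.3, 28.6),
    ("Emotional and Social Intelligence", 8, 0.0, 0.0),
    ("Counting", 2, 100.0, 0.0),
]


def implied_correct(questions: int, accuracy_pct: float) -> int:
    """Recover the whole number of correct answers from a rounded percentage."""
    return round(questions * accuracy_pct / 100.0)


total_questions = sum(q for _, q, _, _ in CATEGORIES)
csil_correct = sum(implied_correct(q, c) for _, q, c, _ in CATEGORIES)
baseline_correct = sum(implied_correct(q, b) for _, q, _, b in CATEGORIES)

print(f"Questions: {total_questions}")
print(f"CSIL correct: {csil_correct}, Baseline correct: {baseline_correct}")
```

With category sizes between 2 and 12 questions, a single answer moves a category's accuracy by 8 to 50 percentage points, so per-category differences are best read as counts rather than as percentages.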