This repository contains the code to analyze and post-process the eye-tracking data of the CopCo corpus, which is described in the following publication:
Nora Hollenstein, Maria Barrett, and Marina Björnsdóttir. 2022. The Copenhagen Corpus of Eye Tracking Recordings from Natural Reading of Danish Texts. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 1712–1720, Marseille, France. European Language Resources Association.
Please make sure to read about the data format and download the latest version of the data from the OSF repository.
python participant_statistics.py RawData/
This script computes comprehension scores and overall reading times. RawData/ should contain one folder per participant with that participant's EDF recording file.
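A comprehension score of this kind can be computed as the fraction of correctly answered comprehension questions, and the overall reading time as the sum of per-trial durations. The sketch below assumes a simple list of (given, correct) answer pairs rather than the actual EDF-derived data format:

```python
def comprehension_score(responses):
    """Fraction of correctly answered comprehension questions.

    `responses` is a hypothetical list of (given_answer, correct_answer)
    pairs; the real script reads these from the recording files.
    """
    if not responses:
        return 0.0
    correct = sum(1 for given, expected in responses if given == expected)
    return correct / len(responses)


def total_reading_time(trial_durations_ms):
    """Overall reading time as the sum of per-trial durations (ms)."""
    return sum(trial_durations_ms)


# Hypothetical example: 3 of 4 questions answered correctly.
score = comprehension_score([("a", "a"), ("b", "b"), ("c", "d"), ("a", "a")])
```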
python calibration_check.py
This script checks the calibration accuracy of all participants.
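One way such a check can work is to flag participants whose average validation error exceeds a threshold. The sketch below is only an illustration of that idea; the threshold value and data layout are assumptions, not the script's actual logic:

```python
def flag_poor_calibration(errors_by_participant, max_avg_error_deg=1.0):
    """Return participants whose mean validation error (in degrees of
    visual angle) exceeds the threshold. `errors_by_participant` maps a
    participant id to a list of per-validation error values."""
    flagged = []
    for participant, errors in errors_by_participant.items():
        if errors and sum(errors) / len(errors) > max_avg_error_deg:
            flagged.append(participant)
    return flagged
```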
python texts_statistics.py
This script computes text and sentence length statistics.
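Such statistics boil down to counting sentences and tokens per text. A minimal sketch, using a naive punctuation-based sentence split rather than whatever tokenization the script actually uses:

```python
import re


def text_statistics(text):
    """Simple length statistics for a text: number of sentences, number
    of whitespace-separated tokens, and mean sentence length in tokens."""
    # Naive sentence split on ., ! and ? -- a real pipeline would use a
    # proper sentence segmenter.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = text.split()
    mean_len = len(tokens) / len(sentences) if sentences else 0.0
    return {"sentences": len(sentences), "tokens": len(tokens),
            "mean_sentence_length": mean_len}
```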
The extracted features can be found in ExtractedFeatures/, but if required you can also re-run the code to add additional features or modify the existing ones:
- Use the DataViewer software from SR Research to convert the recorded EDF files to fixation reports and interest area reports in TXT format.
- Convert the SR DataViewer output files to UTF-8 for a correct representation of Danish special characters. If the reports were already exported in UTF-8 encoding (this can be specified in the DataViewer preferences), this step can be skipped:
iconv -f ISO-8859-1 -t UTF-8 FIX_report_P10.txt > FIX_report_P10-utf8.txt
iconv -f ISO-8859-1 -t UTF-8 IA_report_P10.txt > IA_report_P10-utf8.txt
These files are also available in the OSF repository (original and UTF-8 versions) in the folders FixationReports and InterestAreaReports.
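The same conversion can be done in Python instead of iconv; a minimal sketch (file names as in the iconv example above):

```python
def convert_to_utf8(src_path, dst_path, src_encoding="ISO-8859-1"):
    """Re-encode a report file to UTF-8 so that Danish special
    characters (æ, ø, å) are represented correctly."""
    with open(src_path, "r", encoding=src_encoding) as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)


# e.g. convert_to_utf8("FIX_report_P10.txt", "FIX_report_P10-utf8.txt")
```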
- Create a mapping from character interest areas to word interest areas:
python char2word_mapping.py
This step is only required if the experiment setup has changed. To complete it, the experiment must be deployed in the SR ExperimentBuilder twice: once with automatic segmentation of the text into individual characters as areas of interest, and once with words as areas of interest. The required files are provided in aois/. The script char2word_mapping.py aligns the characters to the words.
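Conceptually, the alignment walks through the character interest areas in reading order and groups consecutive characters until each word is covered. A simplified sketch of that idea (the real script operates on DataViewer interest area reports, which are not modeled here):

```python
def map_chars_to_words(char_aois, words):
    """Group consecutive character AOI ids into word-level AOIs.

    `char_aois` is a list of (aoi_id, character) pairs in reading order;
    `words` is the list of words in the same order. Whitespace characters
    are skipped. Returns a list of (word, [char_aoi_ids]) pairs.
    """
    mapping = []
    chars = [(i, c) for i, c in char_aois if not c.isspace()]
    pos = 0
    for word in words:
        ids = [chars[pos + k][0] for k in range(len(word))]
        # Sanity check: the grouped characters must spell the word.
        assert "".join(chars[pos + k][1] for k in range(len(word))) == word
        mapping.append((word, ids))
        pos += len(word)
    return mapping
```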
- Extract word-level and character-level features with
python extract_features.py
This outputs new CSV files in ExtractedFeatures/.
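Word-level features of this kind are typically aggregates over the fixations that land in a word's interest area, such as first fixation duration and total reading time. A hedged sketch (the feature names are chosen for illustration and need not match the script's actual output columns):

```python
def word_features(fixations):
    """Aggregate fixations into word-level features.

    `fixations` is a list of (word_id, duration_ms) pairs in temporal
    order. Returns a dict mapping word_id to first fixation duration,
    total reading time, and fixation count.
    """
    features = {}
    for word_id, duration in fixations:
        f = features.setdefault(word_id, {
            "first_fixation_duration": duration,  # set on first visit only
            "total_reading_time": 0,
            "fixation_count": 0,
        })
        f["total_reading_time"] += duration
        f["fixation_count"] += 1
    return features
```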
These files can also be downloaded directly from the OSF repository.
A description of the extracted features can be found here.
Use the script validate_data.py to check the data quality, e.g., word length effect and landing position analysis.
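The word length effect (longer words attract longer reading times) can be checked with a simple correlation between word length and reading time; in valid eye-tracking data the correlation should be clearly positive. A minimal sketch using a plain Pearson correlation, not necessarily the script's actual analysis:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def word_length_effect(words, reading_times):
    """Correlate word length with reading time as a data-quality check."""
    return pearson([len(w) for w in words], reading_times)
```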
Use the script participant_correlations.py to calculate correlations between different participant characteristics.
This directory contains the code used for the dyslexia classification methods described in the following publication:
Marina Björnsdóttir, Nora Hollenstein, and Maria Barrett. 2023. Dyslexia Prediction from Natural Reading of Danish Texts. In Proceedings of the 24th Nordic Conference on Computational Linguistics, pages 1712–1720, Tórshavn, Faroe Islands.