Model scores for the paper "Event knowledge in large language models: the gap between the impossible and the unlikely".
The main analysis repo: https://github.com/carina-kauf/lm-event-knowledge
We tested four attention-based Transformer (Vaswani et al., 2017) language models:
- RoBERTa (Liu et al., 2019)
- BERT (Devlin et al., 2018)
- GPT-J (B. Wang & Komatsuzaki, 2021)
- GPT-2 (Radford et al., 2019)
In the script names and result files, we use the name ANN instead of LLM.
Main metric: Adapted Pseudo-log-likelihood (PLL)
We use a modified version of the sentence’s pseudo-log-likelihood under the model (PLL; Salazar et al., 2020; A. Wang & Cho, 2019), which defines the sentence score as the sum of the log-probabilities of each token given all other tokens. To avoid biasing the scores in favor of multi-token lexical items, we modify the original procedure to additionally mask tokens within multi-token words if they are located to the right of the target.
- associated script at: ANN_MLM_adapted.py
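For illustration, a minimal sketch of this adapted masking scheme with a Hugging Face masked LM (the model choice and helper names here are illustrative assumptions; the actual implementation is ANN_MLM_adapted.py):

```python
# Minimal sketch of the adapted PLL: mask the target token AND any
# within-word tokens to its right, then score the target token.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")  # illustrative choice
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")
model.eval()

def adapted_pll(sentence: str) -> float:
    """Sum of log P(token | context) with within-word right-side masking."""
    enc = tokenizer(sentence, return_tensors="pt")
    input_ids = enc["input_ids"][0]
    word_ids = enc.word_ids()  # token -> source word index (None for specials)
    total = 0.0
    for i, wid in enumerate(word_ids):
        if wid is None:  # skip [CLS]/[SEP]
            continue
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        for j in range(i + 1, len(word_ids)):  # also mask later same-word tokens
            if word_ids[j] == wid:
                masked[j] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[input_ids[i]].item()
    return total
```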
Secondary metrics
- PLL (Salazar et al., 2020)
- Verb probability, i.e., the average log-likelihood of the verb's tokens v = w_t ... w_{t'} conditioned on their bidirectional sentence context
- Last-word probability, i.e., the average log-likelihood of the subtokens that compose the last word in the sequence according to the model’s tokenizer
- Left-to-right (l2r), causal sentence-generation probability, i.e., the average log-likelihood of each token w_i in the sequence, conditioned only on the preceding tokens w_{<i} according to the model.
- associated script at: ANN_MLM_scores.py
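A rough sketch of the span-based metrics above (verb / last-word probability), assuming the span's subtokens are masked jointly and scored against the remaining bidirectional context; the exact procedure in ANN_MLM_scores.py may differ:

```python
# Sketch: average log-likelihood of a masked token span (e.g., the verb's
# tokens or the last word's subtokens). Joint masking is an assumption.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")
model.eval()

def span_avg_log_prob(sentence: str, start: int, end: int) -> float:
    """Average log P of tokens at positions [start, end), all masked at once."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    masked = ids.clone()
    masked[start:end] = tokenizer.mask_token_id
    with torch.no_grad():
        logits = model(masked.unsqueeze(0)).logits[0]
    log_probs = torch.log_softmax(logits, dim=-1)
    return log_probs[torch.arange(start, end), ids[start:end]].mean().item()
```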
For the unidirectional (GPT) models, we define the sentence score as the sum of the log-probabilities of each token w_i in the sequence, conditioned on the preceding sentence tokens w_{<i}.
- associated script at: ANN_GPT2_scores.py
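A minimal sketch of this causal score with a Hugging Face GPT-2 (the actual scoring code is ANN_GPT2_scores.py and may differ in detail):

```python
# Sketch: sentence score = sum of log P(w_i | w_<i) under a causal LM.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def causal_score(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # position t predicts t+1
    targets = ids[0, 1:]
    return log_probs.gather(1, targets.unsqueeze(1)).sum().item()
```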
- tinyLSTM (Gauthier et al., 2020): computes the surprisal of a sentence as the sum of the surprisals of each token in the sentence
- associated script at: baseline_lmzoo_tinylstm.py
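If the per-token surprisals are exported to a TSV, the sentence score is a simple group-sum; the file and column names below mirror LM Zoo's surprisal output and are assumptions here:

```python
# Sketch: sum per-token surprisals into a sentence-level score.
import pandas as pd

surprisals = pd.read_csv("surprisals.tsv", sep="\t")  # hypothetical export
sentence_scores = surprisals.groupby("sentence_id")["surprisal"].sum()
```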
- Thematic fit: computes the fit of a patient with the prototype representation of that role, considering the patient-role fillers most associated with the agent AND the predicate of the sentence (following Lenci, 2011).
Procedure (a sketch follows this list):
- we retrieve the N most strongly associated objects for the subject and the verb respectively, and we take the intersection of the two lists;
- we update their association scores using the product (prod) function;
- we select the FastText embeddings corresponding to the first M objects in this list and we average them together (centroid) to create the prototype vector of the object given the subject and the verb;
- the thematic fit of the object x with respect to the other items in the sentence is computed as the similarity score of its corresponding lexical vector v(x) with the prototype vector.
To avoid zero scores, we apply the following methodology in case the intersection of fillers is empty:
- if the two lists are non-empty but their intersection is empty, we use the verb's fillers to create the prototype;
- if one list is empty, we take the other one.
- associated script at: baseline_TF-update.ipynb
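A hypothetical sketch of the procedure above; the `assoc` and `vectors` structures and the parameter defaults are illustrative assumptions (the repo's implementation is baseline_TF-update.ipynb):

```python
# Sketch: thematic fit of an object given a subject and a verb (Lenci, 2011).
import numpy as np

def thematic_fit(subj, verb, obj, assoc, vectors, N=50, M=20):
    """assoc[word] -> {filler: association score}; vectors[word] -> np.ndarray."""
    subj_fillers = dict(sorted(assoc[subj].items(), key=lambda kv: -kv[1])[:N])
    verb_fillers = dict(sorted(assoc[verb].items(), key=lambda kv: -kv[1])[:N])
    shared = set(subj_fillers) & set(verb_fillers)
    if shared:  # update association scores with the product (prod) function
        scored = {f: subj_fillers[f] * verb_fillers[f] for f in shared}
    elif subj_fillers and verb_fillers:  # empty intersection, both lists non-empty
        scored = verb_fillers            # back off to the verb's fillers
    else:                                # one list empty: take the other one
        scored = verb_fillers or subj_fillers
    top = sorted(scored, key=scored.get, reverse=True)[:M]
    prototype = np.mean([vectors[f] for f in top], axis=0)  # centroid
    v = vectors[obj]
    # similarity of the object's lexical vector with the prototype vector
    return float(v @ prototype / (np.linalg.norm(v) * np.linalg.norm(prototype)))
```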
- Structured Distributional Model (SDM; Chersoni et al., 2019): computes thematic fit using both a context-independent and a context-dependent representation of the prototypical role filler, based on the current linguistic context.
- associated script available upon request
- PPMI-syntax (structured input, i.e., input annotated with grammatical roles)
After extracting <verbal head, nominal dependent, relation> triples from the corpora (keeping only triples with frequency >= 2), we compute the PPMI of each triple, where N = total frequency of all triples (see the sketch below).
- associated script at: baseline_PPMI_structured_and_unstructured.ipynb
- NOTE: frequency files can be found here: drive_folder
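One standard way to compute PPMI over such triples, offered as a sketch (the exact formulation in baseline_PPMI_structured_and_unstructured.ipynb may differ):

```python
# Sketch: PPMI of each <head, dependent, relation> triple against the
# independence of its three components.
import math
from collections import Counter

def ppmi_scores(triples):
    """triples: list of (head, dependent, relation) tuples (frequency >= 2 applied upstream)."""
    triple_freq = Counter(triples)
    N = sum(triple_freq.values())  # total frequency of all triples
    head_freq = Counter(h for h, _, _ in triples)
    dep_freq = Counter(d for _, d, _ in triples)
    rel_freq = Counter(r for _, _, r in triples)
    scores = {}
    for (h, d, r), f in triple_freq.items():
        pmi = math.log2((f * N * N) / (head_freq[h] * dep_freq[d] * rel_freq[r]))
        scores[(h, d, r)] = max(0.0, pmi)  # clip negative PMI to zero
    return scores
```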
Dataset 1 - EventsAdapt (based on Fedorenko et al., 2020): newsentences_EventsAdapt
Dataset 2 - DTFit (based on Vassallo et al., 2018): DTFit
Dataset 3 - EventsRev (based on Ivanova et al., 2021): ev1