README.md
This repository contains a project on the categorization of intransitive verbs. An intransitive verb can be categorized as either unergative or unaccusative, depending on whether its sole argument behaves, grammatically and semantically, more like the subject or more like the object of a transitive verb. I investigate how the environment a verb appears in can affect this categorization, a question that has been relatively understudied in previous literature.
A word2vec model was trained on two types of training data: (1) new verbs occurring within 22 Mandarin sentence constructions, where each construction was labeled as unergative, neutral, or unaccusative based on grammatical and semantic properties that previous studies have identified as possibly affecting the category of the verbs within them; and (2) the corpus of Taiwanese children's language in CHILDES (MacWhinney, 2000). I then compare the semantic embeddings of the new verbs with those of the two types of existing verbs (unergative and unaccusative) in the Taiwanese children's corpus, using cosine similarity as the metric.
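The comparison step can be illustrated with a minimal sketch. It assumes that each verb is already represented by an embedding vector and that a new verb's similarity to a verb class is summarized by the mean cosine similarity over that class; the averaging, the function names, and the toy three-dimensional vectors are all assumptions made for illustration and are not taken from the project's code or data.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_similarity(target, references):
    """Average cosine similarity between one vector and a list of vectors."""
    return sum(cosine_similarity(target, r) for r in references) / len(references)


def compare_to_classes(new_verb_vec, unergative_vecs, unaccusative_vecs):
    """Mean similarity of a new verb to each verb class (hypothetical helper)."""
    return {
        "unergative": mean_similarity(new_verb_vec, unergative_vecs),
        "unaccusative": mean_similarity(new_verb_vec, unaccusative_vecs),
    }


# Toy vectors invented purely for illustration; real embeddings would come
# from the trained word2vec model and have many more dimensions.
toy_unergative = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]]
toy_unaccusative = [[0.0, 0.1, 1.0], [0.1, 0.2, 0.9]]
toy_new_verb = [0.8, 0.1, 0.2]

scores = compare_to_classes(toy_new_verb, toy_unergative, toy_unaccusative)
closer_class = max(scores, key=scores.get)
print(scores, closer_class)
```

With real data, the toy lists would be replaced by the vectors of the new verbs and of the known unergative and unaccusative verbs from the children's corpus.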
The results showed that new verbs had higher cosine similarity to unergative verbs when they occurred within constructions that have the semantic and grammatical properties of unergative verbs. Conversely, new verbs had higher cosine similarity to unaccusative verbs when they occurred within constructions that are more favorable to unaccusative verbs. New verbs occurring in the remaining constructions showed similarities that fell between those two verb types. Moreover, more occurrences of a new verb within these 22 constructions produced a more robust categorization than fewer occurrences. Taken together, these findings suggest that the constructions a verb appears in can affect the categorization of intransitive verbs, and that accounts of intransitive verb categories should consider both the constructional environment and the verb's frequency within these constructions.