Development of Learner Writing by Proficiency: Automatic measurement of text similarity through Word2Vec
Lecture by Boo Kyung Jung, Instructor, Department of East Asian Languages and Literatures
Boo Kyung Jung is a Korean language instructor in the Department of East Asian Languages and Literatures at the University of Pittsburgh. Her research interests include corpus linguistics and Korean pedagogy.
The present study investigates how closely learner writing resembles native speakers’ writing at each proficiency level. For this purpose, the study adopts topic modelling, a Natural Language Processing approach to detecting hidden topics in large volumes of text in an unsupervised manner. In particular, the study automatically calculated the lexical/semantic similarity between learner writing and native speakers’ writing. Thirty-six Chinese-speaking learners and ten native speakers of Korean were asked to write a separate argumentative essay on each of two topics. The text similarity of each proficiency group’s essays to the native speakers’ essays was calculated using a Word2Vec model through Gensim (Rehurek & Sojka, 2010), with pre-trained embeddings. Results showed that the quality of learner writing approached that of native speakers’ writing as proficiency increased, which indicates development of L2 written production in terms of lexical/semantic features.
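As a rough illustration of how embedding-based text similarity of this kind can be computed, the sketch below averages the word vectors of each essay and compares the averages by cosine similarity. The abstract does not state which similarity measure the study used, so the averaging approach, the toy three-dimensional vectors, the English tokens, and all function names here are assumptions made for illustration only. With Gensim, a pre-trained model loaded as `KeyedVectors` offers a comparable averaged-vector cosine through its `n_similarity` method.

```python
import math

# Hypothetical toy embeddings standing in for a pre-trained Word2Vec model.
# Real pre-trained vectors typically have hundreds of dimensions.
TOY_VECTORS = {
    "school": [0.9, 0.1, 0.0],
    "student": [0.8, 0.2, 0.1],
    "teacher": [0.7, 0.3, 0.1],
    "uniform": [0.5, 0.1, 0.6],
    "rule": [0.2, 0.9, 0.1],
    "freedom": [0.1, 0.8, 0.4],
}


def mean_vector(tokens, vectors):
    """Average the vectors of all tokens found in the vocabulary."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no token of this essay is in the vocabulary")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def essay_similarity(tokens_a, tokens_b, vectors):
    """Similarity of two tokenized essays via their averaged vectors."""
    return cosine(mean_vector(tokens_a, vectors), mean_vector(tokens_b, vectors))


def group_similarity(learner_essays, native_essays, vectors):
    """Mean similarity over every learner/native essay pair."""
    scores = [
        essay_similarity(learner, native, vectors)
        for learner in learner_essays
        for native in native_essays
    ]
    return sum(scores) / len(scores)


# Invented tokenized essays, one list of tokens per essay.
native_essays = [["school", "student", "rule"]]
advanced_essays = [["school", "teacher", "rule"]]
beginner_essays = [["freedom"]]

print(group_similarity(advanced_essays, native_essays, TOY_VECTORS))
print(group_similarity(beginner_essays, native_essays, TOY_VECTORS))
```

Under this sketch, a proficiency group whose essays draw on vocabulary closer in meaning to the native speakers’ essays receives a higher score. Korean text would first need morphological tokenization before lookup in the embedding vocabulary, a step omitted here.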
Rehurek, R., & Sojka, P. (2010). Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, 45-50.