Undergraduate Research & Scholarships

Josh Barua
L&S Math & Physical Sciences

When is a Pen a Boli vs Pluma: Mining Lexical Rules with Neural Models

The emergence of unique semantic subdivisions of concepts across languages is a natural byproduct of cultural, geographic, and historical factors. One example of such concept variation is the choice between “boli” and “pluma” when translating “pen” into Spanish. While both loosely translate to “pen,” boli typically refers to a ballpoint pen, whereas pluma refers to a fountain pen (or even a quill in historical contexts). For non-native speakers, learning the subtle lexical rules that govern which translation to use can prove challenging without expert help. In my research project, we aim to use neural models to automatically identify instances of concept variation and to provide interpretable rules that explain the differences in their usage. With this information, we will create a freely accessible digital resource to assist language learners in their journey. Novel insights into concept variation can also be used to improve systems for human-computer interaction, machine translation, and pragmatics, which frequently struggle in cross-cultural settings.

Message To Sponsor

I would like to sincerely thank CACSSF for sponsoring my research this summer. With your funding, we were able to work closely with native speakers of 9 low-resource languages and collect a dataset for fine-grained analysis of the multilingual capabilities of language models. With this resource and our subsequent analysis, we wrote a research paper that has received positive feedback from reviewers at the top conference in our field (fingers crossed for an acceptance!). This experience has reinforced my passion for research, and I look forward to applying to PhD programs in machine learning this fall.
Major: Computer Science
Mentor: Alane Suhr
Sponsor: CACSSF