Hi! I’m Arya McCarthy. I am a computer scientist, musician, cyclist, runner, and world politics aficionado. Expatriated from Texas, I’ve had the good fortune to wander. A constant drive for me is to make the communities we’re a part of healthier, more effective, and more welcoming. I’m convinced that the impossible, the improbable, and the inevitable are separated by your grit, and I hope to bridge the new digital divide through understanding humans and their languages.
I am a Senior Research Scientist at Noetica, at the intersection of fintech and legal tech, working on NLP tools used by some of the top law firms in the world. Previously, I was a research scientist at Scaled Cognition, where I developed rational, controllable AI models for high-trust scenarios. I earned my Ph.D. while introducing structure-grounded translation and morphology techniques for 1,000+ languages. I also interned and published at Google, Duolingo, and Facebook. I invite you to explore my publications (41 research papers and a book).
News
- October 2023. Ph.Done! I defended and submitted my dissertation, Structured Analysis and Translation of Thousands of Languages.
- October 2023. Check out our new Findings of EMNLP paper: “Long-Form Speech Translation through Segmentation with Finite-State Decoding Constraints on Large Language Models”
- July 2023. Our paper earned honorable mention for best paper at ACL 2023: “Theory-Grounded Computational Text Analysis”.
- May 2023. Our book on NLP methods in social science is published: “A Free Press, If You Can Keep It: What Natural Language Processing Reveals About Freedom of the Press in Hong Kong”.
- March 2023. Led panel discussion on integrating textual and non-textual data at MIT.
- February 2023. Led panel discussion on LLMs and the future of data at UC Berkeley.
- February 2023. Gave a talk at Columbia University on translation at the scale of 1,000+ languages.
- January 2023. Check out our new EACL paper: “Meeting the needs of low-resource languages: The value of automatic alignments via pretrained models”
- December 2022. Gave a talk at the Allen Institute for AI (AI2).
- October 2022. Check out our new EMNLP 2022 paper: “A Major Obstacle for NLP Research: Let’s Talk about Time Allocation!”
- October 2022. Presented “Deciphering and characterizing out-of-vocabulary words in morphologically rich languages” at COLING 2022 in Korea.
- September 2022. Gave a talk at UC Berkeley.
- September 2022. Gave a talk at Stanford University.
Music
I’ve played the bagpipe for over a decade. These days, it’s a great way to social-distance. I also typeset original bagpipe compositions in LaTeX, which is criminally underestimated as a tool for bringing beauty into the world.
Academic
I completed my Ph.D. at Johns Hopkins University, designing machine translation that uses panlingual weak supervision with David Yarowsky in JHU’s LoReLab. I was an Amazon Fellow and the 2022–2023 Frederick Jelinek Fellow. I graduated from SMU in 2017 with a bachelor’s in mathematics and computer science and a master’s in computer science. There, I worked with David Matula on convex optimization, graph theory, and number theory. Along the way, I studied at Stanford University and the University of Edinburgh.
Selected publications:
- On the uncomputability of partition functions in energy-based sequence models with Chu-Cheng Lin. ICLR 2022 Spotlight.
- Addressing posterior collapse with mutual information for improved variational neural machine translation with Xian Li, Jiatao Gu, and Ning Dong. ACL 2020.
- Modeling color terminology across thousands of languages with Winston Wu, Aaron Mueller, William Watson, and David Yarowsky. EMNLP 2019.
Beauty
I can’t feel anything but gratitude for every single moment of my stupid little life. Friends and strangers on trains have shared their tenderness with me. Whether I’m clinging to scaffolding in bell towers, sloshing for miles through stormwater drains, watching mountainside sunrises in New Mexico, or jumping over filched restaurant candles for Charshanbe Suri, the world finds a way to rekindle the creative spark.
For fellow graduate students, I encourage you to do one thing when you travel to conferences. Book a few extra days if you can afford it, push back your return flight, and take in the area’s UNESCO World Heritage Sites and museums.