Date: 2023-10-19 10:00-11:10
Location: CSIE R103
Speaker: Dr. Nancy F. Chen
Host: Prof. Vivian Chen
We present SeaEval, a benchmark for multilingual foundation models. In addition to characterizing how these models understand and reason with natural language, we investigate how well they comprehend cultural practices, nuances, and values. Alongside standard accuracy metrics, we examine the brittleness of foundation models in the dimensions of semantics and multilinguality. Our investigations encompass both open-source and proprietary models, shedding light on their behavior in classic NLP tasks, reasoning, and cultural contexts. Notably, (1) most models respond inconsistently to paraphrased instructions. (2) Exposure bias is pervasive, evident in both standard NLP tasks and cultural understanding. (3) For questions rooted in factual, scientific, or commonsense knowledge, consistent responses are expected across semantically equivalent queries in different languages, yet many models perform inconsistently on such queries. (4) Models trained multilingually still lack "balanced multilingual" capabilities. Our findings underscore the need for more generalizable semantic representations and enhanced multilingual contextualization. SeaEval can serve as a launchpad for in-depth investigations into multilingual and multicultural evaluation.
Nancy F. Chen is an A*STAR fellow, senior principal scientist, principal investigator, and group leader at I2R (Institute for Infocomm Research) and Principal Investigator at CFAR (Centre for Frontier AI Research). Her group works on generative AI in speech, language, and conversational technology. Her research has been applied to education, defense, healthcare, and media/journalism. Dr. Chen has published 100+ papers and supervised 100+ students/staff. She has won awards from IEEE, Microsoft, NIH, P&G, UNESCO, L’Oréal, SIGDIAL, APSIPA, and MICCAI. She is an IEEE SPS Distinguished Lecturer (2023-2024), Program Chair of ICLR 2023, and Board Member of ISCA (2021-2025), and was named to the Singapore 100 Women in Tech list (2021). Technology from her team has led to commercial spin-offs and government deployment. Prior to A*STAR, she worked at MIT Lincoln Lab while doing a PhD at MIT and Harvard. For more info: http://alum.mit.edu/www/