Title: On Training Cross-Language Neural Information Retrieval Models
Location: CSIE R110
Speaker: Dr. Eugene Yang, Johns Hopkins University
Host: Prof. Yun-Nung Chen
Neural retrieval models have pushed state-of-the-art retrieval effectiveness to a new level over the last five years, aided by pretrained language models and large-scale training data such as MS MARCO, which consists of millions of labeled query-passage pairs. Since such collections are primarily in English, monolingual English retrieval models benefit the most from them. However, because of the language mismatch, neural retrieval models for searching documents with queries in a different language (cross-language information retrieval) cannot leverage these resources directly. In this talk, we will discuss state-of-the-art training approaches and open challenges for neural cross-language information retrieval.
Eugene Yang is a visiting research scientist at the Human Language Technology Center of Excellence (HLTCOE) at Johns Hopkins University. His recent work focuses on multilingual and cross-language information retrieval, with particular interests in training approaches and retrieval efficiency analysis. Eugene has also co-organized the TREC NeuCLIR track since 2022. Before joining the HLTCOE, Eugene received his Ph.D. from Georgetown University, where he worked on high-recall retrieval for electronic discovery with Ophir Frieder and David D. Lewis.