Title: Recent Results on Video Understanding and Generation via Multimodal Foundation Models
Date: 2024/05/29 2:30 pm-3:30 pm
Location: CSIE 104
Speaker: Prof. Ming-Hsuan Yang, UC Merced
Host: Prof. Yung-Yu Chuang
Abstract:
Recent years have witnessed significant advances in vision and language models for various visual tasks, including understanding and generation. In this talk, I will present our recent results on exploiting large vision and language models for video understanding and generation. I will describe our recent work on foundation models for visual classification, video-text retrieval, visual captioning, visual question answering, visual grounding, video generation, stylization, outpainting, and video-to-audio tasks.
Biography:
Ming-Hsuan Yang is a Professor at UC Merced and a Research Scientist with Google. He received the Google Faculty Award in 2009 and the CAREER Award from the National Science Foundation in 2012. Yang received paper awards at UIST 2017, CVPR 2018, and ACCV 2018, and the Longuet-Higgins Prize at CVPR 2023. He is an Associate Editor-in-Chief of PAMI and an Associate Editor of IJCV. He was the Editor-in-Chief of CVIU and program co-chair of ICCV 2019. Yang served as the Program Chair for ACCV 2014 and ICCV 2019 and as Senior Area Chair/Area Chair for CVPR, ICCV, ECCV, NeurIPS, ICLR, ICML, IJCAI, and AAAI. Yang is a Fellow of the IEEE and ACM.