[2024-04-19] Dr. Wan-Duo Kurt Ma, Weta and Victoria University of Wellington, “Diffusion-model Unleashed: Bridging Theory to Practice with Enhanced Controllability”

  • 2024-04-02
  • HSIN-YI SUNG
Title: Diffusion-model Unleashed: Bridging Theory to Practice with Enhanced Controllability 
Date: 2024/04/19 14:20-15:30
Location: CSIE R103
Speaker: Dr. Wan-Duo Kurt Ma, Weta and Victoria University of Wellington

Host: Prof. Yung-Yu Chuang

Abstract:
The progress in stable diffusion models for text-to-image and text-to-video tasks has been remarkable, allowing for the automatic generation of images and videos from textual prompts. However, despite the success of this synthesis approach, the controllability of the generated subjects remains a recognized limitation. Additionally, the extensive training data and memory requirements of these models increase their cost and motivate the exploration of zero/few-shot learning, online optimization, and fine-tuning methods.

This presentation provides an overview of several aspects of diffusion models. First, we cover diffusion models from their theoretical foundations to practical applications. Next, we examine their limitations and the methods proposed to enhance controllability in image and video generation. Finally, we demonstrate practical approaches for efficiently implementing and conducting diffusion research.



Biography:

Wan-Duo Kurt Ma is a machine learning researcher at Wētā FX and a research fellow at Victoria University of Wellington, both located in Wellington, New Zealand. He completed his PhD at Victoria University of Wellington in 2022 under the supervision of Dr. J.P. Lewis and Prof. W. Bastiaan Kleijn. Kurt remains active in academic research, focusing primarily on topics related to diffusion. At Wētā FX, he is one of the researchers developing FDLS (Facial Deep Learning Solver), a facial motion solver based on deep learning techniques. This solver has been used in various productions, including Gemini Man (2019), She-Hulk (2022), Better Man (2024), and Kingdom of the Planet of the Apes (2024).