Research Highlight: Zhiyuan Li

Professor Zhiyuan Li joined TTIC’s faculty as an Assistant Professor in the fall of 2023. Professor Li’s main research interests are in the theoretical foundations of machine learning, especially deep learning theory. He is currently focusing on topics including non-convex optimization of neural networks, generalization of overparameterized models, the implicit bias of optimization algorithms, and large language models.

In April 2024, Professor Li and Professor Sanjeev Arora (Princeton University) were jointly named recipients of a Superalignment Fast Grant from OpenAI to further investigate the “weak-to-strong generalization” problem. The process was highly selective, with only 50 of the 2,700 applicants receiving funding.

A fundamental challenge in aligning future superhuman AI systems (superalignment) is that humans will need to supervise AI systems that surpass their own abilities. One of the most widely used alignment techniques, reinforcement learning from human feedback (RLHF), relies on humans to supervise models, for example by ranking different responses generated by a language model for the same prompt, according to Professor Li. However, this method may lose its effectiveness as models become superhuman and human supervision becomes less informative.
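The human rankings described above are typically converted into a training signal for a reward model via a pairwise preference loss (the Bradley–Terry formulation standard in RLHF). A minimal sketch, with hand-picked toy reward scores standing in for a learned reward model:

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log P(chosen preferred over rejected).

    P(chosen > rejected) = sigmoid(reward_chosen - reward_rejected),
    so the loss shrinks as the reward model separates the two responses
    in the direction the human ranker preferred.
    """
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy example: a human ranked response A above response B for one prompt.
loss_good = preference_loss(reward_chosen=2.0, reward_rejected=0.5)  # model agrees with ranker
loss_bad = preference_loss(reward_chosen=0.5, reward_rejected=2.0)   # model disagrees
print(round(loss_good, 3), round(loss_bad, 3))  # → 0.201 1.701
```

Minimizing this loss over many ranked pairs is what lets a scalar reward model stand in for the human ranker during reinforcement learning.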

“Traditional machine learning focuses on the setting where weak models are supervised using labels generated by strong supervisors, like humans,” Professor Li said. “However, AI systems moving forward will surpass the intelligence levels of the humans supervising them, meaning that humans will have become ‘weak supervisors.’ The question we are trying to answer is: how can humans steer and trust AI systems that are more intelligent than they are?”

Although superintelligence (AI that is vastly smarter than humans) seems far off now, such technology could plausibly be developed within the next decade, so it is important to understand how to steer and control superhuman AI systems. Future AI systems will be capable of complex behaviors that make it hard for humans to supervise them reliably, according to Professor Li.

“OpenAI’s research empirically demonstrates that weak models can be used to supervise large models, for example, supervising GPT-4 with a GPT-2-level model on [natural language processing] (NLP) tasks, and the resulting model typically performs better than the weak supervisor,” Professor Li said. “However, they also found that naive finetuning on weak supervision is not enough. We are researching the theoretical foundations of this weak-to-strong generalization phenomenon and aim to design novel algorithms allowing better alignment or generalization.”
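The pipeline behind this phenomenon can be sketched in miniature. Below is a hypothetical, deliberately simplified simulation (not the actual OpenAI experiment): a weak supervisor makes a systematic error on a toy parity task, and a strong student, which sees only the weak labels but has a better internal representation, fits the majority weak label per feature and thereby generalizes past its teacher’s mistakes:

```python
def true_label(x: int) -> int:
    return x % 2  # ground truth: parity of x

def weak_supervisor(x: int) -> int:
    # Hypothetical weak teacher: correct on parity except it flips
    # its answer on multiples of 5 (a systematic ~20% error rate).
    y = true_label(x)
    return 1 - y if x % 5 == 0 else y

# "Strong student": trained only on weak labels, but its representation
# (here, the true parity feature) lets it take the majority weak label
# per feature value, denoising the teacher's systematic mistakes.
train = list(range(1000))
votes = {0: [0, 0], 1: [0, 0]}  # feature value -> counts of weak labels [0s, 1s]
for x in train:
    votes[true_label(x)][weak_supervisor(x)] += 1
student = {f: (0 if counts[0] >= counts[1] else 1) for f, counts in votes.items()}

# Evaluate both against ground truth on held-out inputs.
test = list(range(1000, 2000))
weak_acc = sum(weak_supervisor(x) == true_label(x) for x in test) / len(test)
student_acc = sum(student[true_label(x)] == true_label(x) for x in test) / len(test)
print(weak_acc, student_acc)  # → 0.8 1.0
```

The student outscores its own supervisor, the qualitative effect weak-to-strong generalization refers to; the open theoretical question Professor Li is pursuing is when and why this happens for real models, where naive finetuning alone falls short.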

Professor Li received his Ph.D. in computer science from Princeton University in 2022, and served as a postdoctoral fellow in the Computer Science Department at Stanford University from 2022 to 2023 before joining TTIC’s faculty. He has served as an Area Chair for the Conference on Neural Information Processing Systems (NeurIPS) and is a recipient of a Microsoft Research Ph.D. Fellowship.