Hao Liang is a postdoctoral researcher at the University of Maryland, College Park, working with Kaiqing Zhang. He received his PhD from The Chinese University of Hong Kong (CUHK), Shenzhen, under the supervision of Zhi-Quan (Tom) Luo. His research lies at the intersection of statistical and computational efficiency in decision-making algorithms, with a particular focus on risk awareness, safety, and multi-agent systems.
Email: haoliang1 at link.cuhk.edu.cn
[CV]
[Google Scholar]
[LinkedIn]
News
May 2026: Recognized as an ICML 2026 Gold Reviewer
April 2026: One paper accepted to ICML 2026: "How Does the Lagrangian Guide Safe Reinforcement Learning through Diffusion Models?" [link]
January 2026: Two papers accepted to ICLR 2026: "Is Pure Exploitation Sufficient in Exogenous MDPs with Linear Function Approximation?" [link] and "BRIDGE: Bi-level Reinforcement Learning for Dynamic Group Structure in Coalition Formation Games"
November 2025: Served as Local Organizing Chair of the 7th International Conference on Distributed Artificial Intelligence (DAI 2025) held on November 21–24, 2025 in London, UK.
October 2025: Our paper, "Why GRPO Needs Normalization: A Local-Curvature Perspective on Adaptive Gradients" will be presented at the NeurIPS 2025 Workshop on Efficient Reasoning.
👉 TL;DR: We reveal that GRPO’s normalization acts as an adaptive gradient mechanism aligned with local curvature, accelerating and stabilizing LLM reasoning training.
September 2025: Our paper, "Causality Meets Locality: Provably Generalizable and Scalable Policy Learning for Networked Systems" was accepted to NeurIPS 2025 (Spotlight) [link]
👉 TL;DR: We develop GSAC, a causality-aware framework enabling provable scalability and fast cross-domain adaptation in large networked systems.
July 2024: Our paper, "Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds" was accepted to JMLR
👉 TL;DR: We bridge distributional and risk-sensitive RL under entropic risk measures, achieving near-optimal regret with computationally efficient DRL algorithms.
July 2024: Presented "Bridging Distributional and Risk-Sensitive Reinforcement Learning: Balancing Statistical, Computational, and Risk Considerations" at the ICML 2024 FoRLaC Workshop
March 2024: Delivered a talk, "Efficient Risk-aware Decision-making: A Distributional Perspective", at the Vector Institute
March 2024: Delivered a talk, "A Distribution Optimization Framework for Confidence Bounds of Risk Measures", at the INFORMS Optimization Society (IOS) Conference
January 2024: Our paper, "Regret Bounds for Risk-sensitive Reinforcement Learning with Lipschitz Dynamic Risk Measures" was accepted to AISTATS 2024
July 2023: Presented "A Distribution Optimization Framework for Confidence Bounds of Risk Measures" at ICML 2023
Selected Papers
Is Pure Exploitation Sufficient in Exogenous MDPs with Linear Function Approximation? [link]
ICLR 2026
Hao Liang, Jiayu Cheng, Sean R. Sinclair, Yali Du

Why GRPO Needs Normalization: A Local-Curvature Perspective on Adaptive Gradients [link]
NeurIPS 2025 Workshop on Efficient Reasoning
Cheng Ge*, Heqi Yin*, Hao Liang†, Jiawei Zhang†
* Equal contribution. † Co-last authors.

Causality Meets Locality: Provably Generalizable and Scalable Policy Learning for Networked Systems [link]
NeurIPS 2025 (Spotlight)
Hao Liang*, Shuqing Shi*, Yudi Zhang, Biwei Huang, Yali Du
* Equal contribution.

Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds [link]
JMLR
Hao Liang, Zhi-Quan Luo

Regret Bounds for Risk-sensitive Reinforcement Learning with Lipschitz Dynamic Risk Measures [link]
AISTATS 2024
Hao Liang, Zhi-Quan Luo

A Distribution Optimization Framework for Confidence Bounds of Risk Measures [link]
ICML 2023
Hao Liang, Zhi-Quan Luo