Sublinear Regret for a Class of Continuous-Time Linear-Quadratic Reinforcement Learning Problems
by Yilie Huang and 2 other authors
Abstract: We study reinforcement learning (RL) for a class of continuous-time linear-quadratic (LQ) control problems for diffusions, where states are scalar-valued, running control rewards are absent, and the volatilities of the state processes depend on both state and control variables. We apply a model-free approach that relies neither on knowledge of the model parameters nor on their estimation, and devise an RL algorithm that learns the optimal policy parameter directly. Our main contributions are the introduction of an exploration schedule and a regret analysis of the proposed algorithm. We provide the convergence rate of the policy parameter to the optimal one, and prove that the algorithm achieves a regret bound of $O(N^{\frac{3}{4}})$ up to a logarithmic factor, where $N$ is the number of learning episodes. We conduct a simulation study to validate the theoretical results and demonstrate the effectiveness and reliability of the proposed algorithm. We also perform numerical comparisons between our method and those of recent model-based stochastic LQ RL studies adapted to the state- and control-dependent volatility setting, demonstrating the better performance of the former in terms of regret bounds.
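As a rough illustration of the problem class described in the abstract (the coefficient names $A, B, C, D$ and the terminal-reward form below are assumptions for exposition, not taken from the paper), a scalar LQ diffusion with state- and control-dependent volatility and no running control reward can be sketched as
$$
dX_t = (A X_t + B a_t)\,dt + (C X_t + D a_t)\,dW_t, \qquad X_0 = x_0,
$$
where $a_t$ is the control and $W$ a standard Brownian motion, with the objective of maximizing a quadratic criterion free of running control costs, e.g. an expected terminal payoff such as $\mathbb{E}\!\left[-\tfrac{1}{2}(X_T - z)^2\right]$ for some target $z$.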
Submission history
From: Yilie Huang
[v1] Wed, 24 Jul 2024 12:26:21 UTC (247 KB)
[v2] Sat, 21 Sep 2024 16:48:58 UTC (263 KB)
[v3] Tue, 18 Mar 2025 14:55:51 UTC (204 KB)
[v4] Tue, 8 Apr 2025 19:11:31 UTC (205 KB)