We present the first temporal-difference learning algorithm for off-policy control with unrestricted linear function approximation whose per-time-step complexity is linear in the ...
The hierarchical Dirichlet process (HDP) is a Bayesian nonparametric mixed membership model--each data point is modeled with a collection of components of different proportions. T...
Sinead Williamson, Chong Wang, Katherine A. Heller...
We present new algorithms for inverse optimal control (or inverse reinforcement learning, IRL) within the framework of linearlysolvable MDPs (LMDPs). Unlike most prior IRL algorit...