Sciweavers

38 search results - page 4 / 8
» On the Convergence of Optimistic Policy Iteration
ICML
2010
IEEE
15 years 2 months ago
Convergence of Least Squares Temporal Difference Methods Under General Conditions
We consider approximate policy evaluation for finite state and action Markov decision processes (MDP) in the off-policy learning context and with the simulation-based least square...
Huizhen Yu
CORR
2008
Springer
15 years 1 months ago
Adaptive Sum Power Iterative Waterfilling for MIMO Cognitive Radio Channels
In this paper, the sum capacity of the Gaussian Multiple Input Multiple Output (MIMO) Cognitive Radio Channel (MCC) is expressed as a convex problem with finite number of...
Rajiv Soundararajan, Sriram Vishwanath
MICRO
2006
IEEE
15 years 7 months ago
Merging Head and Tail Duplication for Convergent Hyperblock Formation
VLIW and EDGE (Explicit Data Graph Execution) architectures rely on compilers to form high-quality hyperblocks for good performance. These compilers typically perform hyperblock f...
Bertrand A. Maher, Aaron Smith, Doug Burger, Kathr...
GLOBECOM
2009
IEEE
15 years 5 months ago
Stochastic Resource Allocation over Fading Multiple Access and Broadcast Channels
In this paper, we consider the optimal rate and power allocation that maximizes a general utility function of average user rates in a fading multiple-access or broadcast channel. B...
Na Gao, Xin Wang
AI
2002
Springer
15 years 1 months ago
Multiagent learning using a variable learning rate
Learning to act in a multiagent environment is a difficult problem since the normal definition of an optimal policy no longer applies. The optimal policy at any moment depends on ...
Michael H. Bowling, Manuela M. Veloso