Sciweavers

332 search results - page 42 / 67
» Ranking policies in discrete Markov decision processes
Sort
View
129
Voted
ICML
2001
IEEE
16 years 2 months ago
Off-Policy Temporal Difference Learning with Function Approximation
We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms...
Doina Precup, Richard S. Sutton, Sanjoy Dasgupta
134
Voted
CODES
2009
IEEE
15 years 5 months ago
An MDP-based application oriented optimal policy for wireless sensor networks
Technological advancements due to Moore’s law have led to the proliferation of complex wireless sensor network (WSN) domains. One commonality across all WSN domains is the need ...
Arslan Munir, Ann Gordon-Ross
126
Voted
INFOCOM
2012
IEEE
13 years 4 months ago
Approximately optimal adaptive learning in opportunistic spectrum access
—In this paper we develop an adaptive learning algorithm which is approximately optimal for an opportunistic spectrum access (OSA) problem with polynomial complexity. In this OSA...
Cem Tekin, Mingyan Liu
ATAL
2005
Springer
15 years 7 months ago
Exploiting belief bounds: practical POMDPs for personal assistant agents
Agents or agent teams deployed to assist humans often face the challenges of monitoring the state of key processes in their environment (including the state of their human users t...
Pradeep Varakantham, Rajiv T. Maheswaran, Milind T...
AAAI
2006
15 years 3 months ago
Point-based Dynamic Programming for DEC-POMDPs
We introduce point-based dynamic programming (DP) for decentralized partially observable Markov decision processes (DEC-POMDPs), a new discrete DP algorithm for planning strategie...
Daniel Szer, François Charpillet