Incremental Policy Iteration with Guaranteed Escape from Local Optima in POMDP Planning

Partially observable Markov decision processes (POMDPs) provide a natural framework for designing applications that continuously make decisions based on noisy sensor measurements. The recent proliferation of smartphones and other wearable devices leads to new applications where, unfortunately, energy efficiency becomes an issue. To reduce energy requirements, finite-state controllers can be applied because they are computationally inexpensive to execute. Additionally, when multi-agent POMDPs (e.g., Dec-POMDPs or I-POMDPs) are taken into account, finite-state controllers become one of the most important policy representations. Online methods scale the best; however, they are energy demanding. Thus, methods to optimize finite-state controllers are necessary. In this paper, we present a new, efficient approach to bounded policy iteration (BPI). BPI keeps the size of the controller small, which is a desirable property for applications, especially on small devices. However, finding a...
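The abstract's point that finite-state controllers are cheap to execute can be illustrated with a minimal Python sketch. It is not taken from the paper: the class name, the two-node controller, and the action and observation labels are all hypothetical, chosen only to show that running such a controller amounts to one table lookup per step, with no belief update.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class FiniteStateController:
    """Hypothetical deterministic controller (illustration only, not the paper's code)."""
    # actions[n] is the action emitted while in node n
    actions: Tuple[str, ...]
    # transitions[n][o] is the next node after observing o in node n
    transitions: Tuple[Tuple[int, ...], ...]
    start: int = 0

    def run(self, observations: Sequence[int]) -> Tuple[List[str], int]:
        """Emit one action per observation and return (actions taken, final node)."""
        node = self.start
        taken = []
        for obs in observations:
            taken.append(self.actions[node])    # act according to the current node
            node = self.transitions[node][obs]  # constant-time lookup of the next node
        return taken, node


# Assumed toy example: observation 0 = "quiet", 1 = "motion".
# Node 0 keeps sensing; node 1 sleeps until motion is observed.
toy_controller = FiniteStateController(
    actions=("sense", "sleep"),
    transitions=((1, 0), (1, 0)),
)
taken_actions, final_node = toy_controller.run([0, 0, 1, 0])
```

Each step costs two indexed reads regardless of the size of the underlying POMDP, which is the property that makes this representation attractive on energy-constrained devices. Controllers produced by policy iteration methods are often stochastic; the deterministic form here is a simplification for illustration.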

Marek Grzes, Pascal Poupart

ATAL 2015 | Intelligent Agents |

Post Info

Added	16 Apr 2016
Updated	16 Apr 2016
Type	Journal
Year	2015
Where	ATAL
Authors	Marek Grzes, Pascal Poupart
