policy | Sciweavers

ICASSP
2011
IEEE

Logarithmic weak regret of non-Bayesian restless multi-armed bandit

Abstract—We consider the restless multi-armed bandit (RMAB) problem with unknown dynamics. At each time, a player chooses K out of N (N > K) arms to play. The state of each ar...

Haoyang Liu, Keqin Liu, Qing Zhao

CORR
2011
Springer

Doubly Robust Policy Evaluation and Learning

Download www.icml-2011.org

We study decision making in environments where the reward is only partially observed, but can be modeled as a function of an action and an observed context. This setting, known as...

Miroslav Dudík, John Langford, Lihong Li

TSP
2010

Distributed learning in multi-armed bandit with multiple players

Download www.ece.ucdavis.edu

We formulate and study a decentralized multi-armed bandit (MAB) problem. There are distributed players competing for independent arms. Each arm, when played, offers i.i.d. reward a...

Keqin Liu, Qing Zhao

TISSEC
2010

A logical specification and analysis for SELinux MLS policy

Download www.patrickmcdaniel.org

The SELinux mandatory access control (MAC) policy has recently added a multi-level security (MLS) model which is able to express a fine granularity of control over a subject'...

Boniface Hicks, Sandra Rueda, Luke St. Clair, Tren...

JMLR
2010

Adaptive Step-size Policy Gradients with Average Reward Metric

Download jmlr.csail.mit.edu

In this paper, we propose a novel adaptive step-size approach for policy gradient reinforcement learning. A new metric is defined for policy gradients that measures the effect of ...

Takamitsu Matsubara, Tetsuro Morimura, Jun Morimot...

JMLR
2010

Efficient Reductions for Imitation Learning

Download www.cs.cmu.edu

Imitation Learning, while applied successfully on many large real-world problems, is typically addressed as a standard supervised learning problem, where it is assumed the trainin...

Stéphane Ross, Drew Bagnell

EIS
2011

A modelling and reasoning framework for social networks policies

Download www.governatori.net

Policy languages (such as privacy and rights) have had little impact on the wider community. Now that Social Networks have taken off, the need to revisit Policy languages and real...

Guido Governatori, Renato Iannella

CORR
2010
Springer

The Non-Bayesian Restless Multi-Armed Bandit: a Case of Near-Logarithmic Regret

Download www.ece.ucdavis.edu

In the classic Bayesian restless multi-armed bandit (RMAB) problem, there are N arms, with rewards on all arms evolving at each time as Markov chains with known parameters. A play...

Wenhan Dai, Yi Gai, Bhaskar Krishnamachari, Qing Z...

CJ
2010

Designing Effective Policies for Minimal Agents

Download www.doc.ic.ac.uk

A policy for a minimal reactive agent is a set of condition-action rules used to determine its response to perceived environmental stimuli. When the policy pre-disposes the agent t...

Krysia Broda, Christopher J. Hogger

ICTAC
2009
Springer

A First-Order Policy Language for History-Based Transaction Monitoring

Download users.cecs.anu.edu.au

Online trading invariably involves dealings between strangers, so it is important for one party to be able to judge objectively the trustworthiness of the other. In such a setting,...

Andreas Bauer 0002, Rajeev Goré, Alwen Tiu
