Most formulations of Reinforcement Learning depend on a single reinforcement reward value to guide the search for the optimal policy solution. If observation of this reward is rar...
Boolean linear programs (BLPs) are ubiquitous in AI. Satisfiability testing, planning with resource constraints, and winner determination in combinatorial auctions are all example...
Dale Schuurmans, Finnegan Southey, Robert C. Holte
Our research focuses on web information management for people who want to monitor and use World Wide Web (WWW) information as an information resource. Web information is m...
The problem of deriving joint policies for a group of agents that maximize some joint reward function can be modeled as a decentralized partially observable Markov decision proces...
Ranjit Nair, Milind Tambe, Makoto Yokoo, David V. ...
The 2-class transduction problem, as formulated by Vapnik [1], involves finding a separating hyperplane for a labelled data set that is also maximally distant from a given set of...