A decision process in which rewards depend on history rather than merely on the current state is called a decision process with non-Markovian rewards (NMRDP). In decisiontheoretic...
High-level stochastic description methods such as stochastic Petri nets, stochastic UML statecharts etc., together with specifications of performance variables (PVs), enable a co...
Coordinating multiple agents that need to perform a sequence of actions to maximize a system level reward requires solving two distinct credit assignment problems. First, credit m...
Contemporary lifestyle has become increasingly sedentary: little physical (sports, exercises) and much sedentary (TV, computers) activity. The nature of sedentary activity is self...
Shlomo Berkovsky, Mac Coombe, Jill Freyne, Dipak B...
As robots become a mass consumer product, they will need to learn new skills by interacting with typical human users. Past approaches have adapted reinforcement learning (RL) to a...