Sciweavers

LWA 2007
Towards Learning User-Adaptive State Models in a Conversational Recommender System
Typical conversational recommender systems support interactive strategies that are hard-coded in advance and followed rigidly during a recommendation session. In fact, Reinforceme...
Tariq Mahmood, Francesco Ricci