Online Markov Decision Processes under Bandit Feedback

Published on Mar 25, 2011

3122 Views

Gergely Neu

We consider online learning in finite stochastic Markovian environments where in each time step a new reward function is chosen by an oblivious adversary. The goal of the learning agent is to compete

Knowledge 4 All Foundation Video Journal Volume 1

Related categories

Markov Processes

