Use of variance estimation in the multi-armed bandit problem

Published on Feb 25, 2007 | 4525 Views

An important aspect of most decision making problems concerns the appropriate balance between exploitation (acting optimally according to the partial knowledge acquired so far) and exploration of th

Chapter list

00:01 Use of variance estimation in the multi-armed bandit problem
00:20 Outline
00:41 The multi-armed bandit problem
01:33 Notation (1/2)
02:32 Notation (2/2)
05:45 UCB policies
06:27 UCB policies (continued)
07:51 Bernstein’s type inequalities
09:39 Sketch of the proof
11:03 Sketch of the proof (continued)
11:33 Definition
12:10 Definition (continued)
12:28 Definition (continued)
13:37 A deviation inequality for the number of plays of non-optimal arms
14:03 A deviation inequality for the number of plays of non-optimal arms (continued)
14:48 A deviation inequality for the number of plays of non-optimal arms (continued)
14:56 Cumulative regret bounds
18:16 Cumulative regret bounds (continued)
18:55 Cumulative regret bounds (continued)
19:49 Discussion on the 1/n-UCB policy
20:21 Discussion on the 1/n-UCB policy (continued)
20:27 Discussion on the 1/n-UCB policy (continued)
21:15 Definition
21:30 Expected cumulative regret bound
22:10 Expected cumulative regret bound (continued)
22:47 Sketch of the proof (1/3)
23:15 Sketch of the proof (2/3)
23:37 Sketch of the proof (2/3)
23:53 Sketch of the proof (3/3)
24:12 Conclusion