Standard

Interactive Thompson Sampling for Multi-Objective Multi-Armed Bandits. / Roijers, Diederik; Zintgraf, Luisa; Nowe, Ann.

Algorithmic Decision Theory - 5th International Conference, ADT 2017, Proceedings: 5th International Conference, ADT 2017, Luxembourg, Luxembourg, October 25–27, 2017, Proceedings. ed. / Jörg Rothe. Springer, 2017. p. 18-34 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10576 LNAI).

Research output: Chapter in Book/Report/Conference proceeding › Conference paper

Harvard

Roijers, D, Zintgraf, L & Nowe, A 2017, Interactive Thompson Sampling for Multi-Objective Multi-Armed Bandits. in J Rothe (ed.), Algorithmic Decision Theory - 5th International Conference, ADT 2017, Proceedings: 5th International Conference, ADT 2017, Luxembourg, Luxembourg, October 25–27, 2017, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10576 LNAI, Springer, pp. 18-34, International Conference on Algorithmic Decision Theory, Luxembourg, Luxembourg, 25/10/17. https://doi.org/10.1007/978-3-319-67504-6_2

APA

Roijers, D., Zintgraf, L., & Nowe, A. (2017). Interactive Thompson Sampling for Multi-Objective Multi-Armed Bandits. In J. Rothe (Ed.), Algorithmic Decision Theory - 5th International Conference, ADT 2017, Proceedings: 5th International Conference, ADT 2017, Luxembourg, Luxembourg, October 25–27, 2017, Proceedings (pp. 18-34). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10576 LNAI). Springer. https://doi.org/10.1007/978-3-319-67504-6_2

Vancouver

Roijers D, Zintgraf L, Nowe A. Interactive Thompson Sampling for Multi-Objective Multi-Armed Bandits. In Rothe J, editor, Algorithmic Decision Theory - 5th International Conference, ADT 2017, Proceedings: 5th International Conference, ADT 2017, Luxembourg, Luxembourg, October 25–27, 2017, Proceedings. Springer. 2017. p. 18-34. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-319-67504-6_2

Author

Roijers, Diederik ; Zintgraf, Luisa ; Nowe, Ann. / Interactive Thompson Sampling for Multi-Objective Multi-Armed Bandits. Algorithmic Decision Theory - 5th International Conference, ADT 2017, Proceedings: 5th International Conference, ADT 2017, Luxembourg, Luxembourg, October 25–27, 2017, Proceedings. editor / Jörg Rothe. Springer, 2017. pp. 18-34 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

BibTeX

@inproceedings{889f214fdf54434ab5f179d8a329f98a,
title = "Interactive Thompson Sampling for Multi-Objective Multi-Armed Bandits",
abstract = "In multi-objective reinforcement learning (MORL), much attention is paid to generating optimal solution sets for unknown utility functions of users, based on the stochastic reward vectors only. In online MORL on the other hand, the agent will often be able to elicit preferences from the user, enabling it to learn about the utility function of its user directly. In this paper, we study online MORL with user interaction employing the multi-objective multi-armed bandit (MOMAB) setting — perhaps the most fundamental MORL setting. We use Bayesian learning algorithms to learn about the environment and the user simultaneously. Specifically, we propose two algorithms: Utility-MAP UCB (umap-UCB) and Interactive Thompson Sampling (ITS), and show empirically that the performance of these algorithms in terms of regret closely approximates the regret of UCB and regular Thompson sampling provided with the ground truth utility function of the user from the start, and that ITS outperforms umap-UCB.",
author = "Diederik Roijers and Luisa Zintgraf and Ann Nowe",
year = "2017",
month = "10",
day = "25",
doi = "10.1007/978-3-319-67504-6_2",
language = "English",
isbn = "978-3-319-67503-9",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer",
pages = "18--34",
editor = "{Rothe}, J{\"o}rg",
booktitle = "Algorithmic Decision Theory - 5th International Conference, ADT 2017, Proceedings",
}

RIS

TY - GEN

T1 - Interactive Thompson Sampling for Multi-Objective Multi-Armed Bandits

AU - Roijers, Diederik

AU - Zintgraf, Luisa

AU - Nowe, Ann

PY - 2017/10/25

Y1 - 2017/10/25

N2 - In multi-objective reinforcement learning (MORL), much attention is paid to generating optimal solution sets for unknown utility functions of users, based on the stochastic reward vectors only. In online MORL on the other hand, the agent will often be able to elicit preferences from the user, enabling it to learn about the utility function of its user directly. In this paper, we study online MORL with user interaction employing the multi-objective multi-armed bandit (MOMAB) setting — perhaps the most fundamental MORL setting. We use Bayesian learning algorithms to learn about the environment and the user simultaneously. Specifically, we propose two algorithms: Utility-MAP UCB (umap-UCB) and Interactive Thompson Sampling (ITS), and show empirically that the performance of these algorithms in terms of regret closely approximates the regret of UCB and regular Thompson sampling provided with the ground truth utility function of the user from the start, and that ITS outperforms umap-UCB.

AB - In multi-objective reinforcement learning (MORL), much attention is paid to generating optimal solution sets for unknown utility functions of users, based on the stochastic reward vectors only. In online MORL on the other hand, the agent will often be able to elicit preferences from the user, enabling it to learn about the utility function of its user directly. In this paper, we study online MORL with user interaction employing the multi-objective multi-armed bandit (MOMAB) setting — perhaps the most fundamental MORL setting. We use Bayesian learning algorithms to learn about the environment and the user simultaneously. Specifically, we propose two algorithms: Utility-MAP UCB (umap-UCB) and Interactive Thompson Sampling (ITS), and show empirically that the performance of these algorithms in terms of regret closely approximates the regret of UCB and regular Thompson sampling provided with the ground truth utility function of the user from the start, and that ITS outperforms umap-UCB.

UR - http://www.scopus.com/inward/record.url?scp=85032488195&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-67504-6_2

DO - 10.1007/978-3-319-67504-6_2

M3 - Conference paper

SN - 978-3-319-67503-9

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 18

EP - 34

BT - Algorithmic Decision Theory - 5th International Conference, ADT 2017, Proceedings

A2 - Rothe, Jörg

PB - Springer

ER -

ID: 36362579
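The abstract above describes Thompson sampling in the multi-objective multi-armed bandit (MOMAB) setting, where sampled reward *vectors* must be scalarised by a user utility function before an arm can be chosen. The following is an illustrative sketch of that general idea only, not the authors' ITS or umap-UCB algorithms: it assumes a linear utility with known weights (in the paper, the weights would themselves be learned from user interaction) and a Gaussian posterior per arm and objective. The function name `thompson_step` and all parameters are hypothetical.

```python
import numpy as np

def thompson_step(counts, sums, weights, rng):
    """One Thompson-sampling step for a multi-objective bandit (sketch).

    counts:  (n_arms,) number of pulls per arm
    sums:    (n_arms, n_obj) summed reward vectors per arm
    weights: (n_obj,) assumed linear utility weights -- a stand-in for
             the preference model the paper learns from user queries.
    rng:     a numpy Generator
    """
    # Posterior over each arm's mean reward vector: unit-variance
    # Gaussian likelihood with a flat prior (a simplifying assumption),
    # so the posterior mean is the empirical mean and the posterior
    # std shrinks as 1/sqrt(pulls).
    n = np.maximum(counts, 1.0)
    post_mean = sums / n[:, None]
    post_std = 1.0 / np.sqrt(n)
    # Sample one mean reward vector per arm from its posterior.
    sampled = rng.normal(post_mean, post_std[:, None])
    # Scalarise the sampled vectors with the utility weights and pull
    # the arm whose sampled utility is highest.
    utilities = sampled @ weights
    return int(np.argmax(utilities))
```

With well-explored arms the sampled means concentrate around the empirical means, so the chosen arm is the one whose mean reward vector maximises the assumed utility; the interactive algorithms in the paper additionally maintain uncertainty over `weights` itself.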