Standard

Toll-based reinforcement learning for efficient equilibria in route choice. / De Oliveira Ramos, Gabriel; Castro da Silva, Bruno; Radulescu, Roxana; Bazzan, Ana; Nowe, Ann.

In: The Knowledge Engineering Review, Vol. 35, e8, 05.03.2020.

Research output: Contribution to journal › Article


Author

De Oliveira Ramos, Gabriel; Castro da Silva, Bruno; Radulescu, Roxana; Bazzan, Ana; Nowe, Ann. / Toll-based reinforcement learning for efficient equilibria in route choice. In: The Knowledge Engineering Review. 2020; Vol. 35, e8.

BibTeX

@article{f0b628d421184efea5b906ae2a63c80c,
title = "Toll-based reinforcement learning for efficient equilibria in route choice",
abstract = "The problem of traffic congestion incurs numerous social and economical repercussions and has thus become a central issue in every major city in the world. For this work we look at the transportation domain from a multiagent system perspective, where every driver can be seen as an autonomous decision-making agent. We explore how learning approaches can help achieve an efficient outcome, even when agents interact in a competitive environment for sharing common resources. To this end, we consider the route choice problem, where self-interested drivers need to independently learn which routes minimise their expected travel costs. Such a selfish behaviour results in the so-called user equilibrium, which is inefficient from the system’s perspective. In order to mitigate the impact of selfishness, we present Toll-based Q-learning (TQ-learning, for short). TQ-learning employs the idea of marginal-cost tolling (MCT), where each driver is charged according to the cost it imposes on others. The use of MCT leads agents to behave in a socially desirable way such that the is attainable. In contrast to previous works, however, our tolling scheme is distributed (i.e., each agent can compute its own toll), is charged a posteriori (i.e., at the end of each trip), and is fairer (i.e., agents pay exactly their marginal costs). Additionally, we provide a general formulation of the toll values for univariate, homogeneous polynomial cost functions. We present a theoretical analysis of TQ-learning, proving that it converges to a system-efficient equilibrium (i.e., an equilibrium aligned to the system optimum) in the limit. Furthermore, we perform an extensive empirical evaluation on realistic road networks to support our theoretical findings, showing that TQ-learning indeed converges to the optimum, which translates into a reduction of the congestion levels by 9.1{\%}, on average.",
author = "{De Oliveira Ramos}, Gabriel and {Castro da Silva}, Bruno and Roxana Radulescu and Ana Bazzan and Ann Nowe",
year = "2020",
month = "3",
day = "5",
doi = "10.1017/S0269888920000119",
language = "English",
volume = "35",
journal = "Knowledge Engineering Review",
issn = "0269-8889",
publisher = "Cambridge University Press",

}
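The abstract's toll formulation for univariate, homogeneous polynomial cost functions follows from the standard marginal-cost tolling argument. As a worked sketch (consistent with the abstract, though the paper's exact expression may differ): for a link carrying flow $f$ with cost function $c(f) = a f^n$, the total cost incurred on the link is $f \, c(f)$, so one additional unit of flow changes it by

$$\frac{d}{df}\bigl(f \, c(f)\bigr) = c(f) + f \, c'(f).$$

The first term is the driver's own cost; the second is the externality imposed on all other drivers, which MCT charges as the toll:

$$\tau(f) = f \, c'(f) = f \cdot n a f^{n-1} = n \, a f^n = n \cdot c(f).$$

That is, each driver pays $n$ times the travel cost it just observed, a quantity it can compute locally at the end of its trip.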

RIS

TY - JOUR

T1 - Toll-based reinforcement learning for efficient equilibria in route choice

AU - De Oliveira Ramos, Gabriel

AU - Castro da Silva, Bruno

AU - Radulescu, Roxana

AU - Bazzan, Ana

AU - Nowe, Ann

PY - 2020/3/5

Y1 - 2020/3/5

N2 - The problem of traffic congestion incurs numerous social and economic repercussions and has thus become a central issue in every major city in the world. In this work, we look at the transportation domain from a multiagent system perspective, where every driver can be seen as an autonomous decision-making agent. We explore how learning approaches can help achieve an efficient outcome, even when agents interact in a competitive environment while sharing common resources. To this end, we consider the route choice problem, where self-interested drivers need to independently learn which routes minimise their expected travel costs. Such selfish behaviour results in the so-called user equilibrium, which is inefficient from the system’s perspective. In order to mitigate the impact of selfishness, we present Toll-based Q-learning (TQ-learning, for short). TQ-learning employs the idea of marginal-cost tolling (MCT), where each driver is charged according to the cost it imposes on others. The use of MCT leads agents to behave in a socially desirable way such that the system optimum is attainable. In contrast to previous works, our tolling scheme is distributed (i.e., each agent can compute its own toll), is charged a posteriori (i.e., at the end of each trip), and is fairer (i.e., agents pay exactly their marginal costs). Additionally, we provide a general formulation of the toll values for univariate, homogeneous polynomial cost functions. We present a theoretical analysis of TQ-learning, proving that it converges to a system-efficient equilibrium (i.e., an equilibrium aligned to the system optimum) in the limit. Furthermore, we perform an extensive empirical evaluation on realistic road networks to support our theoretical findings, showing that TQ-learning indeed converges to the optimum, which translates into a reduction of congestion levels by 9.1%, on average.

AB - The problem of traffic congestion incurs numerous social and economic repercussions and has thus become a central issue in every major city in the world. In this work, we look at the transportation domain from a multiagent system perspective, where every driver can be seen as an autonomous decision-making agent. We explore how learning approaches can help achieve an efficient outcome, even when agents interact in a competitive environment while sharing common resources. To this end, we consider the route choice problem, where self-interested drivers need to independently learn which routes minimise their expected travel costs. Such selfish behaviour results in the so-called user equilibrium, which is inefficient from the system’s perspective. In order to mitigate the impact of selfishness, we present Toll-based Q-learning (TQ-learning, for short). TQ-learning employs the idea of marginal-cost tolling (MCT), where each driver is charged according to the cost it imposes on others. The use of MCT leads agents to behave in a socially desirable way such that the system optimum is attainable. In contrast to previous works, our tolling scheme is distributed (i.e., each agent can compute its own toll), is charged a posteriori (i.e., at the end of each trip), and is fairer (i.e., agents pay exactly their marginal costs). Additionally, we provide a general formulation of the toll values for univariate, homogeneous polynomial cost functions. We present a theoretical analysis of TQ-learning, proving that it converges to a system-efficient equilibrium (i.e., an equilibrium aligned to the system optimum) in the limit. Furthermore, we perform an extensive empirical evaluation on realistic road networks to support our theoretical findings, showing that TQ-learning indeed converges to the optimum, which translates into a reduction of congestion levels by 9.1%, on average.

U2 - 10.1017/S0269888920000119

DO - 10.1017/S0269888920000119

M3 - Article

VL - 35

JO - Knowledge Engineering Review

JF - Knowledge Engineering Review

SN - 0269-8889

M1 - e8

ER -
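To make the learning side concrete, below is a minimal, hypothetical Python sketch of the TQ-learning idea as described in the abstract: stateless Q-learning drivers repeatedly choose among parallel routes, and the a-posteriori MCT toll n · c(f) is added to each trip's cost. All names, constants, and the toy network are illustrative assumptions, not the paper's implementation or benchmarks.

import random

# Hypothetical sketch of TQ-learning: stateless Q-learning drivers choosing
# among parallel routes, penalised a posteriori by the marginal-cost toll.
# All constants below are illustrative assumptions, not the paper's setup.

NUM_DRIVERS = 100
ROUTES = [0, 1, 2]               # parallel routes for one origin-destination pair
A = [0.01, 0.02, 0.03]           # cost coefficients: c_r(f) = A[r] * f**DEGREE
DEGREE = 2                       # homogeneous polynomial degree n
ALPHA, EPSILON = 0.1, 0.1        # learning rate and exploration rate

# one Q-value per (driver, route); route choice is a stateless repeated game
Q = [[0.0] * len(ROUTES) for _ in range(NUM_DRIVERS)]

for episode in range(5000):
    choices = []
    for q in Q:                  # each driver chooses independently (epsilon-greedy)
        if random.random() < EPSILON:
            choices.append(random.choice(ROUTES))
        else:
            choices.append(max(ROUTES, key=lambda r: q[r]))

    flow = [choices.count(r) for r in ROUTES]    # flows observed after all trips

    for i, r in enumerate(choices):
        cost = A[r] * flow[r] ** DEGREE          # travel cost c_r(f)
        toll = DEGREE * cost                     # MCT toll: f * c'_r(f) = n * c_r(f)
        reward = -(cost + toll)                  # charged a posteriori, end of trip
        Q[i][r] += ALPHA * (reward - Q[i][r])    # standard Q-learning update

print("final flows per route:", [choices.count(r) for r in ROUTES])

Because the toll equals n times the driver's own observed cost, each agent computes it locally after its trip, matching the distributed, end-of-trip charging the abstract describes.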
