Time critic policy gradient methods for traffic signal control in complex and congested scenarios

Stefano Giovanni Rizzo, Giovanna Vantini, Sanjay Chawla

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Employing an optimal traffic light control policy can have a positive economic and environmental impact on urban mobility. Reinforcement learning techniques have shown promising results in optimizing control policies for basic intersections and low-volume traffic. This paper addresses the traffic light control problem in a complex scenario, such as a signalized roundabout with heavy traffic volumes, with the aim of maximizing throughput and avoiding traffic jams. We formulate the environment with a realistic representation of states and actions and a capacity-based reward. We enforce episode terminal conditions to avoid unwanted states, such as long queues interfering with other junctions in the vehicular network. A time-dependent baseline is proposed to reduce the variance of Policy Gradient updates in this episodic setting, thus improving the algorithm's convergence to an optimal solution. We evaluate the method on real data and highly congested traffic, implementing a simulated signalized roundabout with 11 phases. The proposed method avoids traffic jams and outperforms traditional time-splitting policies and standard Policy Gradient on average delay and effective capacity, while drastically decreasing emissions.
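The idea of a time-dependent baseline can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes the simplest possible baseline, the batch average of the return-to-go at each time step, computed only over episodes that have not yet terminated at that step. The function names (returns_to_go, time_baseline, advantages) and the toy rewards are hypothetical and chosen for illustration only.

```python
from typing import List


def returns_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Discounted return G_t from each step t to the end of one episode."""
    out = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


def time_baseline(batch_returns: List[List[float]]) -> List[float]:
    """Assumed baseline b(t): mean of G_t over episodes still running at step t.

    Episodes that hit a terminal condition early simply stop contributing
    to later time steps.
    """
    horizon = max(len(g) for g in batch_returns)
    baseline = []
    for t in range(horizon):
        alive = [g[t] for g in batch_returns if len(g) > t]
        baseline.append(sum(alive) / len(alive))
    return baseline


def advantages(batch_rewards: List[List[float]], gamma: float = 1.0) -> List[List[float]]:
    """Per-step weights G_t - b(t) that would multiply grad log pi(a_t | s_t)."""
    batch_returns = [returns_to_go(r, gamma) for r in batch_rewards]
    baseline = time_baseline(batch_returns)
    return [[g_t - baseline[t] for t, g_t in enumerate(g)] for g in batch_returns]


# Toy batch: the second episode terminates one step earlier than the first.
toy_rewards = [[1.0, 1.0, 1.0], [1.0, 1.0]]
toy_advantages = advantages(toy_rewards)
```

Because the baseline in this sketch depends only on the time step and not on the action taken, subtracting it lowers the variance of the gradient estimate without changing which actions the update favors. Averaging over the same batch that is being updated is a simplification; a learned function of time would be another option.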

Original language: English
Title of host publication: KDD 2019 - Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 1654-1664
Number of pages: 11
ISBN (Electronic): 9781450362016
DOIs: https://doi.org/10.1145/3292500.3330988
Publication status: Published - 25 Jul 2019
Event: 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2019 - Anchorage, United States
Duration: 4 Aug 2019 to 8 Aug 2019

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

Conference: 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2019
Country: United States
City: Anchorage
Period: 4 Aug 2019 to 8 Aug 2019

Fingerprint

Traffic signals
Gradient methods
Telecommunication traffic
Reinforcement learning
Throughput
Economics

Keywords

  • Policy gradient
  • Reinforcement learning
  • Roundabout modeling
  • Traffic light control

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Rizzo, S. G., Vantini, G., & Chawla, S. (2019). Time critic policy gradient methods for traffic signal control in complex and congested scenarios. In KDD 2019 - Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1654-1664). (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). Association for Computing Machinery. https://doi.org/10.1145/3292500.3330988

@inproceedings{d7fe079762eb4b2fa343297a90b9cc78,
title = "Time critic policy gradient methods for traffic signal control in complex and congested scenarios",
abstract = "Employing an optimal traffic light control policy has the potential of having a positive impact, both economic and environmental, on urban mobility. Reinforcement learning techniques have shown promising results in optimizing control policies for basic intersections and low volume traffic. This paper addresses the traffic light control problem in a complex scenario, such as a signalized roundabout with heavy traffic volumes, with the aim of maximizing throughput and avoiding traffic jams. We formulate the environment with a realistic representation of states and actions and a capacity-based reward. We enforce episode terminal conditions to avoid unwanted states, such as long queues interfering with other junctions in the vehicular network. A time-dependent baseline is proposed to reduce the variance of Policy Gradient updates in the setting of episodic conditions, thus improving the algorithm convergence to an optimal solution. We evaluate the method on real data and highly congested traffic, implementing a signalized simulated roundabout with 11 phases. The proposed method is able to avoid traffic jams and achieves higher performance than traditional time-splitting policies and standard Policy Gradient on average delay and effective capacity, while drastically decreasing the emissions.",
keywords = "Policy gradient, Reinforcement learning, Roundabout modeling, Traffic light control",
author = "Rizzo, {Stefano Giovanni} and Giovanna Vantini and Sanjay Chawla",
year = "2019",
month = "7",
day = "25",
doi = "10.1145/3292500.3330988",
language = "English",
series = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
publisher = "Association for Computing Machinery",
pages = "1654--1664",
booktitle = "KDD 2019 - Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
}

TY - GEN

T1 - Time critic policy gradient methods for traffic signal control in complex and congested scenarios

AU - Rizzo, Stefano Giovanni

AU - Vantini, Giovanna

AU - Chawla, Sanjay

PY - 2019/7/25

Y1 - 2019/7/25

N2 - Employing an optimal traffic light control policy has the potential of having a positive impact, both economic and environmental, on urban mobility. Reinforcement learning techniques have shown promising results in optimizing control policies for basic intersections and low volume traffic. This paper addresses the traffic light control problem in a complex scenario, such as a signalized roundabout with heavy traffic volumes, with the aim of maximizing throughput and avoiding traffic jams. We formulate the environment with a realistic representation of states and actions and a capacity-based reward. We enforce episode terminal conditions to avoid unwanted states, such as long queues interfering with other junctions in the vehicular network. A time-dependent baseline is proposed to reduce the variance of Policy Gradient updates in the setting of episodic conditions, thus improving the algorithm convergence to an optimal solution. We evaluate the method on real data and highly congested traffic, implementing a signalized simulated roundabout with 11 phases. The proposed method is able to avoid traffic jams and achieves higher performance than traditional time-splitting policies and standard Policy Gradient on average delay and effective capacity, while drastically decreasing the emissions.

KW - Policy gradient

KW - Reinforcement learning

KW - Roundabout modeling

KW - Traffic light control

UR - http://www.scopus.com/inward/record.url?scp=85071199280&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85071199280&partnerID=8YFLogxK

U2 - 10.1145/3292500.3330988

DO - 10.1145/3292500.3330988

M3 - Conference contribution

T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

SP - 1654

EP - 1664

BT - KDD 2019 - Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

PB - Association for Computing Machinery

ER -