Deep reinforcement learning has recently provided promising results on the traffic light control optimization problem, by training neural network agents to select the traffic light phase. These agents learn complex models by optimizing a simple objective, such as the average traffic speed, but are considered opaque when it comes to explaining their decisions. Nevertheless, explanations are required for transferring this technology to the real world, especially in complex scenarios with nontrivial phases, such as signalized roundabouts with entry and circulatory traffic lights. In this paper, after training a Policy Gradient agent on a signalized roundabout with 11 phases and real traffic data, we analyze the relation between the agent's phase preferences and the actual traffic, and we assess the agent's capability to react to the current detector state. Then, we estimate the effect of the road detector states on the phases selected by the agent, through the SHAP model-agnostic technique, using Shapley values recovered from a linear explanation model. The results show that it is possible to extract meaningful explanations of the decisions taken by a complex policy, in relation to both the traffic volumes and the lane occupancy.
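To illustrate the Shapley-value attribution mentioned above, the following is a minimal, self-contained sketch (not the paper's actual pipeline). It assumes a linear surrogate model of the policy, for which Shapley values have the closed form φ_i = w_i · (x_i − E[x_i]); the detector values, weights, and dimensions are illustrative placeholders.

```python
import numpy as np

# Hypothetical sketch: for a linear explanation model f(x) = w.x + b,
# the exact Shapley value of feature i is w_i * (x_i - E[x_i]).
# Detector readings, weights, and sizes below are illustrative only.

rng = np.random.default_rng(0)
# Simulated detector occupancy readings (rows = timesteps, cols = detectors)
X = rng.uniform(0.0, 1.0, size=(500, 3))
w = np.array([2.0, -1.5, 0.5])   # weights of the assumed linear surrogate
b = 0.1                          # intercept of the surrogate

def shapley_linear(x, X_background, w):
    """Exact Shapley values for a linear model: w_i * (x_i - mean_i)."""
    return w * (x - X_background.mean(axis=0))

x = X[0]                         # one observation to explain
phi = shapley_linear(x, X, w)

# Efficiency property: the attributions sum to f(x) - E[f(X)]
f = lambda X_: X_ @ w + b
assert np.isclose(phi.sum(), f(x) - f(X).mean())
```

In the paper's setting, the features would be the road detector states and the explained output the score of each candidate phase; the closed form above is what makes a linear explanation model attractive as a local surrogate.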