Road network traffic data has been widely studied by researchers and practitioners in different areas such as urban planning, traffic prediction and spatial-temporal databases. For instance, researchers use such data to evaluate the impact of road network changes. Unfortunately, collecting large-scale highquality urban traffic data requires tremendous efforts because participating vehicles must install GPS receivers and administrators must continuously monitor these devices. There has been a number of urban traffic simulators trying to generate such data with different features. However, they suffer from two critical issues (1) scalability: most of them only offer singlemachine solution which is not adequate to produce large-scale data. Some simulators can generate traffic in parallel but do not well balance the load among machines in a cluster. (2) granularity: many simulators do not consider microscopic traffic situations including traffic lights, lane changing, car following. In the paper, we propose GeoSparkSim, a scalable traffic simulator which extends Apache Spark to generate large-scale road network traffic datasets with microscopic traffic simulation. The proposed system seamlessly integrates with a Spark-based spatial data management system, GeoSpark, to deliver a holistic approach that allows data scientists to simulate, analyze and visualize largescale urban traffic data. To implement microscopic traffic models, GeoSparkSim employs a simulation-aware vehicle partitioning method to partition vehicles among different machines such that each machine has a balanced workload. The experimental analysis shows that GeoSparkSim can simulate the movements of 200 thousand vehicles over a very large road network (250 thousand road junctions and 300 thousand road segments).
- Demonstrating GeoSparkSim: A Scalable Microscopic Road Network Traffic Simulator Based on Apache Spark
- Building Microscopic Road Network Traffic Simulators in Apache Spark
- Building a Large-Scale Microscopic Road Network Traffic Simulator in Apache Spark
- GeoSpark and Geospatial Data Management in Apache Spark
- GeoSparkViz in Action: A Data System with built-in support for Geospatial Visualization