Indexing the Pickup and Drop-Off Locations of NYC Taxi Trips in PostgreSQL - Lessons from the Road

Abstract

In this paper, we present our experience in indexing the dropoff and pick-up locations of taxi trips in New York City. The paper presents a comprehensive experimental analysis of classic and state-ofthe-art spatial database indexing schemes. The paper evaluates a popular spatial tree indexing scheme (i.e., GIST-Spatial), a Block Range Index (BRIN-Spatial) provided by PostgreSQL as well as a new indexing scheme, namely Hippo-Spatial. In the experiments, the paper considers five evaluation metrics to compare and contrast the performance of the three indexing schemes: storage overhead, index initialization time, query response time, maintenance overhead, and throughput. Furthermore, the benchmark takes into account parameters that affect the index performance, which include but is not limited to: data size, spatial query selectivity, and spatial area density, The paper finally analyzes the experimental evaluation results and highlights the key insights and lessons learned. The results emphasize the fact that there is no one size that fits all when it comes to indexing massive-scale spatial data. The results also prove that modern database systems can maintain a lightweight index (in terms of storage and maintenance overhead) that is also fast enough for spatial data analytics applications.

Publication
In International Symposium on Spatial and Temporal Databases, SSTD
Jia Yu
Jia Yu
Co-founder

Jia Yu is a co-founder of Wherobots Inc. and leads its engineering team. Jia is the creator of Apache Sedona and was a Tenure-Track Assistant Professor of Computer Science at Washington State University from 2020 to 2023. Jia’s research interests include database systems, distributed data systems and geospatial data management.

Mohamed Sarwat
Mohamed Sarwat
Assistant Professor

Mohamed Sarwat is an assistant professor of computer science at Arizona State University. His general research interest lies in developing robust and scalable data systems for spatial and spatiotemporal applications.

Related