A demonstration of GeoSpark: A cluster computing framework for processing big spatial data. In Proceedings of the Internation

April 2016


Abstract

This paper demonstrates GEOSPARK, a cluster computing framework for developing and processing large-scale spatial data analytics programs. GEOSPARK consists of three main layers: the Apache Spark Layer, the Spatial RDD Layer and the Spatial Query Processing Layer. The Apache Spark Layer provides basic Apache Spark functionality in the form of regular RDD operations. The Spatial RDD Layer consists of three novel Spatial Resilient Distributed Datasets (SRDDs), which extend the regular Apache Spark RDD to support geometrical and spatial objects with data partitioning and indexing. The Spatial Query Processing Layer executes spatial queries (e.g., Spatial Join) on SRDDs. The dynamic status of SRDDs and spatial operations is visualized by the GEOSPARK monitoring map interface. We demonstrate GEOSPARK using three spatial analytics applications (spatial aggregation, autocorrelation and co-location) to show how users can easily define their spatial analytics tasks and efficiently process such tasks on large-scale spatial data at interactive performance.
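To make the idea of a partitioned spatial join more concrete, the following is a minimal single-machine sketch in plain Python. It is not GEOSPARK's API and does not use Apache Spark; the function names (`grid_cell`, `cells_for_rect`, `spatial_join`), the uniform grid, and the `cell_size` parameter are all assumptions chosen for illustration. It shows only the general pattern the abstract describes: spatial objects are assigned to partitions by location, and a join then compares each query rectangle only against the partitions it overlaps.

```python
from collections import defaultdict
from math import floor


def grid_cell(x, y, cell_size):
    """Map a coordinate to the (column, row) of a uniform grid cell (hypothetical helper)."""
    return (floor(x / cell_size), floor(y / cell_size))


def cells_for_rect(rect, cell_size):
    """Yield every grid cell that the rectangle (xmin, ymin, xmax, ymax) overlaps."""
    xmin, ymin, xmax, ymax = rect
    cx0, cy0 = grid_cell(xmin, ymin, cell_size)
    cx1, cy1 = grid_cell(xmax, ymax, cell_size)
    for cx in range(cx0, cx1 + 1):
        for cy in range(cy0, cy1 + 1):
            yield (cx, cy)


def spatial_join(points, rects, cell_size=10.0):
    """Return {rect_id: [points inside the rectangle]} using grid partitioning.

    Illustrative only: a distributed system would hold each partition on a
    different worker; here the partitions are entries of one dictionary.
    """
    # Step 1: partition the points by grid cell. Each point lands in exactly
    # one cell, so no point can be reported twice for the same rectangle.
    partitions = defaultdict(list)
    for x, y in points:
        partitions[grid_cell(x, y, cell_size)].append((x, y))

    # Step 2: for each rectangle, visit only the overlapping partitions and
    # apply the exact containment test to the points stored there.
    result = defaultdict(list)
    for rect_id, rect in rects.items():
        xmin, ymin, xmax, ymax = rect
        for cell in cells_for_rect(rect, cell_size):
            for x, y in partitions.get(cell, []):
                if xmin <= x <= xmax and ymin <= y <= ymax:
                    result[rect_id].append((x, y))
    return dict(result)


points = [(1, 1), (5, 5), (12, 3), (25, 25)]
rects = {"a": (0, 0, 10, 10), "b": (8, 0, 15, 5)}
joined = spatial_join(points, rects)

# A simple spatial aggregation on top of the join: count points per rectangle.
counts = {rect_id: len(matched) for rect_id, matched in joined.items()}
```

In this sketch, rectangles that contain no points are simply absent from the result, and the point `(25, 25)` is never compared against either rectangle because its grid cell overlaps neither of them.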

Publication

In IEEE International Conference on Data Engineering, ICDE

Jia Yu

Co-founder

Jia Yu is a co-founder of Wherobots Inc. Jia is the creator of Apache Sedona and was a Tenure-Track Assistant Professor of Computer Science at Washington State University from 2020 to 2023. Jia’s research interests include database systems, distributed data systems and geospatial data management.

Mohamed Sarwat

Assistant Professor

Mohamed Sarwat is an assistant professor of computer science at Arizona State University. His general research interest lies in developing robust and scalable data systems for spatial and spatiotemporal applications.