Spatial Data Wrangling with GeoSpark - A Step by Step Tutorial


This tutorial offers a comprehensive study and hands-on walkthrough of how GeoSpark builds on Spark to support massive-scale spatial data. We also want it to serve as an introductory course that teaches the audience the basic building blocks of a scalable spatial data management system and the important design concerns, drawing on our previous experience. We begin with background on the characteristics of spatial data and the history of distributed data management systems. The second section presents common approaches that practitioners use to extend Spark and introduces the vital components of a generic spatial data management system. The third section gives a hands-on live demonstration of the basic steps of geospatial data analytics using GeoSpark.

In ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
Jia Yu
Assistant Professor (from Fall 2020)

Jia Yu obtained his PhD from Arizona State University in Summer 2020. His research interests include database systems, distributed data systems and geospatial data management.

Mohamed Sarwat
Assistant Professor

Mohamed Sarwat is an assistant professor of computer science at Arizona State University. His general research interest lies in developing robust and scalable data systems for spatial and spatiotemporal applications.