Spatial Data Wrangling with GeoSpark - A Step by Step Tutorial

November 2019


Abstract

This tutorial aims to deliver a comprehensive study and hands-on demonstration of how GeoSpark builds on Spark to support massive-scale spatial data. We also intend it to serve as an introductory course that teaches the audience the basic building blocks of a scalable spatial data management system, along with the key design concerns drawn from our previous experience. We begin with background on the characteristics of spatial data and the history of distributed data management systems. The second section presents common approaches that practitioners use to extend Spark and introduces the vital components of a generic spatial data management system. The third section gives a live, hands-on demonstration of the basic steps for performing geospatial data analytics with GeoSpark.

Type

Tutorial

Publication

In ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems

Jia Yu

Co-founder

Jia Yu is a co-founder of Wherobots Inc. Jia is the creator of Apache Sedona and was a Tenure-Track Assistant Professor of Computer Science at Washington State University from 2020 to 2023. Jia’s research interests include database systems, distributed data systems and geospatial data management.

Mohamed Sarwat

Assistant Professor

Mohamed Sarwat is an assistant professor of computer science at Arizona State University. His general research interest lies in developing robust and scalable data systems for spatial and spatiotemporal applications.