GeoSpark

Apr 1, 2015

Introduction

GeoSpark is a cluster computing system for processing large-scale spatial data. It extends Apache Spark and SparkSQL with a set of out-of-the-box Spatial Resilient Distributed Datasets (SRDDs) and SpatialSQL that efficiently load, process, and analyze large-scale spatial data across machines.

Source code

I implemented GeoSpark on top of Apache Spark and SparkSQL. The source code is hosted on GitHub: Source code, Project website

Reputation

GeoSpark is the de facto spatial data processing framework on top of Apache Spark.
GeoSpark was listed on the official Apache Spark Third-Party Projects list from Sept. 2016. The link was removed in Aug. 2018 due to a conflict with the Spark trademark (see this commit).
GeoSpark has > 200K overall website visits and > 10K monthly downloads.
Users and contributors include Facebook, Apple, Uber, MoBike, and numerous startups.
GeoSpark in production (video), from Gyana, a British location intelligence company.
GeoSpark was evaluated in the PVLDB 2018 paper How Good Are Modern Spatial Analytics Systems?, written by Varun Pandey, Andreas Kipf, Thomas Neumann, and Alfons Kemper (Technical University of Munich), quoted as follows:

GeoSpark comes close to a complete spatial analytics system. It also exhibits the best performance in most cases.

Selected publications

GeoSpark is a full-fledged big geospatial data analytics system that provides:

Data generation (GeoSparkSim, MDM 2019)
Data management and query processing (GeoSpark, Geoinformatica 2019)
Visualization (GeoSparkViz, SSDBM 2018; an extended version is under revision at the VLDB Journal)

Jia Yu

PhD Candidate

Jia Yu is a PhD candidate at Arizona State University. He will join the Washington State University School of Electrical Engineering and Computer Science as an assistant professor of Computer Science in Fall 2020. His research interests include database systems, distributed data systems, and geospatial data management.

Publications

Big Geospatial Data Processing Made Easy: A Working Guide to GeoSpark

Jia Yu, Mohamed Sarwat

Systems and Methods for an End-To-End Visual Analytics System for Massive-Scale Geospatial Data

Jia Yu, Zongsi Zhang, Mohamed Sarwat

Spatial Data Wrangling with GeoSpark - A Step by Step Tutorial

This tutorial is expected to deliver a comprehensive study and hands-on tutorial of how GeoSpark incorporates Spark to uphold …

Jia Yu, Mohamed Sarwat

Geospatial Data Management in Apache Spark: A Tutorial

This paper introduces GeoSpark, an in-memory cluster computing framework for processing large-scale spatial data. GeoSpark consists of …

Jia Yu, Mohamed Sarwat

Spatial Data Management in Apache Spark: the GeoSpark Perspective and Beyond

The paper presents the details of designing and developing GEOSPARK, which extends the core engine of Apache Spark and SparkSQL to …

Jia Yu, Zongsi Zhang, Mohamed Sarwat

A demonstration of GeoSpark: A cluster computing framework for processing big spatial data. In Proceedings of the Internation

This paper demonstrates GEOSPARK, a cluster computing framework for developing and processing large-scale spatial data analytics …

Jia Yu, Jinxuan Wu, Mohamed Sarwat

GeoSpark: A Cluster Computing Framework for Processing Large-Scale Spatial Data

This paper introduces GeoSpark, an in-memory cluster computing framework for processing large-scale spatial data. GeoSpark consists of …

Jia Yu, Jinxuan Wu, Mohamed Sarwat
