Apache Sedona

Introduction

Apache Sedona (formerly GeoSpark) is a cluster computing system for processing large-scale spatial data. It extends Apache Spark and SparkSQL with a set of out-of-the-box Spatial Resilient Distributed Datasets (SRDDs) and SpatialSQL that efficiently load, process, and analyze large-scale spatial data across machines. Apache Sedona joined the Apache Software Foundation in July 2020.
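Spatial partitioning is the idea behind spreading spatial data "across machines" efficiently, and it can be illustrated without Spark. The following is a minimal, single-machine Python sketch that assumes a uniform grid with a hypothetical `cell_size` parameter. It is not Sedona's implementation or API; Sedona itself offers partitioning schemes such as KDB-tree and quad-tree. Points are grouped into grid cells that stand in for partitions, and a rectangular range query scans only the cells its box overlaps.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Cell = Tuple[int, int]                   # (column, row) of a grid cell


def cell_of(p: Point, cell_size: float) -> Cell:
    """Map a point to the grid cell containing it (floor division handles negatives)."""
    return (int(p[0] // cell_size), int(p[1] // cell_size))


def partition(points: List[Point], cell_size: float) -> Dict[Cell, List[Point]]:
    """Group points by grid cell; each cell plays the role of one partition.

    Hypothetical illustration of spatial partitioning, not Sedona code.
    """
    parts: Dict[Cell, List[Point]] = defaultdict(list)
    for p in points:
        parts[cell_of(p, cell_size)].append(p)
    return dict(parts)


def range_query(parts: Dict[Cell, List[Point]], cell_size: float, box: Box) -> List[Point]:
    """Return the points inside box, scanning only partitions whose cell overlaps it."""
    min_x, min_y, max_x, max_y = box
    lo_c, lo_r = cell_of((min_x, min_y), cell_size)
    hi_c, hi_r = cell_of((max_x, max_y), cell_size)
    result: List[Point] = []
    # Cells outside [lo_c, hi_c] x [lo_r, hi_r] are never read (partition pruning).
    for c in range(lo_c, hi_c + 1):
        for r in range(lo_r, hi_r + 1):
            for x, y in parts.get((c, r), []):
                # Exact filter on the candidates from overlapping cells.
                if min_x <= x <= max_x and min_y <= y <= max_y:
                    result.append((x, y))
    return result


if __name__ == "__main__":
    pts = [(0.5, 0.5), (1.5, 0.5), (9.5, 9.5), (2.5, 2.5)]
    grid = partition(pts, cell_size=1.0)
    print(range_query(grid, 1.0, (0.0, 0.0, 2.0, 1.0)))
```

In a cluster, each cell's points would live on a different worker, so skipping non-overlapping cells avoids reading most of the data. A uniform grid handles skewed data poorly, which is one reason tree-based partitioners are used in practice.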

Source code

I implemented Apache Sedona on top of Apache Spark and SparkSQL. Project website: sedona.apache.org

Reputation

  • Apache Sedona is the de facto spatial data processing framework on top of Apache Spark.

  • Apache Sedona has 300K monthly downloads.

  • Users and contributors include Facebook, Apple, Uber, MoBike, and numerous startups.

  • Apache Sedona in production (video) at Gyana, a British location intelligence company.

  • Apache Sedona was evaluated in the PVLDB 2018 paper "How Good Are Modern Spatial Analytics Systems?" by Varun Pandey, Andreas Kipf, Thomas Neumann, and Alfons Kemper (Technical University of Munich), which concluded:

GeoSpark comes close to a complete spatial analytics system. It also exhibits the best performance in most cases.

Selected publications

Apache Sedona is a full-fledged big geospatial data analytics system that provides:

  • Data generation (GeoSparkSim, MDM 2019)
  • Data management and query processing (GeoSpark, GeoInformatica 2019)
  • Visualization (GeoSparkViz, SSDBM 2018; an extended version is under revision at the VLDB Journal)
Jia Yu
Co-founder

Jia Yu is a co-founder of Wherobots Inc. and leads its engineering team. Jia is the creator of Apache Sedona and was a Tenure-Track Assistant Professor of Computer Science at Washington State University from 2020 to 2023. Jia’s research interests include database systems, distributed data systems and geospatial data management.