Apache Sedona

Introduction

Apache Sedona is a cluster computing system for processing large-scale spatial data. It extends Apache Spark, Apache Flink, and Snowflake with a set of out-of-the-box spatial distributed datasets that efficiently load, process, and analyze large-scale spatial data across machines. Apache Sedona joined the Apache Software Foundation in July 2020.
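Uniform-grid spatial partitioning is one common way a system can spread spatial data across machines. The following is a toy, single-process Python sketch of that general idea under this assumption. It is not Sedona's API or its partitioning implementation, and `grid_partition` and `cell_size` are hypothetical names. Each point is routed to the grid cell that contains it, so nearby points land in the same partition and each partition could be handled by a different worker.

```python
from collections import defaultdict
from math import floor


def grid_partition(points, cell_size):
    """Group (x, y) points by the uniform grid cell that contains them.

    Hypothetical helper for illustration only; not part of Apache Sedona.
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    partitions = defaultdict(list)
    for x, y in points:
        # floor() maps negative coordinates to the correct cell (e.g. -0.3 -> -1).
        cell = (floor(x / cell_size), floor(y / cell_size))
        partitions[cell].append((x, y))
    return dict(partitions)


if __name__ == "__main__":
    pts = [(0.5, 0.5), (0.9, 0.1), (1.5, 0.2), (-0.3, 2.7)]
    for cell, members in sorted(grid_partition(pts, 1.0).items()):
        print(cell, members)
```

A uniform grid is simple, but on skewed data it can produce very unbalanced partitions, which is why adaptive, data-driven partitioning schemes are often preferred in practice.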

Source code

Project website: sedona.apache.org

Reputation

  • Apache Sedona has 2 million monthly downloads.

GeoSpark (the former name of Apache Sedona) has been evaluated in papers published at top database venues. Note that we have no collaboration with the authors of these papers.

SIGMOD 2020 paper “Architecting a Query Compiler for Spatial Workloads” by Ruby Y. Tahboub and Tiark Rompf (Purdue University). In Figure 16a, the GeoSpark distance join query runs around 7x to 9x faster than Simba, a spatial extension to Spark, on machines with 1 to 24 cores.
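A distance join pairs each point in one dataset with every point in a second dataset that lies within a distance d. The plain-Python sketch below illustrates that query on a single machine; it is not GeoSpark's, Sedona's, or Simba's implementation, and the function names are hypothetical. It assumes 2D points and Euclidean distance, uses a uniform grid with cell size d as a filter so only points in neighbouring cells are compared, and checks the result against a brute-force nested loop.

```python
from collections import defaultdict
from math import floor, hypot


def naive_distance_join(left, right, d):
    """Reference answer: compare every (left, right) pair, O(n * m)."""
    return sorted(
        (i, j)
        for i, (x1, y1) in enumerate(left)
        for j, (x2, y2) in enumerate(right)
        if hypot(x1 - x2, y1 - y2) <= d
    )


def grid_distance_join(left, right, d):
    """Grid-filtered distance join returning sorted (left_idx, right_idx) pairs."""
    if d <= 0:
        raise ValueError("d must be positive")
    # Index the right-hand points by grid cell, using the join distance as cell size.
    cells = defaultdict(list)
    for j, (x, y) in enumerate(right):
        cells[(floor(x / d), floor(y / d))].append(j)
    result = []
    for i, (x1, y1) in enumerate(left):
        cx, cy = floor(x1 / d), floor(y1 / d)
        # With cell size d, any match lies in the 3x3 block of cells around (cx, cy).
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in cells.get((cx + dx, cy + dy), ()):
                    x2, y2 = right[j]
                    # Refinement step: exact distance check on the candidates.
                    if hypot(x1 - x2, y1 - y2) <= d:
                        result.append((i, j))
    return sorted(result)


if __name__ == "__main__":
    left = [(0.0, 0.0), (5.0, 5.0)]
    right = [(0.5, 0.5), (3.0, 0.0), (5.2, 4.9)]
    print(grid_distance_join(left, right, 1.0))  # [(0, 0), (1, 2)]
```

The filter-then-refine split is the key design choice: the grid cheaply discards pairs that cannot match, and the exact distance test runs only on the surviving candidates.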

PVLDB 2018 paper “How Good Are Modern Spatial Analytics Systems?” by Varun Pandey, Andreas Kipf, Thomas Neumann, and Alfons Kemper (Technical University of Munich), which states: “GeoSpark comes close to a complete spatial analytics system. It also exhibits the best performance in most cases.”

Selected publications

Apache Sedona is a full-fledged big geospatial data analytics system that provides:

  • Data generation (GeoSparkSim, MDM 2019)
  • Data management and query processing (GeoSpark, Geoinformatica 2019)
  • Visualization (GeoSparkViz, SSDBM 2018; an extended version is under revision at the VLDB Journal)