Big Data

Two Birds, One Stone: A Fast, yet Lightweight, Indexing Scheme for Modern Database Systems

Classic database indexes (e.g., B+-Tree), though they speed up queries, suffer from two main drawbacks: (1) An index usually adds 5% to 15% storage overhead, which results in a non-negligible dollar cost in big data scenarios, especially when …

Hippo

Hippo is a fast yet scalable database indexing approach. It significantly shrinks index storage and reduces maintenance overhead without compromising much on query execution performance.

A demonstration of GeoSpark: A cluster computing framework for processing big spatial data. In Proceedings of the Internation

This paper demonstrates GEOSPARK, a cluster computing framework for developing and processing large-scale spatial data analytics programs. GEOSPARK consists of three main layers: Apache Spark Layer, Spatial RDD Layer and Spatial Query Processing …

GeoSpark - A Cluster Computing Framework for Processing Large-Scale Spatial Data

GeoSpark: A Cluster Computing Framework for Processing Large-Scale Spatial Data

This paper introduces GeoSpark, an in-memory cluster computing framework for processing large-scale spatial data. GeoSpark consists of three layers: Apache Spark Layer, Spatial RDD Layer and Spatial Query Processing Layer. Apache Spark Layer provides …

Apache Sedona

Apache Sedona (formerly GeoSpark) is a cluster computing system for processing large-scale spatial data. Sedona extends Apache Spark / SparkSQL to efficiently load, process, and analyze large-scale spatial data across machines.