Projects

I build data management systems that accelerate analytics over large-scale geospatial data and regular data. My past work covers two main topics: distributed geospatial data management (query processing and visualization) and database indexing (for both geospatial and regular data). More specifically, I led or participated in the following projects:

Note: Click a project title below to learn more.

Tabula

Tabula is a middleware that runs on top of a SQL data system with the purpose of increasing the interactivity of geospatial visualization dashboards.

Publications: ICDE 2020 (research)

Collaborators: Mohamed Sarwat (Arizona State University)

Highlight: Tabula is implemented in Apache Spark SQL

_________________________________

ALEX

ALEX is a new class of learned indexes that addresses the issues that arise when implementing dynamic, updatable learned indexes.

Publications: a research paper is in preparation

Collaborators:

MIT: Jialin Ding

Microsoft Research: Umar Farooq Minhas, David Lomet, Jae Young Do, Yinan Li, Chi Wang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann

ETH: Hantian Zhang

_________________________________

Hermit

Hermit is a succinct secondary indexing mechanism for modern RDBMSs. It judiciously leverages the rich soft functional dependencies hidden among columns to prune redundant structures for indexed key access.

Publications: SIGMOD 2019 (research), PVLDB 2019 (demo)

Collaborators: Yingjun Wu, Yuanyuan Tian, Ronald Barber, and Richard Sidle (IBM Almaden Research Center)

_________________________________

Hippo

Hippo is a fast, yet scalable, database indexing approach. It significantly shrinks index storage and mitigates maintenance overhead without compromising much on query execution performance.

Publications: PVLDB 2016 (research), ICDE 2017 (demo), SSTD 2017 (research)

Collaborators: Mohamed Sarwat (Arizona State University)

Highlight: Hippo is a PostgreSQL 9.6 built-in index

_________________________________

GeoSpark

GeoSpark is a cluster computing system for processing large-scale spatial data. GeoSpark extends Apache Spark / SparkSQL to efficiently load, process, and analyze large-scale spatial data across machines.

Publications:

Research papers: Geoinformatica Journal 2019, MDM 2019, SSDBM 2018

Demo and short papers: ICDE 2019, SSTD 2019, ICDE 2016, SIGSPATIAL 2015 (short)

Tutorial: ICDE 2019

Collaborators: Zongsi Zhang, Zishan Fu, Mohamed Sarwat (Arizona State University)

Highlight: GeoSpark has > 200K overall website visits and > 10K monthly downloads. Users and contributors include Facebook, Apple, Uber, MoBike, and numerous startups.

_________________________________