Posts by Collection

portfolio

Tabula

Tabula is a middleware layer that runs on top of a SQL data system to increase the interactivity of geospatial visualization dashboards.

Publications: ICDE 2020 (research)

Collaborators: Mohamed Sarwat (Arizona State University)

Highlight: Tabula is implemented in Apache Spark SQL
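One way a middleware layer can make dashboard queries interactive is to precompute samples ahead of time instead of scanning the raw table for every filter the user clicks. The toy sketch below assumes that design; it is not taken from this page. For each filter value (a cube "cell"), it keeps a small local sample only when a shared global sample would exceed an accuracy-loss threshold. The loss function (relative error of the mean), the function names, and the row layout are all hypothetical.

```python
import random
from collections import defaultdict


def mean(values):
    return sum(values) / len(values)


def loss(sample, cell_values):
    """Hypothetical loss: relative error of the sample mean vs. the cell's true mean."""
    true_mean = mean(cell_values)
    err = abs(mean(sample) - true_mean)
    return err / abs(true_mean) if true_mean else err


def build_sampling_cube(rows, filter_key, value_key, sample_size, threshold, seed=0):
    """Precompute a global sample plus local samples only for cells that need one."""
    rng = random.Random(seed)
    all_values = [r[value_key] for r in rows]
    global_sample = rng.sample(all_values, min(sample_size, len(all_values)))
    cells = defaultdict(list)
    for r in rows:
        cells[r[filter_key]].append(r[value_key])
    cube = {}
    for key, values in cells.items():
        if loss(global_sample, values) <= threshold:
            cube[key] = None  # the global sample is accurate enough for this cell
        else:
            cube[key] = rng.sample(values, min(sample_size, len(values)))
    return global_sample, cube


def approximate_mean(global_sample, cube, key):
    """Answer a filtered dashboard query from the precomputed samples only."""
    local = cube.get(key)
    return mean(local if local is not None else global_sample)
```

The design trade-off is storage versus accuracy: cells whose data look like the whole table share one sample, and only "unusual" cells pay for their own.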

ALEX

ALEX is a new class of learned index that addresses the issues that arise when implementing dynamic, updatable learned indexes.

Publications: a research paper is in preparation

Collaborators:

MIT: Jialin Ding

Microsoft Research: Jae Young Do, David Lomet, Yinan Li, Chi Wang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann

ETH: Hantian Zhang
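ALEX builds on the basic learned-index idea: a model predicts where a key sits in a sorted array, and a short local search corrects the prediction. The sketch below shows only that basic idea with one linear model and an error-bounded binary search; the class name and structure are illustrative. It deliberately omits what makes ALEX dynamic and updatable (for example, gapped arrays and adaptive node splitting), so it should not be read as ALEX's implementation.

```python
import bisect


class LinearLearnedIndex:
    """Toy static learned index: one linear model plus an error-bounded local search."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("need at least one key")
        self.keys = sorted(keys)
        n = len(self.keys)
        # Fit position ~ slope * key + intercept by least squares.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # The largest prediction error over stored keys bounds every lookup's search window.
        self.max_err = max(abs(self._predict(k) - p) for p, k in enumerate(self.keys))

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Return the position of key in the sorted array, or None if it is absent."""
        n = len(self.keys)
        guess = self._predict(key)
        # Search only [guess - max_err, guess + max_err], clamped to the array bounds.
        lo = min(max(0, guess - self.max_err), n)
        hi = min(max(lo, guess + self.max_err + 1), n)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < n and self.keys[i] == key:
            return i
        return None
```

Inserting into this flat sorted array would shift every later key and invalidate the model's error bound, which is exactly the kind of problem an updatable design like ALEX has to solve.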

Hermit

Hermit is a succinct secondary indexing mechanism for modern RDBMSs. It judiciously leverages the rich soft functional dependencies hidden among columns to prune out redundant structures for indexed key access.

Publications: SIGMOD 2019 (research), PVLDB 2019 (demo)

Collaborators: Yingjun Wu, Yuanyuan Tian, Ronald Barber, and Richard Sidle (IBM Almaden Research Center)
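To make the idea of exploiting a soft functional dependency concrete: if a target column is roughly determined by a host column that is already indexed, a range predicate on the target can be translated into a slightly wider range on the host, answered through the host index, and then filtered to remove false positives. The sketch below is a hypothetical simplification, not Hermit's implementation: a single global linear model with a worst-case error bound stands in for whatever correlation structure Hermit actually maintains, and the class and column names are illustrative.

```python
import bisect


class CorrelationIndex:
    """Toy correlation-based secondary index (hypothetical; not Hermit's design)."""

    def __init__(self, rows, host, target):
        if not rows:
            raise ValueError("need at least one row")
        self.host, self.target = host, target
        # The existing host index: rows kept sorted by the host column.
        self.rows = sorted(rows, key=lambda r: r[host])
        self.host_keys = [r[host] for r in self.rows]
        # Fit host ~ a * target + b by least squares.
        ts = [r[target] for r in self.rows]
        n = len(ts)
        mt, mh = sum(ts) / n, sum(self.host_keys) / n
        var = sum((t - mt) ** 2 for t in ts)
        cov = sum((t - mt) * (h - mh) for t, h in zip(ts, self.host_keys))
        self.a = cov / var if var else 0.0
        self.b = mh - self.a * mt
        # Worst-case model error, plus a small slack for floating-point rounding.
        self.eps = max(abs(h - (self.a * t + self.b)) for t, h in zip(ts, self.host_keys)) + 1e-9

    def range_query(self, t_lo, t_hi):
        """Return rows whose target value lies in [t_lo, t_hi]."""
        # Translate the target range into a wider host range via the model.
        p1, p2 = self.a * t_lo + self.b, self.a * t_hi + self.b
        h_lo, h_hi = min(p1, p2) - self.eps, max(p1, p2) + self.eps
        # Probe the host index for candidates, then drop false positives.
        i = bisect.bisect_left(self.host_keys, h_lo)
        j = bisect.bisect_right(self.host_keys, h_hi)
        return [r for r in self.rows[i:j] if t_lo <= r[self.target] <= t_hi]
```

The only per-column state is the model (a, b, eps) instead of a full secondary index; the cost is scanning a few extra candidates near the range boundaries.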

Hippo

Hippo is a fast yet scalable database indexing approach. It significantly shrinks index storage and mitigates maintenance overhead with little loss in query execution performance.

Publications: PVLDB 2016 (research), ICDE 2017 (demo), SSTD 2017 (research)

Collaborators: Mohamed Sarwat (Arizona State University)

Highlight: Hippo is implemented as a built-in index in PostgreSQL 9.6
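Roughly, Hippo saves space by not storing one index tuple per row: each index entry summarizes a run of consecutive disk pages with a partial histogram, i.e., a bitmap recording which buckets of a complete histogram over the column occur in those pages. A range query scans only the page runs whose bitmap overlaps the query's buckets. The sketch below illustrates that idea under simplifying assumptions: pages are plain Python lists, the histogram is equi-depth, and each entry covers a fixed number of pages (the real system decides entry boundaries differently). All function names are hypothetical.

```python
import bisect


def histogram_bounds(values, num_buckets):
    """Equi-depth histogram: the boundaries separating num_buckets buckets."""
    s = sorted(values)
    return [s[(i * len(s)) // num_buckets] for i in range(1, num_buckets)]


def bucket_of(bounds, v):
    """Bucket id of v: the number of boundaries <= v."""
    return bisect.bisect_right(bounds, v)


def build_entries(pages, bounds, pages_per_entry):
    """One entry per run of consecutive pages, holding a bitmap of the buckets seen."""
    entries = []
    for start in range(0, len(pages), pages_per_entry):
        end = min(start + pages_per_entry, len(pages))
        bitmap = 0
        for page in pages[start:end]:
            for v in page:
                bitmap |= 1 << bucket_of(bounds, v)
        entries.append((start, end, bitmap))
    return entries


def range_query(pages, bounds, entries, lo, hi):
    """Return (matching values, pages scanned) for lo <= v <= hi."""
    mask = 0
    for b in range(bucket_of(bounds, lo), bucket_of(bounds, hi) + 1):
        mask |= 1 << b
    results, scanned = [], 0
    for start, end, bitmap in entries:
        if bitmap & mask:  # this entry's partial histogram overlaps the query
            for page in pages[start:end]:
                scanned += 1
                results.extend(v for v in page if lo <= v <= hi)
    return results, scanned
```

For example, with pages `[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]`, four buckets, and one page per entry, the query `5 <= v <= 6` touches a single page.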

GeoSpark

GeoSpark is a cluster computing system for processing large-scale spatial data. It extends Apache Spark and Spark SQL to efficiently load, process, and analyze large-scale spatial data across machines.

Publications:

Research papers: GeoInformatica journal 2019, MDM 2019, SSDBM 2018

Demo and short papers: ICDE 2019, SSTD 2019, ICDE 2016, SIGSPATIAL 2015 (short)

Tutorial: ICDE 2019

Collaborators: Zongsi Zhang, Zishan Fu, Mohamed Sarwat (Arizona State University)

Highlight: GeoSpark has more than 200K total website visits and more than 10K monthly downloads. Users and contributors include Facebook, Apple, Uber, MoBike, and numerous startups.
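A central ingredient in distributed spatial systems of this kind is spatial partitioning: points are grouped by location so that a range query only touches partitions whose extent overlaps the query rectangle. The single-machine sketch below illustrates that idea with a uniform grid. It is an assumption-laden illustration, not GeoSpark's API (which runs on Spark); the function names, the grid scheme, and the cell size are hypothetical.

```python
from collections import defaultdict


def grid_partition(points, cell_size):
    """Assign each (x, y) point to the uniform grid cell that contains it."""
    cells = defaultdict(list)
    for x, y in points:
        cells[(int(x // cell_size), int(y // cell_size))].append((x, y))
    return cells


def range_query(cells, cell_size, xmin, ymin, xmax, ymax):
    """Return (points inside the rectangle, non-empty cells visited)."""
    hits, visited = [], 0
    # Only cells whose extent overlaps the query rectangle can contain matches.
    for cx in range(int(xmin // cell_size), int(xmax // cell_size) + 1):
        for cy in range(int(ymin // cell_size), int(ymax // cell_size) + 1):
            bucket = cells.get((cx, cy))
            if bucket is None:
                continue
            visited += 1
            hits.extend(p for p in bucket if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax)
    return hits, visited
```

In a cluster setting each cell (or group of cells) would live on a different machine, so skipping cells means skipping whole partitions of work; skewed data usually calls for adaptive partitioning rather than a uniform grid.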

publications

talks

teaching

Coursera course designer: Degree of Computer Science - Data Systems

graduate online course, Arizona State University, Computer Science, 2018

Database systems are used to provide convenient access to disk-resident data through efficient query processing, indexing structures, concurrency control, and recovery. This specialization delves into new frameworks for processing and generating large-scale datasets with parallel and distributed algorithms. Courses cover the design, deployment and use of state-of-the-art data processing systems, which provide scalable access to data.

Instructor: CSE 511 Data Processing at Scale

graduate course, Arizona State University, Computer Science, 2019

Database systems are used to provide convenient access to disk-resident data through efficient query processing, indexing structures, concurrency control, and recovery. This course delves into new frameworks for processing and generating large-scale datasets with parallel and distributed algorithms, covering the design, deployment and use of state-of-the-art data processing systems, which provide scalable access to data.