Hippo is a fast yet scalable database indexing approach. It significantly shrinks index storage and reduces maintenance overhead without compromising much on query execution performance. To reduce the space the index occupies, Hippo stores disk page ranges rather than pointers to individual tuples in the indexed table. It maintains simplified histograms that represent the data distribution and adopts a page grouping technique that groups contiguous pages into page ranges based on the similarity of their index key attribute distributions. When a query is issued, Hippo uses the page ranges and histogram-based page summaries to identify pages whose tuples are guaranteed not to satisfy the query predicates, skips them, and inspects only the remaining pages.
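The filtering idea described above can be illustrated with a minimal Python sketch. It is not the PostgreSQL implementation. It assumes an equal-width histogram over a known key domain, represents each page-range summary as a bitmap of the histogram buckets its tuples touch, and closes a page range once the merged bitmap exceeds a density threshold. That grouping rule is an assumption standing in for the similarity-based grouping. All names (`NUM_BUCKETS`, `DENSITY_LIMIT`, `Entry`, `build_index`, `search`) are hypothetical.

```python
from dataclasses import dataclass

NUM_BUCKETS = 8      # hypothetical histogram resolution
DENSITY_LIMIT = 0.2  # hypothetical: close a range once > 20% of buckets are set


def bucket_of(key, lo, hi):
    """Map a key in [lo, hi] to an equal-width histogram bucket number."""
    width = (hi - lo) / NUM_BUCKETS
    return min(int((key - lo) / width), NUM_BUCKETS - 1)


@dataclass
class Entry:
    first_page: int
    last_page: int
    bitmap: int  # bit b is set if some tuple in the range falls in bucket b


def build_index(pages, lo, hi):
    """Group contiguous pages into ranges, one summary bitmap per range."""
    entries = []
    start, bitmap = 0, 0
    for page_no, keys in enumerate(pages):
        for key in keys:
            bitmap |= 1 << bucket_of(key, lo, hi)
        # Assumed grouping rule: stop growing the range when it gets too dense.
        if bin(bitmap).count("1") / NUM_BUCKETS > DENSITY_LIMIT:
            entries.append(Entry(start, page_no, bitmap))
            start, bitmap = page_no + 1, 0
    if start < len(pages):  # flush the trailing, still-sparse range
        entries.append(Entry(start, len(pages) - 1, bitmap))
    return entries


def search(entries, pages, lo, hi, q_lo, q_hi):
    """Return keys in [q_lo, q_hi] and the number of pages actually read."""
    q_bitmap = 0
    for b in range(bucket_of(q_lo, lo, hi), bucket_of(q_hi, lo, hi) + 1):
        q_bitmap |= 1 << b
    hits, inspected = [], 0
    for e in entries:
        if (e.bitmap & q_bitmap) == 0:
            continue  # no shared bucket: no tuple in this range can match
        for page_no in range(e.first_page, e.last_page + 1):
            inspected += 1
            hits.extend(k for k in pages[page_no] if q_lo <= k <= q_hi)
    return hits, inspected


# Toy table: each inner list holds the index keys stored on one disk page.
pages = [[1, 5], [8, 11], [14, 20], [30, 35], [40, 49], [60, 70], [80, 99]]
index = build_index(pages, 0, 100)
hits, inspected = search(index, pages, 0, 100, 30, 45)
print(len(index), hits, inspected)
```

On this toy table the seven pages collapse into four index entries, and the query reads only the two pages whose range summary shares a bucket with the predicate. Surviving pages still need a tuple-level check, because a shared bucket means a match is possible and does not mean it is certain.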

Source code

I implemented the Hippo index in the PostgreSQL kernel. The source code is hosted on GitHub:

Demo video

I implemented a demo system using Hippo-spatial as the backend. The demo video is hosted on YouTube:


I presented the Hippo index at VLDB 2017. Here is the pre-presentation video:


I published three papers under this project.

  • Indexing the Pick-up and Drop-off Locations of NYC Taxi Trips in PostgreSQL – Lessons from the Road (Research paper)
    • Jia Yu, Mohamed Sarwat. In Proceedings of the International Symposium on Spatial and Temporal Databases, SSTD 2017, Washington, D.C., USA, August 2017
  • Hippo in Action: Scalable Indexing of a Billion New York City Taxi Trips and Beyond (DEMO paper)
    • Jia Yu, Raha Moraffah, Mohamed Sarwat. In Proceedings of the IEEE International Conference on Data Engineering, ICDE 2017, San Diego, CA, USA, April 2017
  • Two Birds, One Stone: A Fast, yet Lightweight, Indexing Scheme for Modern Database Systems (Research paper)
    • Jia Yu, Mohamed Sarwat. In Proceedings of the 43rd International Conference on Very Large Data Bases, VLDB 2017, Munich, Germany, August 2017