Jia Yu

Co-founder

Wherobots Inc.

Biography

Jia Yu is a co-founder of Wherobots, a Spatial Intelligence Cloud platform for spatial data ETL, analytics, and AI. Previously, he was a Tenure-Track Assistant Professor of Computer Science at Washington State University (2020–2023). He earned his Ph.D. from Arizona State University. Jia specializes in large-scale database systems and geospatial data management, with a focus on distributed systems, indexing, and visualization. His work has been published in top conferences and journals such as SIGMOD, VLDB, ICDE, SIGSPATIAL, and the VLDB Journal. He is the primary contributor to Apache Sedona, an open-source big spatial data framework with over 2 million monthly downloads and widespread industry adoption.

News

06/2026: Our paper RayBooster has been accepted to the VLDB 2026 Industry Track. RayBooster is the first system to bring GPU ray-tracing cores into a production geospatial database, delivering up to 5.93x faster spatial joins on Apache SedonaDB at 59% lower cost.
🏆 10-Year Impact Award: Our foundational paper on Apache Sedona (GeoSpark), published at ACM SIGSPATIAL 2015, received the 10-Year Impact Award at ACM SIGSPATIAL 2025. [Impact statement] [Award certificate]
05/2025: Invited to be a Program Committee member of IEEE ICDE 2026 and ACM SIGSPATIAL 2025.
11/2024: Wherobots raised $21.5M in a Series A round led by Felicis, with continued support from Wing Venture Capital and Clear Ventures, along with participation from JetBlue Ventures and Prosperity7 Ventures. This brings our total funding to $27M.
01/2024: Invited to be a Program Committee member of ACM SIGMOD 2025, ACM SIGSPATIAL 2024, and IEEE MDM 2024.

Interests

Database systems
Distributed data systems
Geospatial data management

Education

Ph.D. in Computer Science, 2020

Arizona State University
B.E. in Software Engineering, Outstanding Graduate, 2013

Northwest Agriculture and Forestry University, China (西北农林科技大学, Project 985 & 211)

Experience

Co-founder

Wherobots Inc.

Sep 2022 – Present Seattle, Washington

Assistant Professor

Washington State University, School of Electrical Engineering and Computer Science

Aug 2020 – Aug 2023 Pullman, Washington

Full-time: August 2020 - August 2022
Leave of Absence: August 2022 - August 2023

Research Intern

Microsoft Research, Database group

Jun 2019 – Aug 2019 Redmond, Washington

– Microsoft is the birthplace of Microsoft SQL Server
– Mentor / Collaborators: Umar Farooq Minhas, David Lomet, Jaeyoung Do, Yinan Li, Chi Wang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann
– I worked on a realistic design of updatable learned indices
– SIGMOD 2020 research paper ALEX: An Updatable Adaptive Learned Index

Research Intern

IBM Almaden Research Center, Database group

May 2018 – Aug 2018 San Jose, California

– IBM Almaden is the birthplace of the relational model, SQL, and the DB2 DBMS
– Mentor / Collaborators: Vijayshankar Raman, Yingjun Wu, Yuanyuan Tian, Ronald Barber, Richard Sidle
– I participated in the Hermit project, designing a succinct secondary index. I also explored code generation for compressed database tables and implemented a preliminary LLVM-based code generator with JIT execution for IBM's HTAP system
– SIGMOD 2019 research paper Designing Succinct Secondary Indexing Mechanism by Exploiting Column Correlations
– VLDB 2019 demo paper HERMIT in action: Succinct secondary indexing mechanism via correlation exploration

Software Development Intern

Apple, Maps team

Jun 2016 – Aug 2016 Cupertino, California

– Apple is the birthplace of Apple Maps
– Mentor: Huang-Hsiang Cheng; Manager: Alex Radeski
– I deployed and improved distributed computing frameworks and resource management systems such as Apache Spark and Apache Mesos. I also developed internal evaluation tools to support large-scale geospatial analysis

Projects

ALEX

ALEX is a new class of learned index that addresses the issues that arise when building dynamic, updatable learned indexes.

PDF Code

Tabula

Tabula is middleware that runs on top of a SQL data system to make geospatial visualization dashboards more interactive.

PDF Code Video

GeoSparkSim

GeoSparkSim is a scalable traffic simulator that extends Apache Spark to generate large-scale road network traffic datasets using microscopic traffic simulation.

PDF Code Slides Video

Hermit

Hermit is a succinct secondary indexing mechanism for modern RDBMSs. It judiciously leverages the rich soft functional dependencies hidden among columns to prune out redundant structures for indexed key access.

PDF Video

GeoSparkViz

GeoSparkViz is a large-scale geospatial map visualization framework. GeoSparkViz extends Apache Spark to provide native support for general cartographic design.

PDF Code

Hippo

Hippo is a fast, yet scalable, database indexing approach. It significantly shrinks the index storage and mitigates maintenance overhead without compromising much on the query execution performance.

PDF Code Slides Video

Apache Sedona

Apache Sedona (formerly GeoSpark) is a cluster computing system for processing large-scale spatial data. It extends Apache Spark / SparkSQL to efficiently load, process, and analyze large-scale spatial data across machines.

PDF Code Slides

Selected Publications

Liang Geng, Rubao Lee, Dewey Dunnington, Feng Zhang, Jia Yu, Xiaodong Zhang (2026). RayBooster: A Ray Tracing Engine to Accelerate SedonaDB. In VLDB.

PDF Code

Congying Wang, Jia Yu, Zhuoyue Zhao (2023). GLIN: A (G)eneric (L)earned (In)dexing Mechanism for Complex Geometries. In ACM SIGSPATIAL BigSpatial Workshop.

PDF Code

Jia Yu, Mohamed Sarwat (2021). GeoSparkViz: A Cluster Computing System for Visualizing Massive-Scale Geospatial Data. In VLDB Journal.

Code Project Project website

Jialin Ding, Umar Farooq Minhas, Jia Yu, Chi Wang, Hantian Zhang, Yinan Li, Jaeyoung Do, Donald Kossmann, Johannes Gehrke, David Lomet, Badrish Chandramouli, Tim Kraska (2020). ALEX: An Updatable Adaptive Learned Index. In ACM SIGMOD.

PDF Code Project Technical report

Jia Yu, Mohamed Sarwat (2020). Turbocharging Geospatial Visualization Dashboards via a Materialized Sampling Cube Approach. In IEEE ICDE.

PDF Code Project

Yingjun Wu, Jia Yu, Yuanyuan Tian, Ronald Barber, Richard Sidle (2019). Designing Succinct Secondary Indexing Mechanism by Exploiting Column Correlations. In ACM SIGMOD.

PDF Project

Jia Yu, Zongsi Zhang, Mohamed Sarwat (2019). Spatial Data Management in Apache Spark: the GeoSpark Perspective and Beyond. In Geoinformatica.

PDF Code Project website

Jia Yu, Mohamed Sarwat (2016). Two Birds, One Stone: A Fast, yet Lightweight, Indexing Scheme for Modern Database Systems. In VLDB.

PDF Code Project

Recent publications

Liang Geng, Rubao Lee, Dewey Dunnington, Feng Zhang, Jia Yu, Xiaodong Zhang (2026). RayBooster: A Ray Tracing Engine to Accelerate SedonaDB. In VLDB.

PDF Code

Ruichen Wang, Yiqun Xie, Leo Du, Jia Yu, Kyle Duncan, Sinéad Farrell, Zhili Li, Kangyang Chai (2025). Coincident Data Discovery Engine: A Portal for Global-Scale Cross-Platform Satellite Data Search. In ACM SIGSPATIAL.

PDF Code

Congying Wang, Jia Yu, Zhuoyue Zhao (2023). GLIN: A (G)eneric (L)earned (In)dexing Mechanism for Complex Geometries. In ACM SIGSPATIAL BigSpatial Workshop.

PDF Code

Yiqun Xie, Xiaowei Jia, Han Bao, Xun Zhou, Jia Yu, Rahul Ghosh, Praveen Ravirathinam (2021). Spatial-Net: A Self-Adaptive and Model-Agnostic Deep Learning Framework for Spatially Heterogeneous Datasets. In ACM SIGSPATIAL.

Jia Yu, Mohamed Sarwat (2021). GeoSparkViz: A Cluster Computing System for Visualizing Massive-Scale Geospatial Data. In VLDB Journal.

Code Project Project website

Awards

Outstanding Reviewers

ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems Nov 2023

See certificate

Best Demo Paper Runner-Up

International Symposium on Spatial and Temporal Databases, SSTD Aug 2019

Engineering Graduate Fellowship

Arizona State University, Ira A. Fulton Schools of Engineering Apr 2019

Third Place, Student Research Competition

ACM SIGSPATIAL Nov 2017

Student Travel Grant

National Science Foundation, Microsoft Jan 2015 – Jan 2020

IEEE ICDE (3 times) and ACM SIGSPATIAL (5 times: 4 from NSF, 1 from Microsoft)

Outstanding Graduate

Northwest Agriculture and Forestry University Jul 2013

Only 200 of 5,600 students were selected

First-class Scholarship, Merit Student

Northwest Agriculture and Forestry University Sep 2011 – Sep 2012

Awarded twice; only the top 10% of students by GPA were selected

Services

Program Committee member

International conferences 2020 – 2025

ACM SIGMOD 2023 - 2025
VLDB 2023
SIGSPATIAL 2020 - 2023
SSTD 2023
MDM 2022 - 2024

Invited reviewer

International journals 2018 – 2020

VLDB Journal (VLDBJ)
ACM Transactions on Spatial Algorithms and Systems (TSAS)
International Journal of Geographical Information Science (IJGIS)
Geoinformatica Journal
IEEE Transactions on Cloud Computing (TCC)
Computers and Geosciences (CAGEOS)
IEEE Transactions on Parallel and Distributed Systems (TPDS)
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Frontiers in Big Data

See certificate

External reviewer

International conferences 2016 – 2020

SIGMOD: 2017, 2018, 2019
SIGMOD demo: 2016, 2018
PVLDB: 2016, 2017, 2018, 2019, 2020
ICDE: 2020
ICDE demo: 2017, 2018
SIGSPATIAL: 2016, 2017, 2018
SSTD: 2017
MDM: 2016