Biography

Jia Yu is a co-founder of Wherobots Inc., a venture-backed company that helps businesses derive insights from spatiotemporal data. He was a tenure-track Assistant Professor of Computer Science at Washington State University from 2020 to 2023. He obtained his Ph.D. in Computer Science from Arizona State University. His research focuses on large-scale database systems and geospatial data management. In particular, he has worked on distributed geospatial data management systems, database indexing, and geospatial data visualization. Jia’s research has appeared in the most prestigious database and GIS conferences and journals, including SIGMOD, VLDB, ICDE, SIGSPATIAL, and the VLDB Journal. He is the main contributor to several open-source research projects such as Apache Sedona, a cluster computing framework for processing big spatial data, which receives 1 million downloads per month and has users and contributors from major companies.

News

  • 1/2024: Invited to be a Program Committee member of ACM SIGMOD 2025, ACM SIGSPATIAL 2023, IEEE MDM 2024
  • 11/2023: Received the Outstanding Reviewer award from ACM SIGSPATIAL 2023
  • 08/2023: I have resigned from WSU.
  • 06/2023: Wherobots raised $5.5 million in a seed round led by Clear Ventures and Wing VC.
  • 01/2023: Apache Sedona is now an Apache Software Foundation Top Level Project (TLP). See the Announcement from ASF. Being a TLP at Apache signifies that the project is mature, independent, and enjoys strong community support.
  • 12/2022: Invited to be a Program Committee member of ACM SIGMOD 2024.
  • 09/2022: I am currently on leave of absence from WSU.

Interests

  • Database systems
  • Distributed data systems
  • Geospatial data management

Education

  • Ph.D. in Computer Science, 2020

    Arizona State University

  • B.E. in Software Eng., Outstanding Graduate, 2013

    Northwest Agriculture and Forestry University, China (西北农林科技大学, Project 985 & 211)

Experience

Co-founder

Wherobots Inc.

Sep 2022 – Present Seattle, Washington

Assistant Professor

Washington State University, School of Electrical Engineering and Computer Science

Aug 2020 – Aug 2023 Pullman, Washington
Full-time: August 2020 - August 2022
Leave of Absence: August 2022 - August 2023

Research Intern

Microsoft Research, Database group

Jun 2019 – Aug 2019 Redmond, Washington
– Microsoft is the birthplace of Microsoft SQL Server
– Mentor / Collaborators: Umar Farooq Minhas, David Lomet, Jaeyoung Do, Yinan Li, Chi Wang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann
– I worked on a realistic design of updatable learned indices
SIGMOD 2020 research paper ALEX: An Updatable Adaptive Learned Index

Research Intern

IBM Almaden Research Center, Database group

May 2018 – Aug 2018 San Jose, California
– IBM Almaden is the birthplace of the relational model, SQL, and the DB2 DBMS
– Mentor / Collaborators: Vijayshankar Raman, Yingjun Wu, Yuanyuan Tian, Ronald Barber, Richard Sidle
– I participated in the Hermit project to design a succinct secondary index. I also explored code generation issues on compressed database tables and implemented a preliminary code generator with JIT execution using LLVM for the IBM HTAP system
SIGMOD 2019 research paper Designing Succinct Secondary Indexing Mechanism by Exploiting Column Correlations
VLDB 2019 demo paper HERMIT in action: Succinct secondary indexing mechanism via correlation exploration

Software Development Intern

Apple, Maps team

Aug 2016 – Jun 2016 Cupertino, California
– Apple is the birthplace of Apple Maps
– Mentor: Huang-Hsiang Cheng; Manager: Alex Radeski
– I deployed and improved distributed computing frameworks and resource management systems such as Apache Spark and Apache Mesos. I also developed internal evaluation tools to assist large-scale geospatial analysis

Projects


ALEX

ALEX is a new class of learned indexes which addresses issues that arise when implementing dynamic and updatable learned indexes.

Tabula

Tabula is a middleware that runs on top of a SQL data system with the purpose of increasing the interactivity of geospatial visualization dashboards.

GeoSparkSim

GeoSparkSim is a scalable traffic simulator which extends Apache Spark to generate large-scale road network traffic datasets with microscopic traffic simulation.

Hermit

Hermit is a succinct secondary indexing mechanism for modern RDBMSs. It judiciously leverages the rich soft functional dependencies hidden among columns to prune out redundant structures for indexed key access.

GeoSparkViz

GeoSparkViz is a large-scale geospatial map visualization framework. GeoSparkViz extends Apache Spark to provide native support for general cartographic design.

Hippo

Hippo is a fast, yet scalable, database indexing approach. It significantly shrinks the index storage and mitigates maintenance overhead without compromising much on the query execution performance.

Apache Sedona

Apache Sedona is a cluster computing system for processing large-scale spatial data. Apache Sedona (formerly GeoSpark) extends Apache Spark / SparkSQL to efficiently load, process, and analyze large-scale spatial data across machines.

Awards

Third Place of Student Research Competition

Student Travel Grant

IEEE ICDE (3 times), ACM SIGSPATIAL (5 times = 4 NSF + 1 Microsoft)

Outstanding graduate

Only 200 out of 5600 students were selected

First-class Scholarship, Merit Student

2 times, only the top 10% of students (in terms of GPA) were selected

Services

Program Committee member

ACM SIGMOD 2023 - 2025
VLDB 2023
SIGSPATIAL 2020 - 2023
SSTD 2023
MDM 2022 - 2024

Invited reviewer

VLDB Journal (VLDBJ)
ACM Transactions on Spatial Algorithms and Systems (TSAS)
International Journal of Geographical Information Science (IJGIS)
Geoinformatica Journal
IEEE Transactions on Cloud Computing (TCC)
Computers and Geosciences (CAGEOS)
IEEE Transactions on Parallel and Distributed Systems (TPDS)
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Frontiers in Big Data
See certificate

External reviewer

SIGMOD: 2017, 2018, 2019
SIGMOD demo: 2016, 2018
PVLDB: 2016, 2017, 2018, 2019, 2020
ICDE: 2020
ICDE demo: 2017, 2018
SIGSPATIAL: 2016, 2017, 2018
SSTD: 2017
MDM: 2016

Teaching

CptS 223 Advanced Data Structures (Java)

Instructor, Undergraduate level, Computer Science, Washington State University

CptS 415 Big data

Instructor, Senior undergraduate level, Computer Science, Washington State University

CptS 223 Advanced Data Structures (C/C++)

Instructor, Undergraduate level, Computer Science, Washington State University

CptS 415 Big data

Instructor, Senior undergraduate level, Computer Science, Washington State University

CSE 511 Data Processing at Scale

Instructor, Graduate level, Computer Science, Arizona State University

Recent & Upcoming Talks

Slides of my talks are usually available unless forbidden by Non-Disclosure Agreements

Spatial Data Wrangling With GeoSpark - A Step-by-Step Tutorial
GeoSpark and Geospatial Data Management in Apache Spark
ALEX - An Updatable Learned Index
Designing Succinct Secondary Indexes by Exploiting Column Correlations

Contact