Welcome to Jia Yu’s homepage

Jia is a PhD candidate in the Computer Science department, School of Computing, Informatics, and Decision Systems Engineering (CIDSE), Arizona State University, where he is a member of the Data Systems Lab. Jia’s research focuses on database systems and geospatial data management. In particular, he has worked on distributed data management systems, database indexing, and data visualization. He is the main contributor to several open-source research projects such as GeoSpark, a cluster computing framework for processing big spatial data.

I am glad to review papers on database systems and geospatial data management!

I am currently on the job market and looking for a Tenure-Track Assistant Professor position that starts in Fall 2020. Please feel free to drop me an email if you think I am a good fit. [CV][Research Statement][Teaching Statement][Diversity Statement]


Please read my CV to learn more about me.

Research focus
  • Theme: Large-scale geospatial databases
  • Topics: cluster computing, interactive visualization, lightweight database index
Application
  • Help data scientists analyze big geospatial data in a scalable and interactive way at a lower cost in time and money
  • Urban planning, animal migration, transportation engineering, climate change analytics
Papers
  • 18 publications (SIGMOD, VLDB, ICDE, SSTD, Geoinformatica), 2 under review, 1 under revision (SIGMOD)
  • 13 first-author, 5 second-author
  • All my publications with my advisor have at most 3 authors
Citations
  • Google Scholar: 320+
  • The 2015 GeoSpark paper is the most cited paper among all 633 ACM SIGSPATIAL papers from 2014 to 2019
Internship
  • Microsoft Research (database group, the birthplace of SQL Server) - SIGMOD’20 (under revision)
  • IBM Almaden Research Center (database group, the birthplace of the relational model, SQL and DB2) - SIGMOD’19, VLDB’19
  • Apple (map team, the birthplace of Apple Maps)
Open-source
  • All my ASU projects are on GitHub
  • Over 1,000 stars and forks
  • 10,000 monthly downloads
Industry impact
  • Users and code contributors of my ASU projects are from major IT companies, such as Uber, MoBike (摩拜单车), and Facebook
  • Many companies use my ASU projects in production: Databricks GeoSpark notebook, Gyana BI dashboard powered by GeoSpark
System hack
  • All my ASU projects are implemented in the kernel of widely used data systems
  • The GeoSpark cluster computing system is implemented in Apache Spark
  • The Hippo index is a PostgreSQL 9.6 built-in index
Talks
  • Conference (8 times): VLDB, ICDE, SIGSPATIAL, SSTD, MDM, ApacheCon (Apache Software Foundation annual conference)
  • Company (7 times): Microsoft Research, IBM Almaden Research Center, Apple, NVIDIA, State Farm, Vocareum
Referees
  • Mohamed Sarwat (ASU)
  • Yingjun Wu (Amazon Web Services, former researcher at IBM Almaden Research Center)
  • Umar Farooq Minhas (Microsoft Research)
  • David Lomet (Microsoft Research, Member of the National Academy of Engineering)


  • 02/05/2020: My first first-author paper with my advisor Mohamed Sarwat, “GeoSpark: a cluster computing framework for processing large-scale spatial data”, was published in ACM SIGSPATIAL 2015 (one of the most prestigious conferences in spatial data management). It is now the most cited paper among all 633 papers from 2014 to 2019, and the 7th most cited paper among all 935 papers from 2011 to 2019 in this conference, according to Microsoft Academic.
  • 12/05/2019: Our project GeoSpark was featured by Databricks (the company behind Apache Spark) in its article “Processing Geospatial Data at Scale”. Databricks provides a GeoSpark notebook for Databricks Spark runtime and Delta Lake. If you have a Databricks account, now is the time to try GeoSpark on the Databricks cloud! Please see [GeoSpark notebook on Databricks cloud][Databricks article].
  • 11/05/2019: I gave a hands-on tutorial, “Spatial Data Wrangling with GeoSpark: A Step-By-Step Tutorial”, at the ACM SIGSPATIAL 2019 Spatial API Workshop in Chicago. Please see the slides and coding examples.
  • 09/09/2019: I gave a talk, “Geospatial Data Management in Apache Spark”, at ApacheCon 2019 North America in Las Vegas. Please see the slides.
  • 09/04/2019: We received the Best Demo Paper Runner-Up award at SSTD 2019. The demo features GeoSparkSim, a data system that generates large-scale road network traffic simulations (Certificate).
  • 08/15/2019: I will teach a graduate class CSE 511 Data Processing at Scale this Fall semester. This course covers the design, deployment and use of state-of-the-art data processing systems, which provide scalable access to data.
  • 08/10/2019: A research paper about “Accelerating Spatial Data Visualization Dashboards via a Materialized Sampling Cube Approach” has been accepted to IEEE ICDE 2020. My paper was one of the few papers accepted directly without revision. The direct acceptance rate is 3%.
  • 07/17/2019: Gave a talk at Microsoft Research about “Designing Succinct Secondary Indexes by Exploiting Column Correlations” (video)
  • 06/06/2019: A research paper and a demo paper about “Scalable Microscopic Road Network Traffic Simulator in Apache Spark” have been accepted to MDM 2019 and SSTD 2019.
  • 06/03/2019: I will be a Research Intern at Microsoft Research (database group) this summer! My mentor is Umar Farooq Minhas. I will work on a realistic design of updatable learned indices.
  • 05/14/2019: Received ASU Ira A. Fulton Schools of Engineering “Engineering Graduate Fellowship” for the 2018‐2019 academic year.
  • 05/10/2019: A research paper and a demo paper about “Succinct Learned Secondary Indexes by Exploiting Column Correlations” have been accepted to SIGMOD 2019 and VLDB 2019. This is part of my 2018 summer intern work at IBM - Almaden.
  • 04/11/2019: Presented 2 demo papers and 1 tutorial at IEEE ICDE 2019, supported by a $1,875 ICDE 2019 NSF Student Travel Grant. We talked about geospatial data management in Apache Spark and geographical knowledge graph management. Our tutorial website is now online.