Welcome to Jia Yu’s homepage

Jia is a PhD student in the Computer Science department, School of Computing, Informatics, and Decision Systems Engineering (CIDSE), Arizona State University, where he is a member of the Data Systems Lab. Jia’s research focuses on database systems and geospatial data management. In particular, he has worked on distributed data management systems, database indexing, and data visualization. He is the main contributor to several open-source research projects such as GeoSpark, a cluster computing framework for processing big spatial data.

I am glad to review papers in the areas of database systems and geospatial data management!

I am currently on the job market and looking for a Tenure-Track Assistant Professor position that starts in Fall 2020. Please feel free to drop me an email if you think I am a good fit. [CV][Research Statement][Teaching Statement][Diversity Statement]


  • 12/05/2019: Our project GeoSpark is featured by Databricks (the company behind Apache Spark) in its article “Processing Geospatial Data at Scale”. Databricks provides a GeoSpark notebook for the Databricks Spark runtime and Delta Lake. If you have a Databricks account, now is the time to try GeoSpark on the Databricks cloud! Please see [GeoSpark notebook on Databricks cloud][Databricks article].
  • 11/05/2019: I gave a hands-on tutorial, “Spatial Data Wrangling with GeoSpark: A Step-By-Step Tutorial”, at the ACM SIGSPATIAL 2019 Spatial API Workshop, Chicago. Please see the slides and coding examples.
  • 09/09/2019: I gave a talk, “Geospatial Data Management in Apache Spark”, at ApacheCon 2019 North America, Las Vegas. Please see the slides.
  • 09/04/2019: We received the Best Demo Paper Runner-Up award at SSTD 2019. The demo features GeoSparkSim, a data system that generates large-scale road network traffic simulations (Certificate).
  • 08/15/2019: I will teach a graduate class CSE 511 Data Processing at Scale this Fall semester. This course covers the design, deployment and use of state-of-the-art data processing systems, which provide scalable access to data.
  • 08/10/2019: A research paper, “Accelerating Spatial Data Visualization Dashboards via a Materialized Sampling Cube Approach”, has been accepted to IEEE ICDE 2020. It was one of the few papers accepted directly without revision. The direct acceptance rate is 3%.
  • 07/17/2019: I gave a talk at Microsoft Research, “Designing Succinct Secondary Indexes by Exploiting Column Correlations” (video).
  • 06/06/2019: A research paper and a demo paper about “Scalable Microscopic Road Network Traffic Simulator in Apache Spark” have been accepted to MDM 2019 and SSTD 2019.
  • 06/03/2019: I will be a Research Intern at Microsoft Research (database group) this summer! My mentor is Umar Farooq Minhas. I will work on a realistic design of updatable learned indices.
  • 05/14/2019: Received the ASU Ira A. Fulton Schools of Engineering “Engineering Graduate Fellowship” for the 2018‐2019 academic year.
  • 05/10/2019: A research paper and a demo paper about “Succinct Learned Secondary Indexes by Exploiting Column Correlations” have been accepted to SIGMOD 2019 and VLDB 2019. This is part of my 2018 summer internship work at IBM - Almaden.
  • 04/11/2019: Presented 2 demo papers and 1 tutorial at IEEE ICDE 2019, supported by a $1875 ICDE 2019 NSF Student Travel Grant. We talked about geospatial data management in Apache Spark and geographical knowledge graph management. Our tutorial website is now online.