Jia is a PhD candidate in the Computer Science department, School of Computing, Informatics, and Decision Systems Engineering (CIDSE), Arizona State University, where he is a member of the Data Systems Lab. He expects to obtain his Ph.D. in June 2020.
- 03/13/2020: A research paper on an "Updatable Adaptive Learned Index" has been accepted to SIGMOD 2020. This is part of my Summer 2019 internship work with Microsoft Research and MIT.
- 02/05/2020: The first GeoSpark paper (2015) is the most cited of all 633 papers published at ACM SIGSPATIAL from 2014 to 2019.
- 12/05/2019: Databricks (the company behind Apache Spark) featured GeoSpark in one of its articles. Databricks provides a GeoSpark environment on the Databricks cloud.
Background
The volume of geospatial data has increased tremendously. Such data includes, but is not limited to, weather maps, Internet-of-Things sensor readings, and geo-tagged social media. Many data-intensive geospatial analytics applications, such as machine learning algorithms, rely heavily on underlying data infrastructure such as database management systems (DBMS) to efficiently manipulate, retrieve, and manage data. Unfortunately, classic database management systems and GIS tools, such as MySQL, PostgreSQL/PostGIS, and ArcGIS, suffer significant performance degradation when handling large-scale geospatial data.
Agenda
My research focuses on crafting database systems that accelerate large-scale geospatial data analytics. In particular, I am interested in:
- building large-scale distributed data systems for geospatial data and data streams, which requires substantial changes to existing big data systems such as Apache Hadoop, Spark, Flink, Storm, and Kafka;
- designing machine-learning-enhanced spatial data structures, such as indexes and new physical data layouts, to speed up spatial query processing, so that users obtain analysis results faster and at a lower storage cost;
- creating visualization techniques for geospatial data and data streams, so that interactive map interfaces such as Google Maps can update every minute, or even every second, to reflect the actual movement of millions of spatial objects.
- System-oriented research. Building data systems that actually work benefits both academia and industry. My open-source GeoSpark system is one of the most popular spatial data systems built on top of Apache Spark and has helped many companies.
- Research collaboration. Learning from and collaborating with experts at different institutions is the way to identify and solve challenging problems. In the past, I have collaborated with or worked at Microsoft Research, IBM Almaden Research Center, and Apple.
- Diversity of research areas. Working in several research areas offers a broader view of interdisciplinary opportunities and inspires more practical research ideas. My current interdisciplinary research, which connects database systems and GIS, contributes to a range of related disciplines such as geography and urban planning.