Three new book chapters accepted in the Handbook of Big Geospatial Data

We recently had three book chapters accepted in the Handbook of Big Geospatial Data.

The first chapter is “Harvesting Big Geospatial Data from Natural Language Texts”.

Abstract: A vast amount of geospatial data exists in natural language texts, such as newspapers, Wikipedia articles, social media posts, travel blogs, online reviews, and historical archives. Compared with more traditional and structured geospatial data, such as those collected by the US Geological Survey and the national statistics offices, geospatial data harvested from these unstructured texts have unique merits. They capture valuable human experiences toward places, reflect near real-time situations in different geographic areas, or record important historical information that is otherwise not available. In addition, geospatial data from these unstructured texts are often big, in terms of their volume, velocity, and variety. This chapter presents the motivations of harvesting big geospatial data from natural language texts, describes typical methods and tools for doing so, summarizes a number of existing applications, and discusses challenges and future directions.


Figure 1: Relations of places under different semantic topics extracted from a corpus of news articles from The Guardian.

More details are available in the full chapter:
Yingjie Hu and Ben Adams (2020): Harvesting big geospatial data from natural language texts. In M. Werner and Y.-Y. Chiang (Eds), Handbook of Big Geospatial Data, Springer. [PDF]

——————–

The second chapter is “Harnessing Heterogeneous Big Geospatial Data”.

Abstract: The heterogeneity of geospatial datasets is a mixed blessing in that it theoretically enables researchers to gain a more holistic picture by providing different (cultural) perspectives, media formats, resolutions, thematic coverage, and so on, but at the same time practice shows that this heterogeneity may hinder the successful combination of data, e.g., due to differences in data representation and underlying conceptual models. Three different aspects are usually distinguished in processing big geospatial data from heterogeneous sources, namely geospatial data conflation, integration, and enrichment. Each step is a progression on the previous one by taking the result of the last step, extracting useful information, and incorporating additional information to solve specific questions. This chapter introduces and clarifies the scope and goal of each of these aspects, presents existing methods, and outlines current research trends.


Figure 2: Vector data and raster data are two commonly used types. Practically, a conversion process can be applied to switch between these two types. However, such a conversion is usually not lossless. As a result, three types of conflation, namely raster and raster conflation, vector and vector conflation, and raster and vector conflation, are studied in relevant research.
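To illustrate why such a conversion is usually not lossless, here is a minimal sketch in Python. It is an illustrative toy, not the method used in the chapter: the function names `rasterize` and `vectorize`, the sample points, and the cell size are all assumptions made for this example.

```python
import math

def rasterize(points, cell_size):
    """Map each (x, y) point to the (col, row) index of the grid cell that contains it."""
    return {(math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points}

def vectorize(cells, cell_size):
    """Convert occupied cells back to points placed at the cell centers."""
    return sorted(((col + 0.5) * cell_size, (row + 0.5) * cell_size) for col, row in cells)

# Hypothetical vector data: three point features.
points = [(0.2, 0.3), (0.4, 0.1), (2.7, 1.9)]

cells = rasterize(points, cell_size=1.0)
recovered = vectorize(cells, cell_size=1.0)

print(recovered)  # two cell-center points instead of the three original points
```

The first two points fall into the same cell, so only two points come back after the round trip, and every recovered coordinate is snapped to a cell center. A smaller cell size reduces this error but does not remove it.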

More details are available in the full chapter:
Bo Yan, Gengchen Mai, Yingjie Hu, and Krzysztof Janowicz (2020): Harnessing heterogeneous big geospatial data. In M. Werner and Y.-Y. Chiang (Eds), Handbook of Big Geospatial Data, Springer. [PDF]

——————–

The third chapter is “Automatic Urban Road Network Extraction from Massive GPS Trajectories of Taxis”.

Abstract: Urban road networks are fundamental transportation infrastructures in daily life and essential in digital maps to support vehicle routing and navigation. Traditional methods of map vector data generation based on surveyor’s field work and map digitalization are costly and have a long update period. In the Big Data age, large-scale GPS-enabled taxi trajectories and high-volume ridesharing datasets become increasingly available. These datasets provide high-resolution spatiotemporal information about urban traffic along road networks. In this study, we present a novel geospatial-big-data-driven framework that includes trajectory compression, clustering, and vectorization to automatically generate urban road geometric information. A case study is conducted using a large-scale DiDi ride-sharing GPS dataset in the city of Chengdu in China. We compare the results of our automatic extraction method with the road layer downloaded from OpenStreetMap. We measure the quality and demonstrate the effectiveness of our road extraction method regarding accuracy, spatial coverage and connectivity. The proposed framework shows a good potential to update fundamental road transportation information for smart-city development and intelligent transportation management using geospatial big data.
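The abstract does not say which compression algorithm the framework uses. As a rough illustration of what trajectory compression can look like, the sketch below applies the classic Douglas-Peucker line simplification, a stand-in chosen for this example and not necessarily the chapter's method. The sample trajectory and the tolerance value are made up, and planar coordinates are assumed.

```python
import math

def point_to_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment between a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, tolerance):
    """Drop points that lie within `tolerance` of the simplified line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_to_segment_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]
    # Keep the farthest point and simplify both halves recursively.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right

# Hypothetical GPS trace: a nearly straight stretch followed by a turn.
trajectory = [(0, 0), (1, 0.05), (2, -0.04), (3, 0), (3.5, 1), (3, 2)]
simplified = douglas_peucker(trajectory, tolerance=0.1)
print(simplified)
```

The small wobbles along the straight stretch are removed, while the points that define the turn are kept, which is the property that makes this kind of compression useful before clustering and vectorization.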


Figure 3: A visual comparison of the extracted road network and the OSM road reference layer.

More details are available in the full chapter:
Song Gao, Mingxiao Li, Jinmeng Rao, Gengchen Mai, Timothy Prestby, Joseph Marks, and Yingjie Hu (2020): Automatic urban road network extraction from massive GPS trajectories of taxis. In M. Werner and Y.-Y. Chiang (Eds), Handbook of Big Geospatial Data, Springer. [PDF]

GeoAI Lab receives a new Microsoft AI for Earth Compute Grant for using GeoAI to model an open ecosystem in South Africa for biodiversity protection

Our proposal “Near Real-time Forecasting and Change Detection for an Open Ecosystem by Integrating Artificial Intelligence and Ecological Modeling” has been selected for an AI for Earth Microsoft Azure Compute Grant. We have been awarded $15,000 in computing credits to use Azure cloud services for developing and training geospatial deep learning models for biodiversity protection.

Abstract:
Open (i.e., non-forest) ecosystems, such as savannas, shrublands, and grasslands, make up over 40% of the global total ecosystem organic carbon, and harbor a substantial proportion of the world’s biodiversity. Accurately forecasting the state of vegetation and detecting abnormal changes are critical for managing the biodiversity, fire, water, and carbon in these open ecosystems. This proposed project will integrate state-of-the-art AI techniques with ecological models with the goal of providing accurate forecasting and change detection on the state of vegetation in an open ecosystem. We will focus on the Cape Floristic Region (CFR) of South Africa, which contains 20% of Africa’s plant diversity and is a Global Biodiversity Hotspot and UNESCO World Heritage Site. The outcomes of this project will include models and tools that can provide near real-time forecasting and change detection for the studied open ecosystem of CFR and could also be applied to other ecosystems with similar dynamics.

Our research team consists of:
Dr. Yingjie Hu, Principal Investigator, GeoAI Lab, Department of Geography, University at Buffalo, State University of New York, United States
Dr. Adam M. Wilson, Co-Investigator, Wilson Lab, Department of Geography, University at Buffalo, State University of New York, United States
Dr. Glenn R. Moncrieff, Co-Investigator, Fynbos Node, South African Environmental Observation Network, South Africa
Dr. Jasper A. Slingsby, Co-Investigator, Fynbos Node, South African Environmental Observation Network, South Africa

How do people describe locations during a natural disaster? New paper examining tweets from Hurricane Harvey is accepted in GIScience 2021

Our recent work examining how people describe locations during natural disasters has been accepted as a full paper at the flagship GIScience conference. Due to COVID-19, this year’s conference has been postponed to next year, so it becomes GIScience 2021 🙂

Abstract: Social media platforms, such as Twitter, have been increasingly used by people during natural disasters to share information and request for help. Hurricane Harvey was a category 4 hurricane that devastated Houston, Texas, USA in August 2017 and caused catastrophic flooding in the Houston metropolitan area. Hurricane Harvey also witnessed the widespread use of social media by the general public in response to this major disaster, and geographic locations are key information pieces described in many of the social media messages. A geoparsing system, or a geoparser, can be utilized to automatically extract and locate the described locations, which can help first responders reach the people in need. While a number of geoparsers have already been developed, it is unclear how effective they are in recognizing and geo-locating the locations described by people during natural disasters. To fill this gap, this work seeks to understand how people describe locations during a natural disaster by analyzing a sample of tweets posted during Hurricane Harvey. We then identify the limitations of existing geoparsers in processing these tweets, and discuss possible approaches to overcoming these limitations.

Full paper: Hu, Y. & Wang, J. (2020): How do people describe locations during a natural disaster: an analysis of tweets from Hurricane Harvey, In: Proceedings of the 11th International Conference on Geographic Information Science (GIScience 2021), Sep. 27-30, Poznan, Poland. [PDF]

Congratulations to lab member Jimin Wang on finishing his master’s thesis on Advancing Spatial and Textual Analysis with GeoAI

Our lab member, Jimin Wang, recently completed his MS in GIS degree. His master’s project focuses on Advancing Spatial and Textual Analysis with GeoAI. In particular, Jimin has published three related papers on this topic.

Jimin’s master’s committee members are Dr. Yingjie Hu and Dr. Enki Yoo. In addition, Dr. Kenneth Joseph also provided guidance on Jimin’s research.

Moving forward, Jimin has received a fellowship package from UB’s PhD Excellence Initiative, which aims at “recruiting the very best PhD students and providing them with transformative academic programs that prepare them for future success”. Jimin will continue his studies as a PhD student in our GeoAI Lab, and we look forward to his new achievements in the coming years.

Congratulations again, Jimin!

New paper on a Neuro-net ToPonym Recognition Model (NeuroTPR) accepted in Transactions in GIS

Abstract: Social media messages, such as tweets, are frequently used by people during natural disasters to share real-time information and to report incidents. Within these messages, geographic locations are often described. Accurate recognition and geolocation of these locations is critical for reaching those in need. This paper focuses on the first part of this process, namely recognizing locations from social media messages. While general named entity recognition (NER) tools are often used to recognize locations, their performance is limited due to the various language irregularities associated with social media text, such as informal sentence structures, inconsistent letter cases, name abbreviations, and misspellings. We present NeuroTPR, which is a Neuro-net ToPonym Recognition model designed specifically with these linguistic irregularities in mind. Our approach extends a general bidirectional recurrent neural network model with a number of features designed to address the task of location recognition in social media messages. We also propose an automatic workflow for generating annotated datasets from Wikipedia articles for training toponym recognition models. We demonstrate NeuroTPR by applying it to three test datasets, including a Twitter dataset from Hurricane Harvey, and comparing its performance with those of six baseline models.

Full paper: Jimin Wang, Yingjie Hu, and Kenneth Joseph (2020): NeuroTPR: A Neuro-net ToPonym Recognition model for extracting locations from social media messages. Transactions in GIS, accepted. [PDF]

Figure 1: The two steps of geoparsing in the context of disaster response and our focus on toponym recognition.
Figure 2: The overall architecture of NeuroTPR.

New paper on the panel discussion of geospatial humanities published in the International Journal of Humanities and Arts Computing

Andris, C., Ayers, E., Grossner, K., Hu, Y., Hart, K., Thatcher, J., Tally Jr, R.T. and Giordano, A., 2020. Towards Geospatial Humanities: Reflections from Two Panels. International Journal of Humanities and Arts Computing, 14(1-2), pp.6-26. [PDF]

This paper is based on the panel discussion at the 2019 UCGIS Symposium on the Geospatial Humanities. We discussed the opportunities and challenges for conducting interdisciplinary research integrating GIScience and humanities as well as preparing students with necessary data analysis and visualization skills for geospatial humanities work. Very interesting panel discussion and a lot of research possibilities!

Dr. Hu received the 2020 Waldo-Tobler Young Researcher Award

“The Austrian Academy of Sciences’ Commission for GIScience annually selects the winner of a ‘Young Researcher’ competition, based on an outstanding publication submitted by applicants enhancing the body of literature in Geoinformatics and GIScience.”

Dr. Yingjie Hu received this year’s award for his publication:
Yingjie Hu (2018): Geo-text data and data-driven geospatial semantics. Geography Compass, 12(11), e12404. [PDF]

Yingjie greatly appreciates this recognition and looks forward to making more contributions to GIScience.

Link to the original blog article: http://gisciencecommission.blogspot.com/2020/01/2020-waldo-tobler-young-researcher.html

New paper published in ACM SIGSPATIAL Special on the progress and challenges of GeoAI

Geospatial artificial intelligence (GeoAI) is an interdisciplinary field that has received tremendous attention from both academia and industry in recent years. We recently published an article that reviews the series of GeoAI workshops held at the Association for Computing Machinery (ACM) International Conference on Advances in Geographic Information Systems (SIGSPATIAL) since 2017. These workshops have provided researchers a forum to present GeoAI advances covering a wide range of topics, such as geospatial image processing, transportation modeling, public health, and digital humanities. We provide a summary of these topics and the research articles presented at the 2017, 2018, and 2019 GeoAI workshops. We conclude with a list of open research directions for this rapidly advancing field.

Full article: Yingjie Hu, Song Gao, Dalton Lunga, Wenwen Li, Shawn Newsam, and Budhendra Bhaduri (2019): GeoAI at ACM SIGSPATIAL: progress, challenges, and future directions, ACM SIGSPATIAL Special, 11(2), 5-15. [PDF]

Proceedings of the GeoAI workshops in 2017, 2018, and 2019 are available here:

New paper accepted in the 3rd ACM SIGSPATIAL Workshop on Geospatial Humanities for evaluating neural network based geoparsers

Are We There Yet? Evaluating State-of-the-Art Neural Network based Geoparsers Using EUPEG as a Benchmarking Platform

Abstract: Geoparsing is an important task in geographic information retrieval. A geoparsing system, known as a geoparser, takes some texts as the input and outputs the recognized place mentions and their location coordinates. In June 2019, a geoparsing competition, Toponym Resolution in Scientific Papers, was held as one of the SemEval 2019 tasks. The winning teams developed neural network based geoparsers that achieved outstanding performances (over 90% precision, recall, and F1 score for toponym recognition). This exciting result brings the question “are we there yet?”, namely have we achieved high enough performances to possibly consider the problem of geoparsing as solved? One limitation of this competition is that the developed geoparsers were tested on only one dataset which has 45 research articles collected from the particular domain of Bio-medicine. It is known that the same geoparser can have very different performances on different datasets. Thus, this work performs a systematic evaluation of these state-of-the-art geoparsers using our recently developed benchmarking platform EUPEG that has eight annotated datasets, nine baseline geoparsers, and eight performance metrics. The evaluation result suggests that these new geoparsers indeed improve the performances of geoparsing on multiple datasets although some challenges remain.
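For readers unfamiliar with the precision, recall, and F1 scores mentioned in the abstract, here is a minimal sketch of how they can be computed for toponym recognition. It is a simplified illustration, not EUPEG's implementation: it assumes exact matching of (start, end, text) spans, and the sample annotations are invented.

```python
def precision_recall_f1(predicted, gold):
    """Score predicted toponym spans against gold annotations using exact matching."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical annotations as (start offset, end offset, surface text).
gold = {(0, 7, "Houston"), (20, 25, "Texas"), (40, 53, "Buffalo Bayou")}
predicted = {(0, 7, "Houston"), (20, 25, "Texas"), (60, 66, "Harvey")}

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the geoparser finds two of the three gold toponyms and also makes one false prediction, so all three scores equal 2/3. Precision penalizes the spurious "Harvey" prediction, while recall penalizes the missed "Buffalo Bayou".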

Jimin Wang & Yingjie Hu (2019): Are We There Yet? Evaluating State-of-the-Art Neural Network based Geoparsers Using EUPEG as a Benchmarking Platform, In: Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Geospatial Humanities, Nov. 5, Chicago, USA. [PDF]