Thank you all for a wonderful AAG workshop!

The pandemic made it more difficult for many of us to do research. Thanks to the initiative of AAG, faculty members throughout the country were brought together to share their expertise and help students during this challenging time! I (Yingjie) had a great time leading the workshop on “Integrating Machine Learning into Geographic Research” this past week. While the AAG committee and I initially planned a small, 20-person workshop to ensure personal interaction, the workshop received 175 registrations, not only from students but also from researchers all over the world (see the map of the registrants).

Map of the workshop registrants.

This overwhelming interest was a pleasant surprise, but it also meant that admitting just 20 participants would turn away a large number of students and researchers who were eager to learn. At the same time, a larger group would make it harder for students to interact personally with the instructors. In the end, we admitted 21 students as “active participants” who could engage directly in the workshop, while a large number of other registrants were admitted as “observers” who could still call in and listen to the workshop lectures.

The workshop was a great experience: I was able to share my knowledge of GIS and machine learning with a wide audience while still interacting with each active participant. Thank you all for participating, and many thanks to our AAG workshop committee, particularly Julaiti and Coline, for their great help and support!

Dr. Hu to lead an AAG Workshop on Integrating Machine Learning into Geographic Research

With the many challenges posed by the COVID-19 pandemic, the American Association of Geographers (AAG) called upon geography faculty members throughout the nation to offer a series of virtual workshops and seminars (for AAG members only) to help graduate students adapt their research. Dr. Hu is one of the selected faculty members.

Our workshop is “Integrating Machine Learning into Geographic Research”. It will introduce students to the fundamental concepts and techniques of machine learning, and show how to integrate machine learning into geographic research. The workshop is designed to help students overcome some of the challenges posed by the pandemic by leveraging a free cloud computing platform, Google Colab, which allows students to build machine learning models on powerful cloud-hosted computers safely from home. The main programming language for this workshop will be Python, and the main machine learning package will be scikit-learn. Students can find more details or register here:

Workshop schedule

The workshop will start on Feb 8 and end on Feb 12 (note: the interactive sessions will not be recorded.)

  • Monday, asynchronous: Intro to GeoAI and Google Colab: A pre-recorded lecture consisting of two videos is made available. We will give a brief introduction to geospatial artificial intelligence (GeoAI) and help students set up the working environment on Google Colab.
  • Tuesday, synchronous, 9:00 – 11:00 AM (ET): Python Refresh and GeoPandas: We will have a 2-hour live session on Zoom. This session will help you refresh Python programming basics and learn how to work with shapefile data using the GeoPandas library.
  • Wednesday, synchronous, 9:00 – 11:00 AM (ET): Machine learning for Geography: We will have a 2-hour live session on Zoom. This session will cover the basics of preparing geographic data for machine learning models and implementing a model (Random Forest) using the scikit-learn library.
  • Thursday, asynchronous: Exercise: Building Your Own Machine Learning Model: A simple exercise will be released at the end of the session on Wednesday. Students are expected to work on this exercise on Thursday, in which you will be asked to build your own model using scikit-learn.
  • Friday, synchronous, 9:00 – 11:00 AM (ET): Recap and the Road Forward: We will have a 2-hour live session on Zoom. In this session, I will review the exercise that you worked on during Thursday and answer any questions. I will also share some resources for studying machine learning and AI beyond this workshop.
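
For a flavor of what the Wednesday session covers, here is a minimal sketch of training a Random Forest with scikit-learn. It is not the workshop's actual notebook: the two features (elevation and distance to the nearest road), the labeling rule, and all values are made up for illustration, and in practice the feature table would come from real geographic data (for example, attributes read with GeoPandas).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a table of geographic features (all values are made up).
rng = np.random.default_rng(42)
n = 400
elevation = rng.uniform(0, 1000, n)      # hypothetical elevation in metres
dist_to_road = rng.uniform(0, 5000, n)   # hypothetical distance to road in metres
X = np.column_stack([elevation, dist_to_road])

# Hypothetical label: 1 when a location is both low-lying and close to a road.
y = ((elevation < 500) & (dist_to_road < 2500)).astype(int)

# Hold out 25% of the samples to evaluate the trained model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

The same fit-then-score pattern applies to most scikit-learn models, so swapping in a different classifier usually only changes the line that constructs `model`.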

This workshop series is supported by AAG staff Coline Dony and Julaiti Nilupaer. Our GeoAI Lab thanks them for their great effort and dedication in helping our graduate students get through this challenging time!

New paper on aligning geographic entities from historical maps for building knowledge graphs accepted in IJGIS

Our new paper on “Aligning geographic entities from historical maps for building knowledge graphs” has been accepted in the International Journal of Geographical Information Science.

Historical maps contain rich geographic information about the past of a region. They are sometimes the only source of information before the availability of digital maps. Despite their valuable content, it is often challenging to access and use the information in historical maps, due to their forms of paper-based maps or scanned images. It is even more time-consuming and labor-intensive to conduct an analysis that requires a synthesis of the information from multiple historical maps. To facilitate the use of the geographic information contained in historical maps, one way is to build a geographic knowledge graph (GKG) from them. This paper proposes a general workflow for completing one important step of building such a GKG, namely aligning the same geographic entities from different maps. We present this workflow and the related methods for implementation, and systematically evaluate their performances using two different datasets of historical maps. The evaluation results show that machine learning and deep learning models for matching place names are sensitive to the thresholds learned from the training data, and a combination of measures based on string similarity, spatial distance, and approximate topological relation achieves the best performance with an average F-score of 0.89.

For more information, please refer to our full paper:
Kai Sun, Yingjie Hu, Jia Song, and Yunqiang Zhu (2020): Aligning geographic entities from historical maps for building knowledge graphs. International Journal of Geographical Information Science, in press.
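
To give a rough idea of how string similarity and spatial distance can be combined to decide whether two map entries refer to the same place, here is a small sketch. It is not the method from the paper: the similarity measure (Python's `difflib`), the haversine distance, both thresholds, and the example entities are assumptions chosen for illustration, and the approximate topological relation used in the paper is left out entirely. The `f_score` helper shows how an F-score is computed from match counts.

```python
import math
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))


def is_same_entity(e1, e2, min_name_sim=0.8, max_dist_km=5.0):
    """Match two entities if names are similar AND locations are close.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    sim = name_similarity(e1["name"], e2["name"])
    dist = haversine_km(e1["lat"], e1["lon"], e2["lat"], e2["lon"])
    return sim >= min_name_sim and dist <= max_dist_km


def f_score(tp, fp, fn):
    """F1 score from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical entities digitized from two different maps.
creek_map_a = {"name": "Tonawanda Creek", "lat": 43.02, "lon": -78.88}
creek_map_b = {"name": "Tonawanda Cr.", "lat": 43.03, "lon": -78.87}
river_map_b = {"name": "Buffalo River", "lat": 42.87, "lon": -78.87}

print(is_same_entity(creek_map_a, creek_map_b))
print(is_same_entity(creek_map_a, river_map_b))
```

Requiring both conditions keeps a nearby but differently named feature from being matched, and keeps identically named places in distant locations apart.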

New paper on a five-star guide for achieving replicability and reproducibility published in the Annals of AAG

The availability and use of geographic information technologies and data for describing the patterns and processes operating on or near the Earth’s surface have grown substantially during the past fifty years. The number of geographic information systems software packages and algorithms has also grown quickly during this period, fueled by rapid advances in computing and the explosive growth in the availability of digital data describing specific phenomena. Geographic information scientists therefore increasingly find themselves choosing between multiple software suites and algorithms to execute specific analysis, modeling, and visualization tasks in environmental applications today. This is a major challenge because it is often difficult to assess the efficacy of the candidate software platforms and algorithms when used in specific applications and study areas, which often generate different results. The subtleties and issues that characterize the field of geomorphometry are used here to document the need for (1) theoretically based software and algorithms; (2) new methods for the collection of provenance information about the data and code along with application context knowledge; and (3) new protocols for distributing this information and knowledge along with the data and code. This article discusses the progress and enduring challenges connected with these outcomes.

More details can be seen in our paper at:
John P. Wilson, Kevin Butler, Song Gao, Yingjie Hu, Wenwen Li and Dawn J. Wright (2020): A five-star guide for achieving replicability and reproducibility when working with GIS software and algorithms. Annals of the American Association of Geographers, in press. [PDF]

This paper is one of the articles in a collection on R&R in GIScience. You can find the full collection here:

New paper on the removal of precise geotagging in tweets published in Nature Human Behaviour

On June 18, 2019, Twitter announced that it would remove the precise geotagging feature in tweets. According to Twitter, this decision was based on the observation that most people do not use precise geotagging. This announcement triggered heated discussions among the general public and the research community both for and against the decision. The discussions were so intense that Twitter made a follow-up three days later clarifying that they only removed precise geotagging while general geotagging remained unchanged. So, what is geotagging and why did Twitter’s decision draw so much attention? How does this decision affect researchers? We discuss the potential impact of Twitter’s decision, its implication on location privacy, and how researchers can respond to this change.

More details can be seen in our paper at:
Yingjie Hu and Ruo-Qian Wang (2020): Understanding the removal of precise geotagging in tweets. Nature Human Behaviour, 1-3. [PDF]

Fig. 1 The three remaining approaches of geotagging after Twitter’s decision: (a) general geotagging with a place; (b) precise geotagging for photos only; (c) precise geotagging via a third-party app (Instagram as an example).

Three new book chapters accepted in the Handbook of Big Geospatial Data

We recently had three book chapters accepted in the Handbook of Big Geospatial Data.

The first chapter is “Harvesting Big Geospatial Data from Natural Language Texts”.

Abstract: A vast amount of geospatial data exists in natural language texts, such as newspapers, Wikipedia articles, social media posts, travel blogs, online reviews, and historical archives. Compared with more traditional and structured geospatial data, such as those collected by the US Geological Survey and the national statistics offices, geospatial data harvested from these unstructured texts have unique merits. They capture valuable human experiences toward places, reflect near real-time situations in different geographic areas, or record important historical information that is otherwise not available. In addition, geospatial data from these unstructured texts are often big, in terms of their volume, velocity, and variety. This chapter presents the motivations of harvesting big geospatial data from natural language texts, describes typical methods and tools for doing so, summarizes a number of existing applications, and discusses challenges and future directions.

Figure 1: Relations of places under different semantic topics extracted from a corpus of news articles from The Guardian.

More details are available in the full chapter:
Yingjie Hu and Ben Adams (2020): Harvesting big geospatial data from natural language texts. In M. Werner and Y.-Y. Chiang (Eds), Handbook of Big Geospatial Data, Springer. [PDF]


The second chapter is “Harnessing Heterogeneous Big Geospatial Data”.

Abstract: The heterogeneity of geospatial datasets is a mixed blessing in that it theoretically enables researchers to gain a more holistic picture by providing different (cultural) perspectives, media formats, resolutions, thematic coverage, and so on, but at the same time practice shows that this heterogeneity may hinder the successful combination of data, e.g., due to differences in data representation and underlying conceptual models. Three different aspects are usually distinguished in processing big geospatial data from heterogeneous sources, namely geospatial data conflation, integration, and enrichment. Each step is a progression on the previous one by taking the result of the last step, extracting useful information, and incorporating additional information to solve specific questions. This chapter introduces and clarifies the scope and goal of each of these aspects, presents existing methods, and outlines current research trends.

Figure 2: Vector data and raster data are two commonly used types. Practically, a conversion process can be applied to switch between these two types. However, such a conversion is usually not lossless. As a result, three types of conflation, namely raster and raster conflation, vector and vector conflation, and raster and vector conflation, are studied in relevant research.

More details are available in the full chapter:
Bo Yan, Gengchen Mai, Yingjie Hu, and Krzysztof Janowicz (2020): Harnessing heterogeneous big geospatial data. In M. Werner and Y.-Y. Chiang (Eds), Handbook of Big Geospatial Data, Springer. [PDF]


The third chapter is “Automatic Urban Road Network Extraction from Massive GPS Trajectories of Taxis”.

Abstract: Urban road networks are fundamental transportation infrastructures in daily life and essential in digital maps to support vehicle routing and navigation. Traditional methods of map vector data generation based on surveyor’s field work and map digitalization are costly and have a long update period. In the Big Data age, large-scale GPS-enabled taxi trajectories and high-volume ridesharing datasets become increasingly available. These datasets provide high-resolution spatiotemporal information about urban traffic along road networks. In this study, we present a novel geospatial-big-data-driven framework that includes trajectory compression, clustering, and vectorization to automatically generate urban road geometric information. A case study is conducted using a large-scale DiDi ride-sharing GPS dataset in the city of Chengdu in China. We compare the results of our automatic extraction method with the road layer downloaded from OpenStreetMap. We measure the quality and demonstrate the effectiveness of our road extraction method regarding accuracy, spatial coverage and connectivity. The proposed framework shows a good potential to update fundamental road transportation information for smart-city development and intelligent transportation management using geospatial big data.

Figure 3: A visual comparison of the extracted road network and the OSM road reference layer.

More details are available in the full chapter:
Song Gao, Mingxiao Li, Jinmeng Rao, Gengchen Mai, Timothy Prestby, Joseph Marks, and Yingjie Hu (2020): Automatic urban road network extraction from massive GPS trajectories of taxis. In M. Werner and Y.-Y. Chiang (Eds), Handbook of Big Geospatial Data, Springer. [PDF]
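
As an illustration of the trajectory compression step mentioned in the abstract, the sketch below uses the Douglas-Peucker algorithm, a common line simplification technique. The chapter summary here does not say which compression method the authors use, so this choice is an assumption, as are the planar coordinates, the tolerance, and the sample trajectory.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment: a and b are the same point.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, tolerance):
    """Simplify a polyline, keeping points farther than tolerance from the chord."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], start, end)
        if d > max_dist:
            max_dist, index = d, i
    if max_dist <= tolerance:
        # Every intermediate point is close to the chord, so drop them all.
        return [start, end]
    # Otherwise keep the farthest point and simplify both halves recursively.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right


# Hypothetical noisy GPS trace in projected (planar) coordinates:
# eastward along one road, then a turn north along another.
trajectory = [(0, 0), (1, 0.05), (2, -0.04), (3, 0), (3, 1), (3.03, 2), (3, 3)]
simplified = douglas_peucker(trajectory, tolerance=0.1)
print(simplified)
```

With this tolerance, the small wobbles along each road are removed while the corner point at the turn is preserved, which is the property that makes this kind of compression useful before clustering trajectories into road geometry.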

GeoAI Lab receives a new Microsoft AI for Earth Compute Grant for using GeoAI to model an open ecosystem in South Africa for biodiversity protection

Our proposal “Near Real-time Forecasting and Change Detection for an Open Ecosystem by Integrating Artificial Intelligence and Ecological Modeling” has been selected for an AI for Earth Microsoft Azure Compute Grant. We have been awarded $15,000 in computing credits for using Azure cloud services to develop and train geospatial deep learning models for biodiversity protection.

Open (i.e., non-forest) ecosystems, such as savannas, shrublands, and grasslands, account for over 40% of the global total ecosystem organic carbon and harbor a substantial proportion of the world’s biodiversity. Accurately forecasting the state of vegetation and detecting abnormal changes are critical for managing the biodiversity, fire, water, and carbon in these open ecosystems. This proposed project will integrate state-of-the-art AI techniques with ecological models, with the goal of providing accurate forecasting and change detection on the state of vegetation in an open ecosystem. We will focus on the Cape Floristic Region (CFR) of South Africa, which contains 20% of Africa’s plant diversity and is a Global Biodiversity Hotspot and UNESCO World Heritage Site. The outcomes of this project will include models and tools that can provide near real-time forecasting and change detection for the studied open ecosystem of CFR and could also be applied to other ecosystems with similar dynamics.

Our research team consists of:
Dr. Yingjie Hu, Principal Investigator, GeoAI Lab, Department of Geography, University at Buffalo, State University of New York, United States
Dr. Adam M. Wilson, Co-Investigator, Wilson Lab, Department of Geography, University at Buffalo, State University of New York, United States
Dr. Glenn R. Moncrieff, Co-Investigator, Fynbos Node, South African Environmental Observation Network, South Africa
Dr. Jasper A. Slingsby, Co-Investigator, Fynbos Node, South African Environmental Observation Network, South Africa

How do people describe locations during a natural disaster? New paper examining tweets from Hurricane Harvey is accepted in GIScience 2021

Our recent work examining how people describe locations during natural disasters has been accepted as a full paper at the flagship GIScience conference. Due to COVID-19, this year’s conference has been postponed to next year, so it becomes GIScience 2021 🙂

Abstract: Social media platforms, such as Twitter, have been increasingly used by people during natural disasters to share information and request for help. Hurricane Harvey was a category 4 hurricane that devastated Houston, Texas, USA in August 2017 and caused catastrophic flooding in the Houston metropolitan area. Hurricane Harvey also witnessed the widespread use of social media by the general public in response to this major disaster, and geographic locations are key information pieces described in many of the social media messages. A geoparsing system, or a geoparser, can be utilized to automatically extract and locate the described locations, which can help first responders reach the people in need. While a number of geoparsers have already been developed, it is unclear how effective they are in recognizing and geo-locating the locations described by people during natural disasters. To fill this gap, this work seeks to understand how people describe locations during a natural disaster by analyzing a sample of tweets posted during Hurricane Harvey. We then identify the limitations of existing geoparsers in processing these tweets, and discuss possible approaches to overcoming these limitations.

Full paper: Hu, Y. & Wang, J. (2020): How do people describe locations during a natural disaster: an analysis of tweets from Hurricane Harvey, In: Proceedings of the 11th International Conference on Geographic Information Science (GIScience 2021), Sep. 27-30, Poznan, Poland. [PDF]