Posts

New paper accepted at the 3rd ACM SIGSPATIAL Workshop on Geospatial Humanities on evaluating neural network based geoparsers

Are We There Yet? Evaluating State-of-the-Art Neural Network based Geoparsers Using EUPEG as a Benchmarking Platform

Abstract: Geoparsing is an important task in geographic information retrieval. A geoparsing system, known as a geoparser, takes texts as input and outputs the recognized place mentions and their location coordinates. In June 2019, a geoparsing competition, Toponym Resolution in Scientific Papers, was held as one of the SemEval 2019 tasks. The winning teams developed neural network based geoparsers that achieved outstanding performances (over 90% precision, recall, and F1 score for toponym recognition). This exciting result raises the question “are we there yet?”, namely, have we achieved performances high enough to consider the problem of geoparsing solved? One limitation of this competition is that the developed geoparsers were tested on only one dataset, which contains 45 research articles collected from the particular domain of Bio-medicine. It is known that the same geoparser can have very different performances on different datasets. Thus, this work performs a systematic evaluation of these state-of-the-art geoparsers using our recently developed benchmarking platform EUPEG, which includes eight annotated datasets, nine baseline geoparsers, and eight performance metrics. The evaluation result suggests that these new geoparsers indeed improve the performances of geoparsing on multiple datasets, although some challenges remain.

Jimin Wang & Yingjie Hu (2019): Are We There Yet? Evaluating State-of-the-Art Neural Network based Geoparsers Using EUPEG as a Benchmarking Platform, In: Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Geospatial Humanities, Nov. 5, Chicago, USA. [PDF]
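
As a quick reference for the metrics mentioned in the abstract, the minimal sketch below (in Python, with hypothetical example data) shows how precision, recall, and F1 score are typically computed for toponym recognition by comparing the place mentions recognized by a geoparser against the gold annotations:

```python
# Minimal sketch: precision, recall, and F1 for toponym recognition.
# Mentions are represented as (start_offset, end_offset) spans in the text;
# the gold and predicted spans below are hypothetical examples.

def toponym_recognition_scores(gold_mentions, predicted_mentions):
    gold = set(gold_mentions)
    predicted = set(predicted_mentions)
    true_positives = len(gold & predicted)

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1

if __name__ == "__main__":
    gold = [(0, 7), (25, 33), (50, 58)]       # hypothetical gold place-mention spans
    predicted = [(0, 7), (25, 33), (70, 75)]  # hypothetical geoparser output
    p, r, f = toponym_recognition_scores(gold, predicted)
    print(f"precision={p:.2f}, recall={r:.2f}, f1={f:.2f}")
```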

New editorial on GeoAI published in International Journal of Geographical Information Science

What is the current state-of-the-art in integrating results from artificial intelligence research into geographic information science and the earth sciences more broadly? Does GeoAI research contribute to the broader field of AI, or does it merely apply existing results? What are the historical roots of GeoAI? Are there core topics and maybe even moonshots that jointly drive this emerging community forward? We answer these questions in our recent editorial by providing an overview of past and present work, explaining how a change in data culture is fueling the rapid growth of GeoAI work, and pointing to future research directions that may serve as common measures of success.

The full GeoAI editorial in IJGIS:
Janowicz, K., Gao, S., McKenzie, G., Hu, Y. & Bhaduri, B. (2020): GeoAI: Spatially explicit artificial intelligence techniques for geographic knowledge discovery and beyond, International Journal of Geographical Information Science, 34(4), 625-636. [PDF]

The entire special issue can be accessed here.

New paper accepted in Transactions in GIS on the EUPEG platform for evaluating geoparsers based on heuristics, machine learning, and deep learning methods

A new paper led by GeoAI lab member Jimin Wang, “Enhancing spatial and textual analysis with EUPEG: An extensible and unified platform for evaluating geoparsers”, has been accepted by the journal Transactions in GIS: https://onlinelibrary.wiley.com/doi/10.1111/tgis.12579

Abstract: A rich amount of geographic information exists in unstructured texts, such as web pages, social media posts, housing advertisements, and historical archives. Geoparsers are useful tools that extract structured geographic information from unstructured texts, thereby enabling spatial analysis on textual data. While a number of geoparsers have been developed, they have been tested on different data sets using different metrics. Consequently, it is difficult to compare existing geoparsers or to compare a new geoparser with existing ones. In recent years, researchers have created open and annotated corpora for testing geoparsers. While these corpora are extremely valuable, much effort is still needed for a researcher to prepare these data sets and deploy geoparsers for comparative experiments. This article presents EUPEG: an Extensible and Unified Platform for Evaluating Geoparsers. EUPEG is an open source and web‐based benchmarking platform which hosts the majority of open corpora, geoparsers, and performance metrics reported in the literature. It enables direct comparison of the geoparsers hosted, and a new geoparser can be connected to EUPEG and compared with other geoparsers. The main objective of EUPEG is to reduce the time and effort that researchers have to spend in preparing data sets and baselines, thereby increasing the efficiency and effectiveness of comparative experiments.

Online demo: https://geoai.geog.buffalo.edu/EUPEG
Code and data: https://github.com/geoai-lab/EUPEG
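
To give a rough sense of the idea behind EUPEG, the sketch below outlines how a new geoparser could be wrapped behind a common interface and run over annotated corpora alongside other geoparsers. This is purely illustrative: the class and function names are hypothetical and do not reflect EUPEG's actual API (see the GitHub repository above for the real implementation).

```python
# Illustrative sketch only: the interface and names below are hypothetical,
# not EUPEG's actual API (see https://github.com/geoai-lab/EUPEG for the real code).

from dataclasses import dataclass
from typing import List

@dataclass
class Toponym:
    text: str        # recognized place mention
    start: int       # character offset in the input text
    end: int
    lat: float       # resolved coordinates
    lon: float

class Geoparser:
    """Common interface that a newly developed geoparser would implement."""
    def parse(self, text: str) -> List[Toponym]:
        raise NotImplementedError

def run_on_corpus(geoparser: Geoparser, corpus) -> list:
    """Run a geoparser over an annotated corpus and collect (gold, predicted) pairs."""
    results = []
    for document in corpus:  # corpus: iterable of {"text": ..., "annotations": ...}
        predictions = geoparser.parse(document["text"])
        results.append((document["annotations"], predictions))
    return results  # performance metrics would then be computed from these pairs
```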

The input and output of geoparsing and its two main steps

The overall architecture of EUPEG.

A screenshot of EUPEG and the (1)–(2)–(3) workflow for running an experiment

Running time of different geoparsers on GeoCorpora.

An illustration of the AUC for quantifying the overall error distance of a geoparser.
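
The AUC mentioned in the figure caption above summarizes a geoparser's toponym resolution errors in a single number. One common formulation in the geoparsing literature computes the area under the curve of the sorted, log-scaled error distances (smaller is better); the sketch below assumes that formulation and is not necessarily the exact variant implemented in EUPEG.

```python
import numpy as np

# Hedged sketch of the Area Under the Curve (AUC) of error distances, assuming
# one common formulation: error distances (in km) are sorted, log-scaled,
# normalized by the largest possible error on Earth's surface, and integrated.

MAX_ERROR_KM = 20039.0  # roughly half of Earth's circumference

def error_distance_auc(error_distances_km):
    errors = np.sort(np.asarray(error_distances_km, dtype=float))
    # log-scale each error and normalize it to [0, 1]
    y = np.log(errors + 1.0) / np.log(MAX_ERROR_KM + 1.0)
    # x-axis: toponyms evenly spaced on [0, 1]
    x = np.linspace(0.0, 1.0, len(y))
    return np.trapz(y, x)

# Hypothetical example: most toponyms resolved within a few km, one large error.
print(error_distance_auc([0.5, 1.2, 3.0, 8.0, 2500.0]))
```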

Using machine learning methods to analyze online neighborhood reviews for understanding the perceptions of people toward their living environments

The perceptions of people toward their neighborhoods reveal their satisfaction with their living environments and their perceived quality of life. Recently, websites have emerged that are designed to help people find suitable places to live. On these websites, current and previous residents can review their neighborhoods by providing numeric ratings and textual comments. Such online neighborhood review data provide novel opportunities for studying the perceptions of people toward their neighborhoods. In this work, we analyze such online neighborhood review data. Specifically, we extract two types of knowledge from the data: 1) semantics, i.e., the semantic topics (or aspects) that people discuss when reviewing their neighborhoods; and 2) sentiments, i.e., the emotions that people express toward the different aspects of their neighborhoods. We experiment with a number of different computational models for extracting these two types of knowledge and compare their performances. The experiments are based on a dataset of online reviews about the neighborhoods in New York City (NYC), contributed by 7,673 distinct Web users. We also conduct correlation analyses between the subjective perceptions extracted from this dataset and the objective socioeconomic attributes of NYC neighborhoods, and find similarities and differences. The effective models identified in this research can be applied to neighborhood reviews in other cities to support urban planning and quality-of-life studies.

More details about this work can be found in our full paper: Yingjie Hu, Chengbin Deng, and Zhou Zhou (2019): A semantic and sentiment analysis on online neighborhood reviews for understanding the perceptions of people toward their living environment. Annals of the American Association of Geographers, 109(4), 1052-1073. [PDF]
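
For readers curious about the topic-extraction step, the sketch below shows a generic way to discover semantic topics from review texts with LDA using scikit-learn. It is only an illustration with hypothetical review texts and parameters, not the exact pipeline used in the paper.

```python
# Illustrative sketch (not the paper's exact pipeline): discovering semantic
# topics from neighborhood review texts with LDA via scikit-learn.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

reviews = [
    "Quiet streets, great schools, and friendly neighbors.",
    "Close to the subway but parking is difficult and rent is high.",
    "Lots of restaurants and nightlife, though it can be noisy.",
]  # hypothetical review texts

# Convert the reviews into a document-term matrix of word counts.
vectorizer = CountVectorizer(stop_words="english")
doc_term_matrix = vectorizer.fit_transform(reviews)

# Fit LDA; the paper reports eight topics, so n_components=8 is shown here,
# but the number of topics is a modeling choice.
lda = LatentDirichletAllocation(n_components=8, random_state=0)
lda.fit(doc_term_matrix)

# Print the top words of each discovered topic.
words = vectorizer.get_feature_names_out()
for topic_idx, topic in enumerate(lda.components_):
    top_words = [words[i] for i in topic.argsort()[-5:][::-1]]
    print(f"Topic {topic_idx}: {', '.join(top_words)}")
```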

(a) Some neighborhood reviews on Niche; (b) average ratings of NYC neighborhoods based on Niche review data.

Eight semantic topics discovered from the online reviews using LDA.

Average neighborhood perception maps for the eight semantic topics using LARA.


New book chapter accepted in the GIS&T Body of Knowledge on artificial intelligence approaches

Our new book chapter, Artificial Intelligence Approaches, has been accepted by the UCGIS GIS&T Body of Knowledge.

Artificial Intelligence (AI) has received tremendous attention from academia, industry, and the general public in recent years. The integration of geography and AI, or GeoAI, provides novel approaches for addressing a variety of problems in the natural environment and our human society. This entry briefly reviews the recent development of AI with a focus on machine learning and deep learning approaches. We discuss the integration of AI with geography and particularly geographic information science, and present a number of GeoAI applications and possible future directions.

Relations among AI, machine learning, and deep learning (Bennett 2018).

An illustration of terrain feature detection results of hill (a), impact crater (b), meander (c), and volcano (d) from remote sensing imagery.

Emerging hot spot map for seagrass habitats under increasing ocean temperature

Additional Resources

1. GeoAI Data Science Virtual Machine – http://esriurl.com/geoai2018
2. Microsoft AI for Earth Initiative including grants – http://aka.ms/aiforearth
3. AI for Earth Deep Learning Student Story Map – http://esriurl.com/cassava
4. Machine Learning Tools in ArcGIS – http://esriurl.com/ml
5. Learn ArcGIS Lesson – Predict Seagrass with Machine Learning – https://learn.arcgis.com/en/projects/predict-seagrass-habitats-with-machine-learning/
6. ArcGIS Export Training Data for Deep Learning Tool – http://esriurl.com/dltool
7. Podcast – Location Intelligence + Artificial Intelligence: Making Data Smarter, Part 1 – https://www.esri.com/about/newsroom/podcast/location-intelligence-artificial-intelligence-making-data-smarter/
8. Podcast – Location Intelligence + Artificial Intelligence: Making Data Smarter, Part 2 – https://www.esri.com/about/newsroom/podcast/location-intelligence-artificial-intelligence-making-data-smarter-part-2/
9. Podcast – How AI and Location Intelligence Can Drive Business Growth – https://www.esri.com/about/newsroom/podcast/ai-and-location-will-drive-tomorrows-digital-transformations/

Building benchmarking frameworks for supporting replicability and reproducibility in GIScience research

This is a position paper we presented at the workshop on Replicability and Reproducibility in Geospatial Research held at Arizona State University.

Replicability and reproducibility (R&R) are critical for the long-term prosperity of a scientific discipline. In GIScience, researchers have discussed R&R in relation to different research topics and problems, such as local spatial statistics, digital earth, and metadata (Fotheringham, 2009; Goodchild, 2012; Anselin et al., 2014). This position paper proposes to further support R&R by building benchmarking frameworks that facilitate the replication of previous research for effective and efficient comparisons of methods and software tools developed for the same or similar problems. In particular, this paper uses geoparsing, an important research problem in spatial and textual analysis, as an example to explain the value of such benchmarking frameworks.

Today’s Big Data era brings large amounts of unstructured texts, such as Web pages, historical archives, news articles, social media posts, incident reports, and business documents, which contain rich geographic information. Geoparsing is a necessary step for extracting structured geographic information from unstructured texts (Jones and Purves, 2008). A geoparsing system, called a geoparser, takes unstructured texts as input and outputs the recognized place names and their corresponding spatial footprints. In recent years, geoparsers have been playing an increasingly important role in research related to disaster response, digital humanities, and other fields.

Since a number of geoparsers have already been developed in previous studies, a researcher who would like to propose a new (and better) geoparser would ideally replicate previous research and compare his or her geoparser with the existing ones in order to demonstrate its superiority. In reality, conducting such a comparative experiment is often difficult for several reasons: (1) Some existing geoparsers do not provide source code. In order to perform a comparison, one has to spend a considerable amount of effort re-implementing a previous method. Even when a researcher does so, the implementation could be criticized as incorrect if the comparative results seem to favor the researcher's new method. (2) For geoparsers that do provide source code, it still takes a lot of time and effort to deploy the code and run it over some datasets, and any incorrect configurations can make the replication unsuccessful. (3) Some studies do not share the data used for training and testing the geoparsers. There exist policy restrictions (e.g., Twitter only allows one to share tweet IDs instead of the full tweet content) and privacy concerns that prevent one from sharing data. (4) For studies that do share data, it still takes a considerable amount of time for another research group to find the dataset, download it, understand its structure and semantics, and use it for experiments. For these reasons, it is difficult to replicate previous geoparsing research in order to conduct a comparative experiment.

Another factor that affects R&R is the dynamic nature of the Web. With today’s fast technological advancements, the algorithms behind online applications, such as search engines and recommendation systems, can change from day to day. Consider a researcher (let’s call her researcher A) who published a paper in 2017, in which she compared her geoparser with the state-of-the-art commercial geoparser from a major tech company and showed that her geoparser had a better performance. Then in 2018, researcher B repeated the experiment and found that the geoparser developed by researcher A in fact performed worse than the commercial geoparser from the company. Does this mean the work of researcher A is not replicable? Probably not. The tech company may have internally changed its algorithm in 2018, and therefore the comparative experiment conducted by researcher B is no longer based on the same algorithm used in the experiment of researcher A.

This position paper proposes a benchmarking framework for geoparsing, implemented as an open-source and Web-based system. It addresses the limitations discussed above with two designs. First, it hosts a number of openly available datasets and existing geoparsers. In order to test the performance of a new geoparser, one can connect the newly developed geoparser to the system and run it against the other hosted geoparsers on the same datasets. Testing different geoparsers on the same dataset and testing the same geoparser on different datasets are extremely important, since both our previous experiments and other studies show that the performances of different geoparsers can vary dramatically on different datasets (Hu et al., 2014; Gritta et al., 2018). Researchers can also upload their own datasets to this benchmarking framework for testing. In addition, since the system itself does not publicly share the hosted datasets, it sidesteps the restrictions of some data-sharing policies. In short, this design can reduce the time and effort that researchers have to spend implementing existing baselines for comparative experiments. Second, the benchmarking framework enables the recording of scientific experiments. As researchers conduct evaluation experiments on the system, details of the experiments are recorded automatically, including the date and time, the datasets and baselines selected, the metrics, the experiment results, and so forth. The benchmarking framework provides researchers with a unique ID that allows them to look up the experiment results. One can even provide such an ID in papers submitted to journals or conferences, so that reviewers can quickly check the raw results of the experiments. These experiment records can serve as evidence for R&R. Returning to the earlier example, researcher A could provide such an experiment ID to prove that she indeed conducted the experiment and obtained the reported result.
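
To make the experiment-recording idea more concrete, the sketch below shows one possible way to store an experiment record with a unique ID that can later be cited and looked up. The field names and ID scheme are assumptions for illustration, not the actual design of the proposed framework.

```python
# Purely illustrative sketch of recording a benchmarking experiment;
# the fields and ID scheme are assumptions, not an actual framework design.

import json
import uuid
from datetime import datetime, timezone

def record_experiment(dataset, geoparsers, metrics, results, store_path="experiments.jsonl"):
    """Append an experiment record with a unique ID that can be cited and looked up later."""
    record = {
        "experiment_id": uuid.uuid4().hex,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "dataset": dataset,
        "geoparsers": geoparsers,
        "metrics": metrics,
        "results": results,
    }
    with open(store_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record["experiment_id"]

# Hypothetical usage: the returned ID could be reported in a paper so that
# reviewers can look up the raw results of the experiment.
experiment_id = record_experiment(
    dataset="GeoCorpora",
    geoparsers=["NewGeoparser", "BaselineA", "BaselineB"],
    metrics=["precision", "recall", "f1", "auc"],
    results={"NewGeoparser": {"f1": 0.82}},
)
print("Cite this experiment as:", experiment_id)
```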

In conclusion, this position paper proposed building benchmarking frameworks to support R&R in geospatial research. While the discussion focused on geoparsing in spatial and textual analysis, the same idea can be applied to other geospatial problems, such as land use and land cover classification, to facilitate effective and efficient comparisons of methods. Such a framework also records experiment details and allows previous experiment results to be searched. The evaluation results from the benchmarking frameworks are not meant to replace the customized evaluations necessary for particular projects, but to serve as supplementary information for understanding the developed methods.

References
– Luc Anselin, Sergio J Rey, and Wenwen Li. Metadata and provenance for spatial analysis: the case of spatial weights. International Journal of Geographical Information Science, 28(11):2261-2280, 2014.

– A Stewart Fotheringham. The problem of spatial autocorrelation and local spatial statistics. Geographical Analysis, 41(4):398-403, 2009.

– Michael F Goodchild. The future of digital earth. Annals of GIS, 18(2):93-98, 2012.

– Milan Gritta, Mohammad Taher Pilehvar, Nut Limsopatham, and Nigel Collier. What’s missing in geographical parsing? Language Resources and Evaluation, 52(2):603-623, 2018.

– Yingjie Hu, Krzysztof Janowicz, and Sathya Prasad. Improving Wikipedia-based place name disambiguation in short texts using structured data from DBpedia. In Proceedings of the 8th Workshop on Geographic Information Retrieval, pages 1-8. ACM, 2014.

– Christopher B. Jones and Ross S. Purves. Geographical information retrieval. International Journal of Geographical Information Science, 22(3):219-228, 2008.

Dr. Hu was invited to join the UB Digital Scholarship Studio and Network

Dr. Hu was invited to join the recently established Digital Scholarship Studio and Network (DSSN) at UB. DSSN serves as a hub that links faculty and students interested in building digital content and information systems, and fosters collaborations on projects related to digital humanities, information technologies, data science, and so forth. Dr. Hu was invited for his expertise in data mining, spatial analysis, and textual analysis.

2019 Earth Day at UB

Earth Day is a world-wide annual event to demonstrate support for environmental protection. The 2019 Earth Day celebration was organized by the Department of Geography, NCGIA-UB, and the Geography Graduate Student Association (GGSA) at the University at Buffalo.

Lab member Jimin Wang presented a poster on his recent work, “Enhancing spatial and textual analysis with EUPEG: an extensible and unified platform for evaluating geoparsers”.