New paper accepted in Transactions in GIS: EUPEG, a platform for evaluating geoparsers based on heuristics, machine learning, and deep learning methods

A new paper led by GeoAI lab member Jimin Wang, “Enhancing spatial and textual analysis with EUPEG: An extensible and unified platform for evaluating geoparsers”, has been accepted by the journal Transactions in GIS:

Abstract: A rich amount of geographic information exists in unstructured texts, such as web pages, social media posts, housing advertisements, and historical archives. Geoparsers are useful tools that extract structured geographic information from unstructured texts, thereby enabling spatial analysis on textual data. While a number of geoparsers have been developed, they have been tested on different data sets using different metrics. Consequently, it is difficult to compare existing geoparsers or to compare a new geoparser with existing ones. In recent years, researchers have created open and annotated corpora for testing geoparsers. While these corpora are extremely valuable, much effort is still needed for a researcher to prepare these data sets and deploy geoparsers for comparative experiments. This article presents EUPEG: an Extensible and Unified Platform for Evaluating Geoparsers. EUPEG is an open source and web‐based benchmarking platform which hosts the majority of open corpora, geoparsers, and performance metrics reported in the literature. It enables direct comparison of the geoparsers hosted, and a new geoparser can be connected to EUPEG and compared with other geoparsers. The main objective of EUPEG is to reduce the time and effort that researchers have to spend in preparing data sets and baselines, thereby increasing the efficiency and effectiveness of comparative experiments.

Online demo:
Code and data:

The input and output of geoparsing and its two main steps.

The overall architecture of EUPEG.

A screenshot of EUPEG and the (1)–(2)–(3) workflow for running an experiment.

Running time of different geoparsers on GeoCorpora.

An illustration of the AUC for quantifying the overall error distance of a geoparser.
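
For readers unfamiliar with the AUC metric named in the last caption, the following is a minimal sketch of one common formulation from the geoparsing literature: sort the log-scaled error distances, integrate them with the trapezoid rule, and normalize by the largest possible error so the score falls between 0 (best) and 1 (worst). The function name `error_distance_auc`, the constant `MAX_ERROR_KM` (about half of Earth's circumference), and the choice of `ln(x + 1)` scaling are assumptions made for illustration; this is not necessarily how EUPEG computes the metric.

```python
import math

# Assumption: the largest possible error between two points on Earth,
# roughly half of the equatorial circumference, in kilometres.
MAX_ERROR_KM = 20039.0


def error_distance_auc(errors_km, max_error_km=MAX_ERROR_KM):
    """Return a normalized area under the curve of log error distances.

    errors_km: error distances (km) between predicted and annotated
    locations, one per resolved toponym. Lower scores are better.
    """
    if not errors_km:
        raise ValueError("at least one error distance is required")

    max_log = math.log(max_error_km + 1.0)
    # Clip to the maximum possible error, log-scale, then sort ascending.
    logs = sorted(math.log(min(e, max_error_km) + 1.0) for e in errors_km)

    if len(logs) == 1:
        # A single point has no area; fall back to its normalized height.
        return logs[0] / max_log

    # Trapezoid rule with unit spacing between consecutive sorted errors.
    area = sum((a + b) / 2.0 for a, b in zip(logs, logs[1:]))
    return area / (max_log * (len(logs) - 1))


if __name__ == "__main__":
    # Hypothetical error distances in kilometres, for illustration only.
    sample_errors = [0.0, 1.5, 12.0, 160.0, 2500.0]
    print(f"AUC = {error_distance_auc(sample_errors):.3f}")
```

Because the errors are log-scaled before integration, a few very large errors do not dominate the score the way they would in a mean error distance, which is one reason such a measure is used alongside accuracy-within-a-radius metrics.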