Abstract: Social media messages, such as tweets, are frequently used by people during natural disasters to share real-time information and to report incidents. These messages often describe geographic locations. Accurately recognizing and geolocating these locations is critical for reaching those in need. This paper focuses on the first part of this process, namely recognizing locations in social media messages. While general named entity recognition (NER) tools are often used to recognize locations, their performance is limited by the language irregularities of social media text, such as informal sentence structures, inconsistent letter cases, name abbreviations, and misspellings. We present NeuroTPR, a Neuro-net ToPonym Recognition model designed specifically with these linguistic irregularities in mind. Our approach extends a general bidirectional recurrent neural network model with a number of features designed to address the task of location recognition in social media messages. We also propose an automatic workflow for generating annotated datasets from Wikipedia articles for training toponym recognition models. We demonstrate NeuroTPR by applying it to three test datasets, including a Twitter dataset from Hurricane Harvey, and comparing its performance with that of six baseline models.
Full paper: Jimin Wang, Yingjie Hu, and Kenneth Joseph (2020): NeuroTPR: A Neuro-net ToPonym Recognition model for extracting locations from social media messages. Transactions in GIS, accepted.
Figure 1: The two steps of geoparsing in the context of disaster response and our focus on toponym recognition.
Figure 2: The overall architecture of NeuroTPR.
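As a rough illustration of what the toponym recognition step produces, the sketch below turns token-level labels into location spans. It assumes a BIO labeling scheme ("B-LOC", "I-LOC", "O"), which is a common convention for sequence tagging but is not stated in this summary; the function name bio_to_toponyms and the example message are hypothetical and are not taken from NeuroTPR or its datasets.

```python
from typing import List, Tuple


def bio_to_toponyms(tokens: List[str], tags: List[str]) -> List[Tuple[str, int, int]]:
    """Collect (text, start_token, end_token_exclusive) for each run of LOC tags.

    Assumes BIO labels; this label scheme is an illustrative assumption.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[str, int, int]] = []
    start = None  # index of the first token of the span being built, if any
    for i, tag in enumerate(tags):
        if tag == "B-LOC":
            # A new span begins; close any span that was still open.
            if start is not None:
                spans.append((" ".join(tokens[start:i]), start, i))
            start = i
        elif tag == "I-LOC":
            # Tolerate an I-LOC with no preceding B-LOC by opening a span here.
            if start is None:
                start = i
        else:
            # Any other tag ends the current span.
            if start is not None:
                spans.append((" ".join(tokens[start:i]), start, i))
                start = None
    if start is not None:
        spans.append((" ".join(tokens[start:]), start, len(tokens)))
    return spans


# Hypothetical lowercase message, mimicking inconsistent letter case.
tokens = ["flooding", "near", "buffalo", "bayou", "in", "houston"]
tags = ["O", "O", "B-LOC", "I-LOC", "O", "B-LOC"]
print(bio_to_toponyms(tokens, tags))
```

The recovered spans would then be passed to the second geoparsing step, which resolves each toponym to coordinates; that step is outside the scope of this sketch.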