Can you recommend ways to link the place names listed in a gazetteer with those found in a corpus file? My attempts so far have been unsuccessful.

asked 2023-07-16 22:08:26 +0000

david

1 Answer

answered 2023-07-16 22:17:02 +0000

bukephalos

Here are some approaches for linking the place names in a gazetteer with those found in a corpus file:

  1. Use approximate string matching: Measures such as Levenshtein distance or Jaro-Winkler distance let you compare the names in the gazetteer with those in the corpus file and surface likely matches even when the spellings are not identical.

  2. Use contextual features: Nearby words or phrases can help identify the correct match when several gazetteer entries are plausible. For example, if the corpus file mentions "New York City" and the gazetteer has entries for both "New York City" and "York City," the words surrounding the mention show which gazetteer entry is the correct match.

  3. Use a named entity recognition (NER) system: An NER system can automatically identify and extract place names from the corpus file, so that you compare only those extracted mentions against the gazetteer rather than every word in the text.

  4. Normalize the names: Convert the names in both the gazetteer and the corpus file to a standard format (for example, lowercasing and removing diacritics) so that superficial differences do not cause missed matches (false negatives).

  5. Combine multiple approaches: You can combine multiple approaches to improve the accuracy of the linking process. For example, you might use string matching algorithms to identify possible matches, and then use contextual features to determine the correct match.
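
As a rough illustration of points 1, 4, and 5, here is a minimal Python sketch that normalizes names and then picks the closest gazetteer entry by Levenshtein distance. The function names, the sample place names, and the 0.8 threshold are all assumptions made for this example, not part of any particular library; the mentions would typically come from an NER step or similar.

```python
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip diacritics, and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_marks.casefold().split())


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale edit distance to a 0..1 score (1.0 means identical)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def link_mention(mention, gazetteer, threshold=0.8):
    """Return (gazetteer_entry, score) for the best match, or None.

    The threshold is an arbitrary starting point and should be tuned
    against your own data.
    """
    target = normalize(mention)
    best = None
    for entry in gazetteer:
        score = similarity(target, normalize(entry))
        if score >= threshold and (best is None or score > best[1]):
            best = (entry, score)
    return best


if __name__ == "__main__":
    gazetteer = ["São Paulo", "San Pablo"]  # hypothetical entries
    print(link_mention("Sao Paolo", gazetteer))
    print(link_mention("Berlin", gazetteer))
```

This compares every mention against every gazetteer entry, which is fine for small lists but slow for large ones; in that case a dedicated fuzzy-matching library or an index over the normalized names is likely a better fit. Candidates that pass the threshold could then be disambiguated using the surrounding context, as in point 2.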


