Title: Normalize and Match City Names to NUTS Regions
Version: 0.2.3
Date: 2026-02-08
Description: Normalizes city names for Germany (DE) and Switzerland (CH) and matches them to NUTS 3 regions using provided crosswalks. Features include comprehensive normalization rules, cascading matching logic (Exact NUTS -> Exact LAU -> Fuzzy), and single-source data synthesis. The package implements the NUTS classification as described in the NUTS methodology (Eurostat (2021) https://ec.europa.eu/eurostat/web/nuts).
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Imports: dplyr, stringr, stringdist, data.table, tidyr, rlang
Suggests: testthat, readxl
RoxygenNote: 7.3.2
NeedsCompilation: no
Packaged: 2026-02-08 23:47:07 UTC; giulianetinginfrati
Author: Giulian Etingin-Frati [aut, cre]
Maintainer: Giulian Etingin-Frati <etingin-frati@kof.ethz.ch>
Depends: R (≥ 3.5.0)
Repository: CRAN
Date/Publication: 2026-02-11 19:50:02 UTC

Generate Fake City Data

Description

Generates a vector of fake city names for testing, including common variations and noise.

Usage

generate_fake_cities(n = 10, country = "DE")

Arguments

n

Integer, number of cities to generate.

country

"DE" or "CH".

Value

Character vector of city names.

Examples

# Generate 5 fake German cities
generate_fake_cities(5, country = "DE")

# Generate 3 fake Swiss cities
generate_fake_cities(3, country = "CH")
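
The kind of variation and noise described above can be imitated in a few lines of base R. This is a minimal sketch, assuming a hypothetical helper `add_noise()` and a hand-picked list of clean names; the package's own generator may work differently.

```r
# Hypothetical helper, not part of the package: perturbs clean city names
# with case changes, padding whitespace, or an appended qualifier.
add_noise <- function(x, seed = 1) {
  set.seed(seed)
  variants <- list(
    toupper,                              # all upper case
    function(s) paste0("  ", s, " "),     # stray whitespace
    function(s) paste0(s, " (Stadt)"),    # appended qualifier
    identity                              # leave unchanged
  )
  # Pick one perturbation per input name.
  pick <- sample(length(variants), length(x), replace = TRUE)
  vapply(seq_along(x), function(i) variants[[pick[i]]](x[i]), character(1))
}

base_names <- c("Berlin", "Hamburg", "Leipzig")
noisy <- add_noise(base_names)
noisy
```

Inputs of this kind are useful for checking that normalization and matching recover the clean name.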

Local Administrative Units (LAU) Crosswalks

Description

Datasets containing mappings from city names to LAU codes and NUTS 3 regions for various countries. The data support string normalization and the matching of cities to their respective statistical regions.

Usage

lau_at

lau_be

lau_bg

lau_ch

lau_cy

lau_cz

lau_de

lau_dk

lau_ee

lau_el

lau_es

lau_fi

lau_fr

lau_hr

lau_hu

lau_ie

lau_it

lau_li

lau_lt

lau_lu

lau_lv

lau_mk

lau_mt

lau_nl

lau_no

lau_pl

lau_pt

lau_ro

lau_se

lau_si

lau_sk

lau_tr

Format

Data frames with varying columns depending on the country, typically including:

lau_id

Local Administrative Unit code

lau_name

Name of the Local Administrative Unit

nuts_3_id

NUTS 3 region code

population

Population (if available)

Each dataset is an object of class data.frame with 5 columns. Row counts, in the order listed under Usage:

lau_at: 2093 rows
lau_be: 571 rows
lau_bg: 265 rows
lau_ch: 2135 rows
lau_cy: 617 rows
lau_cz: 6258 rows
lau_de: 10972 rows
lau_dk: 99 rows
lau_ee: 79 rows
lau_el: 6142 rows
lau_es: 8132 rows
lau_fi: 309 rows
lau_fr: 32774 rows
lau_hr: 556 rows
lau_hu: 3155 rows
lau_ie: 166 rows
lau_it: 7900 rows
lau_li: 11 rows
lau_lt: 60 rows
lau_lu: 100 rows
lau_lv: 43 rows
lau_mk: 80 rows
lau_mt: 68 rows
lau_nl: 342 rows
lau_no: 378 rows
lau_pl: 2477 rows
lau_pt: 3092 rows
lau_ro: 3181 rows
lau_se: 290 rows
lau_si: 211 rows
lau_sk: 2927 rows
lau_tr: 972 rows

Source

Eurostat and national statistical institutes.
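
Because each crosswalk carries a `nuts_3_id` column, it can be joined to region metadata on that key. The sketch below uses toy data frames (`lau_toy`, `nuts_toy`) with invented rows so that it runs without the package; only the column names come from the documented formats.

```r
# Toy stand-ins for a LAU crosswalk and NUTS 3 metadata (illustrative rows only).
lau_toy <- data.frame(
  lau_id    = c("A1", "A2", "A3"),
  lau_name  = c("Alphastadt", "Betadorf", "Gammaburg"),
  nuts_3_id = c("XX001", "XX001", "XX002"),
  stringsAsFactors = FALSE
)
nuts_toy <- data.frame(
  nuts_3_id   = c("XX001", "XX002"),
  nuts_3_name = c("Region One", "Region Two"),
  stringsAsFactors = FALSE
)

# Attach the region name to every LAU row via the shared NUTS 3 code.
joined <- merge(lau_toy, nuts_toy, by = "nuts_3_id", all.x = TRUE)
joined[, c("lau_name", "nuts_3_id", "nuts_3_name")]
```

The same `merge()` call should apply to a shipped pair such as `lau_de` and `nuts_de`, assuming both contain `nuts_3_id` as documented.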


Match City Names to NUTS Regions

Description

Matches a vector of city names to NUTS 3 regions for any supported country using cascading logic: an exact match on NUTS names, then an exact match on LAU names, then fuzzy matching.

Usage

match_city(x, country = "DE", fuzzy = TRUE, threshold = 0.95)

Arguments

x

Character vector of city names.

country

Character string giving the two-letter country code (e.g. "DE", "FR").

fuzzy

Logical, whether to perform fuzzy matching.

threshold

Numeric, similarity threshold for fuzzy matching, between 0 and 1.

Value

A data frame with columns: original, city_clean, nuts_3_id, lau_name, match_type, similarity.

Examples

# Match German cities
cities <- c("Berlin", "Munich", "Hamburg")
match_city(cities, country = "DE")

# Match with exact matching only (no fuzzy)
match_city(c("Frankfurt am Main"), country = "DE", fuzzy = FALSE)
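
The cascade (exact NUTS, then exact LAU, then fuzzy) can be illustrated with a small stand-alone sketch. Everything here is an assumption made for illustration: the toy tables, the helper names `similarity()` and `match_one()`, the `match_type` labels, and the use of base R `adist()` (Levenshtein distance) in place of whatever metric the package applies. Because the metric differs, the toy threshold of 0.85 is not comparable to the package default of 0.95.

```r
# Toy lookup tables; column names follow the documented formats,
# the rows are illustrative only.
nuts_toy <- data.frame(nuts_3_id   = c("XX001", "XX002"),
                       nuts_3_name = c("alphastadt", "gamma land"),
                       stringsAsFactors = FALSE)
lau_toy <- data.frame(lau_name  = c("betadorf", "gammaburg"),
                      nuts_3_id = c("XX001", "XX002"),
                      stringsAsFactors = FALSE)

# Similarity in [0, 1] derived from Levenshtein distance (base R adist()).
similarity <- function(a, b) 1 - drop(adist(a, b)) / pmax(nchar(a), nchar(b))

# Hypothetical cascade: exact NUTS name, then exact LAU name, then fuzzy LAU.
match_one <- function(city, fuzzy = TRUE, threshold = 0.85) {
  hit <- match(city, nuts_toy$nuts_3_name)
  if (!is.na(hit)) return(c(nuts_toy$nuts_3_id[hit], "exact_nuts"))
  hit <- match(city, lau_toy$lau_name)
  if (!is.na(hit)) return(c(lau_toy$nuts_3_id[hit], "exact_lau"))
  if (fuzzy) {
    sims <- similarity(city, lau_toy$lau_name)
    best <- which.max(sims)
    if (sims[best] >= threshold) return(c(lau_toy$nuts_3_id[best], "fuzzy"))
  }
  c(NA_character_, "none")
}

# One row per input name: the matched code and the stage that produced it.
result <- t(vapply(c("alphastadt", "betadorf", "gammaburgg", "nowhere"),
                   match_one, character(2)))
colnames(result) <- c("nuts_3_id", "match_type")
result
```

Each input stops at the first stage that yields a hit, so cheaper exact lookups run before the fuzzy comparison and unmatched names fall through with a missing code.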

Normalize City Names

Description

Normalizes city names for EEA countries using comprehensive rules tailored to each language/region.

Usage

normalize_city(x, country = "DE")

Arguments

x

Character vector of city names.

country

Character string of the ISO 2-character country code (e.g. "DE", "FR", "PL").

Value

Character vector of normalized names.

Examples

# Normalize German city names
normalize_city(c("M\u00FCnchen", "K\u00F6ln", "Frankfurt a.M."), country = "DE")

# Normalize Swiss city names
normalize_city(c("Z\u00FCrich", "Gen\u00E8ve", "Basel-Stadt"), country = "CH")
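
To give a sense of what rule-based normalization involves, the sketch below applies a few plausible German rules in base R. The function `normalize_sketch()` and its specific rules (umlaut transliteration, expanding "a.M.", dropping punctuation) are assumptions for illustration and are not the package's rule set.

```r
# Hypothetical normalizer, not the package's implementation.
normalize_sketch <- function(x) {
  x <- tolower(trimws(x))
  # Transliterate German umlauts and sharp s (written as escapes for portability).
  x <- gsub("\u00E4", "ae", x, fixed = TRUE)
  x <- gsub("\u00F6", "oe", x, fixed = TRUE)
  x <- gsub("\u00FC", "ue", x, fixed = TRUE)
  x <- gsub("\u00DF", "ss", x, fixed = TRUE)
  # Expand one common abbreviation, then drop punctuation and extra spaces.
  x <- gsub("\\ba\\.\\s*m\\.?", "am main", x, perl = TRUE)
  x <- gsub("[[:punct:]]+", " ", x)
  trimws(gsub("\\s+", " ", x))
}

cleaned <- normalize_sketch(c("M\u00FCnchen", "  K\u00F6ln ", "Frankfurt a.M."))
cleaned
```

Applying the same function to both the input names and the lookup table keeps exact matching insensitive to case, spacing, and spelling variants of this kind.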

NUTS 3 Region Metadata

Description

Metadata for NUTS 3 regions for various countries, used for hierarchical matching.

Usage

nuts_at

nuts_be

nuts_bg

nuts_ch

nuts_cy

nuts_cz

nuts_de

nuts_dk

nuts_ee

nuts_el

nuts_es

nuts_fi

nuts_fr

nuts_hr

nuts_hu

nuts_ie

nuts_it

nuts_li

nuts_lt

nuts_lu

nuts_lv

nuts_mk

nuts_mt

nuts_nl

nuts_no

nuts_pl

nuts_pt

nuts_ro

nuts_se

nuts_si

nuts_sk

nuts_tr

Format

Data frames with columns:

nuts_3_id

NUTS 3 region code

nuts_3_name

Name of the NUTS 3 region

Each dataset is an object of class data.frame with 4 columns. Row counts, in the order listed under Usage:

nuts_at: 35 rows
nuts_be: 43 rows
nuts_bg: 28 rows
nuts_ch: 26 rows
nuts_cy: 1 row
nuts_cz: 14 rows
nuts_de: 401 rows
nuts_dk: 11 rows
nuts_ee: 5 rows
nuts_el: 53 rows
nuts_es: 59 rows
nuts_fi: 19 rows
nuts_fr: 96 rows
nuts_hr: 21 rows
nuts_hu: 20 rows
nuts_ie: 8 rows
nuts_it: 107 rows
nuts_li: 1 row
nuts_lt: 10 rows
nuts_lu: 1 row
nuts_lv: 5 rows
nuts_mk: 8 rows
nuts_mt: 2 rows
nuts_nl: 40 rows
nuts_no: 17 rows
nuts_pl: 73 rows
nuts_pt: 26 rows
nuts_ro: 42 rows
nuts_se: 21 rows
nuts_si: 12 rows
nuts_sk: 8 rows
nuts_tr: 81 rows

Source

Eurostat
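
NUTS codes nest by prefix: a two-letter country code followed by one character per level, so a five-character NUTS 3 code contains its NUTS 2 and NUTS 1 parents. The sketch below derives those parent codes from a few example NUTS 3 codes; the column names `nuts_1_id` and `nuts_2_id` are illustrative and are not documented columns of the shipped datasets.

```r
# Example NUTS 3 codes; parents are recovered by truncating the code.
nuts_3_id <- c("DE212", "DE300", "CH040")

hierarchy <- data.frame(
  country   = substr(nuts_3_id, 1, 2),  # two-letter country prefix
  nuts_1_id = substr(nuts_3_id, 1, 3),  # country + 1 character
  nuts_2_id = substr(nuts_3_id, 1, 4),  # country + 2 characters
  nuts_3_id = nuts_3_id,
  stringsAsFactors = FALSE
)
hierarchy
```

Truncating codes this way allows NUTS 3 matches to be aggregated to coarser regions without a separate lookup table.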