Tag Page

record linkage

Three sources in the archive use this tag. The list below groups them by source while keeping the original topic context visible.

Fuzzy String Matching

Splink: String Comparators

Also listed on Fuzzy String Matching.

This is one of the clearest compact overviews of common string comparators in practical use. It covers Levenshtein, Damerau-Levenshtein, Jaro, Jaro-Winkler, and Jaccard in one place, which makes it especially useful when you need to compare what each metric is actually sensitive to rather than reading isolated algorithm descriptions.

Its main strength is readability. Because the page is part of toolkit documentation, it is implementation-oriented, and that makes it pragmatic: it helps you connect each abstract metric to the kinds of matching problems it handles well, such as transpositions, typos, or token overlap. It is a strong first reference when you want a technical overview before deciding which comparator deserves deeper study.
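To make the sensitivity differences concrete, here is a minimal pure-Python sketch. It is not taken from the Splink documentation and does not use Splink's API. The function names are illustrative assumptions, the transposition-aware variant is the restricted "optimal string alignment" form of Damerau-Levenshtein, and Jaccard is computed over whitespace-separated tokens (one of several possible tokenizations).

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: Levenshtein operations plus
    swaps of two adjacent characters (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            # An adjacent transposition counts as a single edit.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def jaccard_tokens(a: str, b: str) -> float:
    """Jaccard similarity over lowercase whitespace-separated tokens."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)
```

On the pair "martha" and "marhta", plain Levenshtein counts the swapped letters as two substitutions while the transposition-aware variant counts one edit. Token-level Jaccard ignores word order entirely, so "john smith" and "smith john" are treated as identical.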

Fuzzy String Matching

Splink: Choosing String Comparators

Also listed on Fuzzy String Matching.

This page is the practical companion to the comparator overview because it shifts the question from definition to selection. Instead of only explaining what the algorithms are, it discusses how to choose among them for names, typos, aliases, and thresholded matching decisions.

The useful part is its attention to failure modes and tradeoffs. It explicitly notes where simple metrics break down, especially around nicknames and aliases, so it helps prevent the common mistake of treating a distance score as a universal notion of semantic sameness. Although the framing is still record-linkage oriented, the decision logic transfers well to search and data-normalization tasks.
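The nickname failure mode can be sketched in a few lines. This is an illustration under stated assumptions, not the approach the Splink page prescribes: the similarity function is Python's standard-library difflib.SequenceMatcher ratio, used here only as a stand-in for a string comparator, and the NICKNAMES table, the names_match helper, and the 0.8 default threshold are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical alias table; a real one would come from a curated nickname list.
NICKNAMES = {"bill": "william", "bob": "robert", "peggy": "margaret"}


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (stand-in comparator)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def names_match(a: str, b: str, threshold: float = 0.8, aliases=None) -> bool:
    """Compare two names, optionally mapping known aliases to a
    canonical form before scoring."""
    aliases = aliases or {}
    a, b = a.lower(), b.lower()
    a = aliases.get(a, a)
    b = aliases.get(b, b)
    return similarity(a, b) >= threshold


# A small typo clears the threshold on string similarity alone.
typo_ok = names_match("Jon", "John")
# A nickname does not, because the strings share few characters.
nickname_raw = names_match("Bill", "William")
# Resolving the alias first lets the same comparison succeed.
nickname_resolved = names_match("Bill", "William", aliases=NICKNAMES)
```

The point of the sketch is that the alias lookup supplies knowledge no character-level score can recover, which is why a similarity threshold alone should not be read as a test of whether two names refer to the same person.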

Fuzzy String Matching

Real World Performance of Approximate String Comparators for Use in Patient Matching

Also listed on Fuzzy String Matching.

This paper is the evidence-oriented source in the group. Rather than stopping at definitions, it compares approximate string comparators in a real patient-matching setting and reports practical behavior under thresholded linkage decisions.

It is especially useful because it anchors metric choice to observed outcomes. In that domain and at a threshold of 0.8, the authors report the highest linkage sensitivity for Jaro-Winkler, which makes the paper a helpful reminder that comparator performance is application-dependent and should be validated empirically. The tradeoff is scope: it is an older study confined to a single domain, not a general survey of fuzzy matching methods.
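The kind of empirical check the paper argues for can be sketched as follows. This is a minimal illustration, not the authors' code or evaluation protocol: jaro and jaro_winkler follow the commonly published definitions (prefix capped at four characters, scaling factor 0.1), and the sensitivity helper and its argument names are assumptions introduced here. Any pairs you pass in would be your own labeled true matches, not data from the study.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and close enough in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


def sensitivity(true_match_pairs, comparator, threshold: float = 0.8) -> float:
    """Fraction of known true-match pairs scoring at or above the threshold."""
    if not true_match_pairs:
        raise ValueError("need at least one labeled true-match pair")
    hits = sum(1 for a, b in true_match_pairs if comparator(a, b) >= threshold)
    return hits / len(true_match_pairs)
```

Running sensitivity with several comparators over the same labeled pairs, and at more than one threshold, gives a small-scale version of the comparison the paper reports. A complete evaluation would also measure false matches among known non-matching pairs, since sensitivity alone says nothing about specificity.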