New York State Identification and Intelligence System (NYSIIS) Phonetic Encoder
Source implementation by Steve Hobbs, Comserve Limited
Converted to SAS by Anna Ferrante, August 1990
Converted to Javascript by Matt Pérez, July 1999, and later modified to match Taft's original algorithm, July 2006
NIST reference page on NYSIIS, with links to other algorithms
Alternate algorithm by Ross Patterson, Rutgers University, May 5, 1988
Alternate implementation in C# by Reggie Beneke
Mark Antro asked what to do when the name ends with "RDT" (e.g., GILMOURDT): should the result end in "...DT" or "...D"? After reviewing various implementations, it seems to me that the original algorithm has consistently been interpreted to refer to the last two characters of the name. Taft is silent on three-character strings. On the other hand, it sounds to me as if "RDT" should reduce to "D" and not "DT".
Thanks to Steve Skalski, Reggie Beneke, John Morrill, Graham Case and Anthony Wilson for catching various bugs. All remaining bugs are mine.

Original Algorithm:

Transcode first characters of name:

MAC	»	MCC
KN	»	NN
K	»	C
PH	»	FF
PF	»	FF
SCH	»	SSS
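
As an illustration only, the prefix table above could be applied as follows. This is a minimal Python sketch, not the Javascript implementation credited on this page; the names PREFIX_RULES and transcode_prefix are made up for the example, and it assumes the name is already upper case and that at most one rule fires.

```python
# Prefix rules from the table above. Longer patterns come first so that
# "KN" is tried before the single-letter "K" rule (an ordering assumption).
PREFIX_RULES = [
    ("MAC", "MCC"),
    ("SCH", "SSS"),
    ("KN", "NN"),
    ("PH", "FF"),
    ("PF", "FF"),
    ("K", "C"),
]


def transcode_prefix(name: str) -> str:
    """Apply at most one prefix rule to an upper-case name."""
    for old, new in PREFIX_RULES:
        if name.startswith(old):
            return new + name[len(old):]
    return name  # no rule matched, name is unchanged


print(transcode_prefix("KNIGHT"))  # NNIGHT
```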

Transcode last characters of name:

EE, IE	»	Y
DT,RT,RD,NT,ND	»	D
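
The suffix table can be sketched the same way. The Python below is only an illustration under the "last two characters" reading discussed above; SUFFIX_RULES and transcode_suffix are hypothetical names, and the rule is applied once, not repeatedly.

```python
# Suffix rules from the table above: each group of two-letter endings
# is replaced by a single letter.
SUFFIX_RULES = [
    (("EE", "IE"), "Y"),
    (("DT", "RT", "RD", "NT", "ND"), "D"),
]


def transcode_suffix(name: str) -> str:
    """Replace the last two characters of an upper-case name, once."""
    for endings, new in SUFFIX_RULES:
        if name.endswith(endings):  # str.endswith accepts a tuple
            return name[:-2] + new
    return name


print(transcode_suffix("GILMOURDT"))  # GILMOURD
```

Under this reading only the final "DT" is touched, so the "R" in GILMOURDT survives.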

First character of key = first character of name.

Transcode remaining characters by following these rules, incrementing by one character each time:

EV	»	AF	else A,E,I,O,U » A
Q	»	G
Z	»	S
M	»	N
KN	»	N	else K » C
SCH	»	SSS
PH	»	FF
H	»	If previous or next is nonvowel, previous
W	»	If previous is vowel, previous

Add current to key if current != last key character
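
One possible reading of the character-by-character rules above, sketched in Python. The function name transcode_body is invented for the example. It assumes the name is upper case and already has its first and last characters transcoded, that "previous" means the already-transcoded previous character, and that a missing next character (end of name) counts as a nonvowel. Other implementations may resolve these points differently.

```python
VOWELS = frozenset("AEIOU")


def transcode_body(name: str) -> str:
    """Build a key from the name, one character at a time."""
    if not name:
        return ""
    chars = list(name)
    key = [chars[0]]  # first character of key = first character of name
    i = 1
    while i < len(chars):
        pair = "".join(chars[i:i + 2])
        triple = "".join(chars[i:i + 3])
        prev = chars[i - 1]
        nxt = chars[i + 1] if i + 1 < len(chars) else ""
        if pair == "EV":
            chars[i:i + 2] = "AF"
        elif chars[i] in VOWELS:
            chars[i] = "A"
        elif chars[i] == "Q":
            chars[i] = "G"
        elif chars[i] == "Z":
            chars[i] = "S"
        elif chars[i] == "M":
            chars[i] = "N"
        elif pair == "KN":
            chars[i] = "N"  # yields "NN"; the duplicate is skipped below
        elif chars[i] == "K":
            chars[i] = "C"
        elif triple == "SCH":
            chars[i:i + 3] = "SSS"
        elif pair == "PH":
            chars[i:i + 2] = "FF"
        elif chars[i] == "H" and (prev not in VOWELS or nxt not in VOWELS):
            chars[i] = prev
        elif chars[i] == "W" and prev in VOWELS:
            chars[i] = prev
        # add current to key if current != last key character
        if chars[i] != key[-1]:
            key.append(chars[i])
        i += 1
    return "".join(key)


print(transcode_body("STEVENS"))  # STAFANS
```

The trailing-character cleanup described in the next steps is not part of this sketch.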

If last character is S, remove it

If last characters are AY, replace with Y

If last character is A, remove it

Collapse all strings of repeated characters

Add original first character of name as first character of key
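
The trailing cleanup steps (S, AY, A, then collapsing repeats) are simple enough to sketch directly. The Python below is an illustration only; finalize_key is a hypothetical name, and the length guard that keeps a one-character key from being emptied is my assumption, not something the steps above state. Handling of the first character is left out.

```python
import re


def finalize_key(key: str) -> str:
    """Apply the trailing-character rules, then collapse repeats."""
    # Assumption: never strip the key down to nothing.
    if len(key) > 1 and key.endswith("S"):
        key = key[:-1]
    if key.endswith("AY"):
        key = key[:-2] + "Y"
    if len(key) > 1 and key.endswith("A"):
        key = key[:-1]
    # Collapse every run of a repeated character to a single character.
    return re.sub(r"(.)\1+", r"\1", key)


print(finalize_key("STAFANS"))  # STAFAN
```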

Modified Algorithm:

1. if the first character of the name is a vowel, remember it

2. remove all 'S' and 'Z' chars from the end of the name

3. transcode first characters of name:

MAC	»	MC
PF	»	F

4. transcode trailing strings as follows:

IX	»	IC
EX	»	EC
YE,EE,IE	»	Y
DT,RT,RD,NT,ND	»	D
repeat this last step as necessary

5. transcode 'EV' to 'EF' if not at start of name

6. use first character of name as first character of key

7. remove any 'W' that follows a vowel

8. replace all vowels with 'A' and collapse all strings of repeated 'A' to one

9. transcode 'GHT' to 'GT'

10. transcode 'DG' to 'G'

11. transcode 'PH' to 'F'

12. if not first character, eliminate all 'H' preceded or followed by a vowel

13. change 'KN' to 'N', else 'K' to 'C'

14. if not first character, change 'M' to 'N'

15. if not first character, change 'Q' to 'G'

16. transcode 'SH' to 'S'

17. transcode 'SCH' to 'S'

18. transcode 'YW' to 'Y'

19. if not first or last character, change 'Y' to 'A'

20. transcode 'WR' to 'R'

21. if not first character, change 'Z' to 'S'

22. transcode terminal 'AY' to 'Y'

23. remove trailing vowels

24. collapse all strings of repeated characters

25. if first character of original name is a vowel, prepend to code (or replace first transcoded 'A')
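
The two end-of-name steps of the modified algorithm (removing trailing 'S' and 'Z', and the repeated trailing transcoding) might look like this in Python. This is a sketch under assumptions: the function names are invented, the name is assumed to be upper case, and "repeat this last step as necessary" is read here as repeating the whole trailing table until nothing changes. If the instruction refers only to the final row, the loop would be narrower.

```python
TRAILING_RULES = [
    (("IX",), "IC"),
    (("EX",), "EC"),
    (("YE", "EE", "IE"), "Y"),
    (("DT", "RT", "RD", "NT", "ND"), "D"),
]


def strip_trailing_sz(name: str) -> str:
    """Remove all 'S' and 'Z' characters from the end of the name."""
    return name.rstrip("SZ")


def transcode_trailing(name: str) -> str:
    """Apply the trailing-string table until no rule matches."""
    changed = True
    while changed:
        changed = False
        for endings, new in TRAILING_RULES:
            if name.endswith(endings):
                name = name[:-2] + new
                changed = True
    return name


print(transcode_trailing("GILMOURDT"))  # GILMOUD
```

With the repetition, "RDT" reduces to "D": the first pass turns "DT" into "D", and the second pass turns the resulting "RD" into "D".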

In both implementations, before the algorithm is applied, the input string is preprocessed as follows:

Convert all characters to upper case
Trim all trailing whitespace
Remove "JR", "SR", and Roman numerals from the end of the string (a "Roman numeral" here can be any run of 'I' and 'V' chars, even a malformed one)
Remove all non-alpha characters
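
A rough Python sketch of these four preprocessing steps follows. It is not the code behind this page: preprocess is a hypothetical name, "alpha" is taken to mean the letters A to Z only, and a suffix is removed only when it is set off from the name by whitespace, a comma, or a period. That last condition is my assumption, added so that a surname such as LEVI is not cut down to LE.

```python
import re


def preprocess(name: str) -> str:
    """Upper-case, trim, drop a generational suffix, keep letters only."""
    s = name.upper().rstrip()
    # Trailing "JR", "SR", or any run of I and V, preceded by a separator.
    s = re.sub(r"[\s,.]+(?:JR|SR|[IV]+)[\s.]*$", "", s)
    # Remove everything that is not A-Z.
    return re.sub(r"[^A-Z]", "", s)


print(preprocess("O'Brien III  "))  # OBRIEN
```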

The original algorithm comes from Robert L. Taft, "Name Search Techniques", New York State Identification and Intelligence System.

According to the document Duplicate Record Detection [PDF] by Elmagarmid, Ipeirotis, & Verykios, the resulting code is limited to six characters.