New York State Identification and Intelligence System (NYSIIS) Phonetic Encoder |
Source implementation by Steve Hobbs, Comserve Limited
Converted to SAS by Anna Ferrante, August, 1990
Converted to JavaScript by Matt Pérez, July 1999, and later modified to match Taft's original algorithm, July 2006. |
NIST reference page on NYSIIS, with links to other algorithms |
Alternate algorithm by Ross Patterson, Rutgers University, May 5, 1988 |
Alternate implementation in C# by Reggie Beneke |
Mark Antro asked what to do when the name ends with "RDT" (as in GILMOURDT): should it result in "...DT" or "...D"? After reviewing various implementations, it seems to me that the original algorithm has consistently been interpreted as referring to the last two characters of the name. Taft is silent on three-character strings. On the other hand, it sounds to me as if "RDT" should reduce to "D" and not "DT". |
Thanks to Steve Skalski, Reggie Beneke, John Morrill, Graham Case and Anthony Wilson for catching various bugs. All remaining bugs are mine. |
|
Original Algorithm: |
1. Transcode first characters of name:
   MAC » MCC
   KN » NN
   K » C
   PH » FF
   PF » FF
   SCH » SSS
|
2. Transcode last characters of name:
   EE, IE » Y
   DT, RT, RD, NT, ND » D
|
|
3. First character of key = first character of name.
|
4. Transcode remaining characters by following these rules, incrementing by one character each time:
   EV » AF, else A,E,I,O,U » A
   Q » G
   Z » S
   M » N
   KN » N, else K » C
   SCH » SSS
   PH » FF
   H » previous character, if the previous or next character is a nonvowel
   W » previous character, if the previous character is a vowel
   Add current to key if current != last key character
|
5. If last character is S, remove it
6. If last characters are AY, replace with Y
7. If last character is A, remove it
8. Collapse all strings of repeated characters
9. Add original first character of name as first character of key
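
As a rough illustration, the nine steps above can be sketched in Python. This is not the JavaScript behind this page; the name `nysiis_original` is made up for the sketch, and the input is assumed to be upper-cased and stripped to letters already. Three readings are my assumptions rather than anything the listing states: step 2 is applied once, to the last two characters only; steps 5 and 7 never delete the only character of the key; and step 9 is taken to mean that the original first letter replaces the first key character instead of being prepended.

```python
VOWELS = frozenset("AEIOU")


def nysiis_original(name: str) -> str:
    """Encode an upper-case, letters-only name by the nine steps listed above."""
    if not name:
        return ""
    original_first = name[0]  # kept for step 9

    # Step 1: transcode first characters (longer prefixes are tested first).
    for old, new in (("MAC", "MCC"), ("SCH", "SSS"), ("KN", "NN"),
                     ("PH", "FF"), ("PF", "FF"), ("K", "C")):
        if name.startswith(old):
            name = new + name[len(old):]
            break

    # Step 2: transcode the last two characters, applied once (assumption).
    if name.endswith(("EE", "IE")):
        name = name[:-2] + "Y"
    elif name.endswith(("DT", "RT", "RD", "NT", "ND")):
        name = name[:-2] + "D"

    chars = list(name)
    key = [chars[0]]  # Step 3: first character of key = first character of name.

    # Step 4: walk the remaining characters, rewriting them in place.
    i = 1
    while i < len(chars):
        cur = chars[i]
        pair = "".join(chars[i:i + 2])
        prev = chars[i - 1]
        nxt = chars[i + 1] if i + 1 < len(chars) else ""  # "" counts as a nonvowel
        if pair == "EV":
            chars[i:i + 2] = ["A", "F"]
        elif cur in VOWELS:
            chars[i] = "A"
        elif cur in "QZM":
            chars[i] = {"Q": "G", "Z": "S", "M": "N"}[cur]
        elif pair == "KN":
            chars[i:i + 2] = ["N"]
        elif cur == "K":
            chars[i] = "C"
        elif "".join(chars[i:i + 3]) == "SCH":
            chars[i:i + 3] = ["S", "S", "S"]
        elif pair == "PH":
            chars[i:i + 2] = ["F", "F"]
        elif cur == "H" and (prev not in VOWELS or nxt not in VOWELS):
            chars[i] = prev
        elif cur == "W" and prev in VOWELS:
            chars[i] = prev
        # Add the current character to the key unless it repeats the last one.
        if chars[i] != key[-1]:
            key.append(chars[i])
        i += 1

    # Steps 5-7: tidy the end of the key without emptying it (assumption).
    if len(key) > 1 and key[-1] == "S":
        key.pop()
    if key[-2:] == ["A", "Y"]:
        key[-2:] = ["Y"]
    if len(key) > 1 and key[-1] == "A":
        key.pop()

    # Step 8: collapse runs of repeated characters.
    collapsed = [key[0]]
    for ch in key[1:]:
        if ch != collapsed[-1]:
            collapsed.append(ch)

    # Step 9: read here as restoring the original first letter (assumption).
    collapsed[0] = original_first
    return "".join(collapsed)
```

Under these readings, `nysiis_original("SMITH")` gives `SNAT`: M becomes N, I becomes A, and the final H copies the preceding T and is then dropped as a repeat.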
|
|
Modified Algorithm: |
1. If the first character of the name is a vowel, remember it
2. Remove all 'S' and 'Z' chars from the end of the name
3. Transcode first characters of name
4. Transcode trailing strings as follows:
   IX » IC
   EX » EC
   YE, EE, IE » Y
   DT, RT, RD, NT, ND » D
   Repeat this last step as necessary
5. Transcode 'EV' to 'EF' if not at start of name
6. Use first character of name as first character of key
7. Remove any 'W' that follows a vowel
8. Replace all vowels with 'A' and collapse all strings of repeated 'A' to one
9. Transcode 'GHT' to 'GT'
10. Transcode 'DG' to 'G'
11. Transcode 'PH' to 'F'
12. If not first character, eliminate all 'H' preceded or followed by a vowel
13. Change 'KN' to 'N', else 'K' to 'C'
14. If not first character, change 'M' to 'N'
15. If not first character, change 'Q' to 'G'
16. Transcode 'SH' to 'S'
17. Transcode 'SCH' to 'S'
18. Transcode 'YW' to 'Y'
19. If not first or last character, change 'Y' to 'A'
20. Transcode 'WR' to 'R'
21. If not first character, change 'Z' to 'S'
22. Transcode terminal 'AY' to 'Y'
23. Remove trailing vowels
24. Collapse all strings of repeated characters
25. If first character of original name is a vowel, prepend to code (or replace first transcoded 'A')
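
The modified steps read naturally as a chain of string rewrites, so here is a minimal Python sketch of them, again assuming upper-case, letters-only input. The name `nysiis_modified` and the `prefix_rules` parameter are hypothetical: the listing above does not give a table for step 3, so the sketch leaves that table to the caller and applies none by default. Other readings that are my assumptions: the rewrites run over the whole string in the listed order, with "not first character" honoured only where a step says so; "repeat this last step" in step 4 refers to the D rule; and step 25 replaces a leading 'A' when there is one and prepends otherwise.

```python
import re

D_SUFFIXES = ("DT", "RT", "RD", "NT", "ND")

# Steps 5 and 7-24 as (pattern, replacement) pairs, applied in the listed order.
# "(?!^)" keeps a rule from firing on the first character.
REWRITES = (
    (r"(?!^)EV", "EF"),            # 5
    (r"(?<=[AEIOU])W", ""),        # 7
    (r"[AEIOU]+", "A"),            # 8: vowels to 'A', runs collapsed
    (r"GHT", "GT"),                # 9
    (r"DG", "G"),                  # 10
    (r"PH", "F"),                  # 11
    (r"(?<=A)H|(?!^)H(?=A)", ""),  # 12: vowels are all 'A' by now
    (r"KN", "N"), (r"K", "C"),     # 13
    (r"(?!^)M", "N"),              # 14
    (r"(?!^)Q", "G"),              # 15
    (r"SH", "S"),                  # 16
    (r"SCH", "S"),                 # 17
    (r"YW", "Y"),                  # 18
    (r"(?!^)Y(?!$)", "A"),         # 19
    (r"WR", "R"),                  # 20
    (r"(?!^)Z", "S"),              # 21
    (r"AY$", "Y"),                 # 22
    (r"[AEIOU]+$", ""),            # 23
    (r"(.)\1+", r"\1"),            # 24
)


def nysiis_modified(name: str, prefix_rules=()) -> str:
    """Encode an upper-case, letters-only name by the modified steps above."""
    if not name:
        return ""
    # Step 1: remember a leading vowel.
    lead_vowel = name[0] if name[0] in "AEIOU" else ""
    # Step 2: remove all trailing 'S' and 'Z'.
    name = name.rstrip("SZ")
    if not name:
        return ""
    # Step 3: prefix table not given in the listing; supplied by the caller.
    for old, new in prefix_rules:
        if name.startswith(old):
            name = new + name[len(old):]
            break
    # Step 4: one suffix rewrite, then the D rule repeated while it applies.
    for old, new in (("IX", "IC"), ("EX", "EC"),
                     ("YE", "Y"), ("EE", "Y"), ("IE", "Y")):
        if name.endswith(old):
            name = name[:-2] + new
            break
    while name.endswith(D_SUFFIXES):
        name = name[:-2] + "D"
    # Step 6 needs no code here: the first character is carried along.
    for pattern, replacement in REWRITES:
        name = re.sub(pattern, replacement, name)
    # Step 25: restore the remembered vowel.
    if lead_vowel:
        name = lead_vowel + (name[1:] if name.startswith("A") else name)
    return name
```

With the repeated D rule, `nysiis_modified("GILMOURDT")` first reduces the name to GILMOUD and then returns `GALNAD`, which matches the "RDT should reduce to D" reading discussed at the top of the page.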
|
|
In both implementations, before the algorithm is applied, the input string is preprocessed as follows:
- Convert all characters to upper case
- Trim all trailing whitespace
- Remove "JR", "SR", and Roman numerals from the end of the string (a "Roman numeral" here can be any run of 'I' and 'V' chars, even a malformed one)
- Remove all non-alpha characters
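
The preprocessing list above can be sketched as a small Python helper. The name `preprocess` is hypothetical, and two details are assumptions of mine: a suffix is removed only when whitespace, a comma, or a period separates it from the name (so that a surname such as LEVI keeps its final "VI"), and "non-alpha" is taken to mean anything outside A-Z, which also drops accented letters.

```python
import re


def preprocess(raw: str) -> str:
    """Normalise a raw surname before encoding (sketch of the list above)."""
    # Convert to upper case and trim trailing whitespace.
    name = raw.upper().rstrip()
    # Remove a trailing "JR", "SR", or run of 'I'/'V' characters, but only
    # when it stands apart from the name itself (assumption).
    name = re.sub(r"[\s,.]+(?:JR|SR|[IV]+)\.?$", "", name)
    # Remove everything that is not an ASCII letter (assumption: A-Z only).
    return re.sub(r"[^A-Z]", "", name)
```

For instance, `preprocess("O'Brien Jr.")` returns `OBRIEN`, and `preprocess("Smith III")` returns `SMITH`.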
|
The original algorithm comes from Robert L. Taft, "Name Search Techniques", New York State Identification and Intelligence System. |
According to the document Duplicate Record Detection [PDF] by Elmagarmid, Ipeirotis, & Verykios, the resulting code is limited to six characters. |