Presents your SQL SERVER E-NEWSLETTER for September 19, 2002 <-------------------------------------------> FIND SIMILAR STRINGS USING THE SOUNDEX AND DIFFERENCE FUNCTIONS The SOUNDEX and DIFFERENCE functions work hand-in-hand to find similarity between two string values. You could use the SOUNDEX function in a table to compare names in a first name or a last name column, for example. In this scenario, the comparison will be with two four-character codes instead of variable length strings. The SOUNDEX function takes an input string value and creates the SOUNDEX four-character code. The input string value may be a constant, variable, or column. The first character of the SOUNDEX four-character code is the first character of the input string value of the function. Three numeric characters follow the first character in the SOUNDEX four-character code. For example, if the input string is John, then the first character of the SOUNDEX four-character code is J followed by three numeric characters. The DIFFERENCE function returns values zero through four, with four being the best match between two input string values. The DIFFERENCE function takes in two input string values and compares the second input string value, character by character, to the first input string value. The DIFFERENCE function's main use is to compare two SOUNDEX four-character alphanumeric codes for similarity. The following code demonstrates the use of SOUNDEX and DIFFERENCE functions. Please note that the DIFFERENCE function may not find a statistical significance in the relationship between two four-character SOUNDEX strings. As demonstrated below, the DIFFERENCE function is showing the same similarity between the two sets of four-character SOUNDEX codes when the second set of four-character SOUNDEX codes should be more similar. DECLARE @invar1 VARCHAR(20), @invar2 VARCHAR(20) SELECT @invar1 = 'marilyn', @invar2 = 'carolyn' SELECT SOUNDEX(@invar1), SOUNDEX(@invar2), DIFFERENCE(@invar1,@invar2) SELECT @invar1 = 'karilyn', @invar2 = 'carylin' SELECT SOUNDEX(@invar1), SOUNDEX(@invar2), DIFFERENCE(@invar1,@invar2) GO Here's the output from this code, showing both sets to be equally similar: ----- ----- ----------- M645 C645 3 ----- ----- ----------- K645 C645 3 The following code demonstrates the DIFFERENCE function working on strings other than the four-character SOUNDEX codes. DECLARE @invar1 CHAR(4), @invar2 CHAR(4) SELECT @invar1 = 'g120012', @invar2 = 'z3642O42120' SELECT DIFFERENCE( @invar1, @invar2 ) Here's the output: ----------- 3 ----------------------------------------