Pages

Monday, October 29, 2012

Soundex in SQL ( Phonetic algorithm)

Introduction

phonetic algorithm matches two different words with similar pronunciation to the same code, which allows phonetic similarity based word set comparison and indexing.

Often it is quite difficult to find atypical name (or surname) in the database, for example:

— Hey, John, look for Adolf Schwarzenegger.
— Adolf Shwardseneger? There is no such person!
In this case, the use of phonetic algorithms can significantly simplify the problem.

Soundex algorithm is very useful for searching in lists of people in databases, as well as for using in a spell checker. They are often used in combination with the algorithms of fuzzy search, providing users with a handy search by name and surname in databases, lists of people and so on.

soundex sql soundex sql soundex sql soundex sql soundex in sql soundex in sql soundex in sql soundex in sql

Soundex

One of the first algorithms was Soundex invented in the 1910s by Robert Russell. This algorithm (its American version) matches words to the numerical index like A126. Its working principle is based on the partition of consonants in groups with ordinal numbers, which are then compiled to the resulting value. Later several improvements was suggested.


Soundex Coding Guide


Number Represents the Letters
1 B, F, P, V
2 C, G, J, K, Q, S, X, Z
3 D, T
4 L
5 M, N
6 R

Disregard the letters A, E, I, O, U, H, W, and Y. 


Every soundex code consists of a letter and three numbers, such as W-252. The letter is always the first letter of the surname. The numbers are assigned to the remaining letters of the surname according to the soundex guide shown below. Zeroes are added at the end if necessary to produce a four-character code. Additional letters are disregarded. 

Examples:
  • Washington is coded W-252 (W, 2 for the S, 5 for the N, 2 for the G, remaining letters disregarded).
  • Lee is coded L-000 (L, 000 added).
image  
Examples :
soundex sql server soundex sql server soundex code soundex code soundex code soundex code
F234 → Fusedale.
G535 → Genthner, Gentner, Gianettini, Gunton.
G640 → Garlee, Garley, Garwell, Garwill, Gerrell, Gerrill, Giral, Gorelli, Gorioli, Gourlay, Gourley, Gourlie, Graal, Grahl, Grayley, Grealey, Greally, Grealy, Grioli, Groll, Grolle, Guerola, Gurley.
H326 → Hadcroft, Hadgraft, Hatchard, Hatcher, Hatzar, Hedger, Hitscher, Hodcroft, Hutchcraft.
P630 → Parade, Pardew, Pardey, Pardi, Pardie, Pardoe, Pardue, Pardy, Parradye, Parratt, Parrett, Parrot, Parrott, Pearde, Peart, Peaurt, Peert, Perdue, Peret, Perett, Perot, Perott, Perotti, Perrat, Perrett, Perritt, Perrot, Perrott, Pert, Perutto, Pirdue, Pirdy, Pirot, Pirouet, Pirt, Porrett, Porritt, Port, Porte, Portt, Prate, Prati, Pratt, Pratte, Pratty, Preddy, Preedy, Preto, Pretti, Pretty, Prewett, Priddey, Priddie, Priddy, Pride, Pridie, Pritty, Prott, Proud, Prout, Pryde, Prydie, Purdey, Purdie, Purdy.


soundex number soundex number soundex number soundex number soundex sql server soundex sql server 

Soundex in SQL:

ALTER FUNCTION fn_soundex(@Getname VARCHAR(500))
RETURNS VARCHAR(500)
BEGIN

    -- To Store First Letter

    DECLARE @GetFirstLetter VARCHAR(20);    
    
    -- To Store Soundex Code
    DECLARE @SetSoundexCode VARCHAR(20);    
    
    -- To Find length of string
    DECLARE @Code_len INT;                    

    -- Default Value

    SET @SetSoundexCode = '0'                        
    
    -- Get First Letter
    SET @GetFirstLetter = substring(@Getname,1,1)    
    
    -- Copy Remain Character to another String
    SET @Getname = substring(@Getname,2,len(@Getname))    
    
    -- Assign Code according to Letter
    
    SET @Getname = replace(@Getname,'A',0)
    SET @Getname = replace(@Getname,'E',0)
    SET @Getname = replace(@Getname,'I',0)
    SET @Getname = replace(@Getname,'O',0)
    SET @Getname = replace(@Getname,'U',0)
    SET @Getname = replace(@Getname,'H',0)
    SET @Getname = replace(@Getname,'W',0)
    SET @Getname = replace(@Getname,'Y',0)
    
    SET @Getname = replace(@Getname,'B',1)
    SET @Getname = replace(@Getname,'F',1)
    SET @Getname = replace(@Getname,'P',1)
    SET @Getname = replace(@Getname,'V',1)

    SET @Getname = replace(@Getname,'C',2)

    SET @Getname = replace(@Getname,'G',2)
    SET @Getname = replace(@Getname,'J',2)
    SET @Getname = replace(@Getname,'K',2)
    SET @Getname = replace(@Getname,'Q',2)
    SET @Getname = replace(@Getname,'S',2)
    SET @Getname = replace(@Getname,'X',2)
    SET @Getname = replace(@Getname,'Z',2)

    SET @Getname = replace(@Getname,'D',3)

    SET @Getname = replace(@Getname,'T',3)
    
    SET @Getname = replace(@Getname,'L',4)
    
    SET @Getname = replace(@Getname,'M',5)
    SET @Getname = replace(@Getname,'N',5)
    
    SET @Getname = replace(@Getname,'R',6)
    
    -- Get Length of String
    SET @Code_len = len(@Getname)            
    
    -- Remove Space from both side
    SET @Getname = ltrim(rtrim(@Getname))    

    -- Check if length is greater then zero

    WHILE @Code_len > 0
    BEGIN
        -- Compare two letter at same time to check duplicate
        -- If character is repeating character then make it Zero...
        -- We will remove Zero at the End
        IF charindex(substring(@Getname,@Code_len,1),substring(@Getname,@Code_len-1,1)) > 0  
            SET @SetSoundexCode = @SetSoundexCode + '0'
        ELSE
            SET @SetSoundexCode = @SetSoundexCode + substring(@Getname,@Code_len,1)
        
        SET @Code_len = @Code_len - 1
    END

    SET @SetSoundexCode = @GetFirstLetter + reverse(replace(@SetSoundexCode,'0',''))+'000'


    RETURN LEFT(@SetSoundexCode,4)


END

GO


There are lot many variance in Soundex Search algorithm. 
Referenced from SOUNDEX (Transact-SOL)     soundex sql server soundex sql server
Useful Link:
http://www.archives.gov/research/census/soundex.html
soundex function soundex function soundex function soundex function sql soundex sql soundex sql soundex sql soundex

No comments:

Post a Comment