首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >PATINDEX与SOUNDEX

PATINDEX与SOUNDEX
EN

Stack Overflow用户
提问于 2018-11-28 12:01:35
回答 1查看 114关注 0票数 0

要使用PATINDEX和SOUNDEX搜索字符串。

下面的表中有一些示例数据,可以使用PATINDEXSOUNDEX搜索给定的字符串。

代码语言:javascript
复制
create table tbl_pat_soundex
(
    col_str varchar(max)
);

insert into tbl_pat_soundex values('Smith A Steve');
insert into tbl_pat_soundex values('Steve A Smyth');
insert into tbl_pat_soundex values('A Smeeth Stive');
insert into tbl_pat_soundex values('Steve Smith A');
insert into tbl_pat_soundex values('Smit Steve A');

字符串搜索:- 'Smith A Steve‘

代码语言:javascript
复制
SELECT col_str,PATINDEX('%Smith%',col_str) [Smith],PATINDEX('%A%',col_str) [A],PATINDEX('%Steve%',col_str) [Steve]
FROM tbl_pat_soundex

获得产出:

代码语言:javascript
复制
col_str         Smith   A   Steve
---------------------------------
Smith A Steve   1       7   9
Steve A Smyth   0       7   1
A Smeeth Stive  0       1   0
Steve Smith A   7       13  1
Smit Steve A    0       12  6

预期产出:

代码语言:javascript
复制
col_str         Smith   A   Steve
---------------------------------
Smith A Steve   1       7   9
Steve A Smyth   9       7   1
A Smeeth Stive  3       1   10
Steve Smith A   7       13  1
Smit Steve A    1       12  6

试过:

代码语言:javascript
复制
SELECT col_str,
        PATINDEX('%'+soundex('Smith')+'%',soundex(col_str)) [Smith],
        PATINDEX('%'+soundex('A')+'%',soundex(col_str)) [A],
        PATINDEX('%'+soundex('Steve')+'%',soundex(col_str)) [Steve]
FROM tbl_pat_soundex    

但却得到了意想不到的结果:

代码语言:javascript
复制
col_str         Smith   A   Steve
---------------------------------
Smith A Steve   1       0   0
Steve A Smyth   0       0   1
A Smeeth Stive  0       1   0
Steve Smith A   0       0   1
Smit Steve A    1       0   0   

注意事项:我在表中搜索记录的100 Millions

EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2018-11-28 18:40:07

这里有一个选项,考虑到所有需要做的事情,不确定它将如何处理1亿条记录。你得检验一下。

在很高的层次上,我如何理解这是你基本上需要的

  • 根据另一个字符串的单词搜索字符串中的所有单词
  • 返回原始字符串中的字符起始位置,其中该单词等于或听起来与搜索单词类似。

您可以使用差()进行比较:

差异比较两个不同的SOUNDEX值,并返回一个整数值。此值度量SOUNDEX值在0到4时匹配的程度。值0表示SOUNDEX值之间的弱相似性或无相似性;4表示SOUNDEX值强相似,甚至一致匹配。

您将需要根据空格“”拆分字符串,而且由于您是2008年,您将不得不滚动您自己的函数。

我在这里使用了XML函数,https://sqlperformance.com/2012/07/t-sql-queries/split-strings,在我的示例中,如果您有自己的或者想使用不同的东西,显然需要进行调整:

代码语言:javascript
复制
CREATE FUNCTION dbo.SplitStrings_XML
(
   @List       NVARCHAR(MAX),
   @Delimiter  NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
   RETURN 
   (  
      SELECT Item = y.i.value('(./text())[1]', 'nvarchar(4000)')
      FROM 
      ( 
        SELECT x = CONVERT(XML, '<i>' 
          + REPLACE(@List, @Delimiter, '</i><i>') 
          + '</i>').query('.')
      ) AS a CROSS APPLY x.nodes('i') AS y(i)
   );
GO

我切换并使用表变量来显示示例,建议不要使用您拥有的数据量来创建和使用物理表。

备选案文1-非动态:

代码语言:javascript
复制
DECLARE @tbl_pat_soundex TABLE
    (
        [col_str] VARCHAR(MAX)
    );

INSERT INTO @tbl_pat_soundex
VALUES ( 'Smith A Steve' )
,( 'Steve A Smyth' )
,( 'A Smeeth Stive' )
,( 'Steve Smith A' )
,( 'Smit Steve A' )

SELECT DISTINCT [aa].[col_str]
              , MAX([aa].[Smith]) OVER ( PARTITION BY [aa].[col_str] ) AS [Smith]
              , MAX([aa].[A]) OVER ( PARTITION BY [aa].[col_str] ) AS [A]
              , MAX([aa].[Steve]) OVER ( PARTITION BY [aa].[col_str] ) AS [Steve]
FROM   (
           SELECT      [a].[col_str]
                     , CASE WHEN DIFFERENCE([b].[item], 'Smith') = 4 THEN
                                CHARINDEX([b].[item], [a].[col_str])
                            ELSE 0
                       END AS [Smith]
                     , CASE WHEN DIFFERENCE([b].[item], 'A') = 4 THEN
                                CHARINDEX([b].[item], [a].[col_str])
                            ELSE 0
                       END AS [A]
                     , CASE WHEN DIFFERENCE([b].[item], 'Steve') = 4 THEN
                                CHARINDEX([b].[item], [a].[col_str])
                            ELSE 0
                       END AS [Steve]
           FROM        @tbl_pat_soundex [a]
           CROSS APPLY [dbo].[SplitStrings_XML]([a].[col_str], ' ') [b]
       ) AS [aa];
  • 使用该函数,我们将字符串拆分成单独的单词。
  • 然后使用case语句检查差异值。
  • 如果差异值等于4,则返回原始单词对字符串的CHARINDEX值。
  • 如果不等于,我们返回0

然后,就需要根据原始字符串获得每个字符串的最大值:

代码语言:javascript
复制
          , MAX([aa].[Smith]) OVER ( PARTITION BY [aa].[col_str] ) AS [Smith]
          , MAX([aa].[A]) OVER ( PARTITION BY [aa].[col_str] ) AS [A]
          , MAX([aa].[Steve]) OVER ( PARTITION BY [aa].[col_str] ) AS [Steve]

给你最后的结果:

备选方案2-具有支点的动态:

我们将声明要搜索的字符串,将其拆分,并搜索原始字符串中的单个单词,然后将结果转到中心。

代码语言:javascript
复制
--This example is using global temp tables as it's showing how
--to build a dynamic pivot
IF OBJECT_ID('tempdb..##tbl_pat_soundex') IS NOT NULL
  DROP TABLE [##tbl_pat_soundex];

IF OBJECT_ID('tempdb..##tbl_col_str_SearchString') IS NOT NULL
  DROP TABLE [##tbl_col_str_SearchString];

CREATE TABLE [##tbl_pat_soundex]
    (
        [col_str] VARCHAR(MAX)
    );

INSERT INTO [##tbl_pat_soundex]
VALUES ( 'Smith A Steve' )
     , ( 'Steve A Smyth' )
     , ( 'A Smeeth Stive' )
     , ( 'Steve Smith A' )
     , ( 'Smit Steve A' );

--What are you searching for?
DECLARE @SearchString NVARCHAR(200);
SET @SearchString = N'Smith A Steve';

--We build a table we load with every combination of the words from the string and the words from the SearchString for easier comparison.
CREATE TABLE [##tbl_col_str_SearchString]
    (
        [col_str] NVARCHAR(MAX)
      , [col_str_value] NVARCHAR(MAX)
      , [SearchValue] NVARCHAR(200)
    );

--Load that table for comparison
--split our original string into individual words
--also split our search string into individual words and give me all combinations.
INSERT INTO [##tbl_col_str_SearchString] (
                                             [col_str]
                                           , [col_str_value]
                                           , [SearchValue]
                                         )
            SELECT      DISTINCT [a].[col_str]
                               , [b].[item]
                               , [c].[item]
            FROM        [##tbl_pat_soundex] [a]
            CROSS APPLY [dbo].[SplitStrings_XML]([a].[col_str], ' ') [b]
            CROSS APPLY [dbo].[SplitStrings_XML](@SearchString, ' ') [c]
            ORDER BY    [a].[col_str];

--Then we can easily compare each word and search word for those that match or sound alike using DIFFERNCE()
SELECT [col_str], [col_str_value], [SearchValue], CASE WHEN DIFFERENCE([col_str_value], [SearchValue]) = 4 THEN CHARINDEX([col_str_value], [col_str]) ELSE 0 END AS [Match] FROM ##tbl_col_str_SearchString

--Then we can pivot on it
--and we will need to make it dynamic since we are not sure what what @SearchString could be.
DECLARE @PivotSQL NVARCHAR(MAX);
DECLARE @pivotColumn NVARCHAR(MAX);

SET @pivotColumn = N'[' + REPLACE(@SearchString, ' ', '],[') + N']';

SET @PivotSQL = N'SELECT * FROM (
SELECT [col_str], [SearchValue], CASE WHEN DIFFERENCE([col_str_value], [SearchValue]) = 4 THEN CHARINDEX([col_str_value], [col_str]) ELSE 0 END AS [Match] FROM ##tbl_col_str_SearchString
) aa
PIVOT (MAX([Match]) FOR [SearchValue] IN (' + @pivotColumn
                + N')) AS MaxMatch
ORDER BY [MaxMatch].[col_str]
';

--Giving us the final results.
EXEC sp_executesql @PivotSQL
票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/53519032

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档