Split string into substrings and search for them in another

Asked

Viewed 1,404 times

4

I’m creating a function in which I will check whether a endereço_A contains parts in a endereço_B, exemplifying:

address_A

AVENIDA JOÃO E MARIA

address_B

RUA JOÃO

The result should return JOÃO, which is the substring that both addresses have in common.

CREATE FUNCTION dbo.splitSearchAddress ( @addressSearch VARCHAR(MAX), @addressIn VARCHAR(MAX) )
RETURNS
    @returnList TABLE ([Name] [nvarchar] (500))
AS
BEGIN

    DECLARE @name NVARCHAR(255)
    DECLARE @pos INT
    DECLARE @separator CHAR(1)

    SET @separator = ' '

    WHILE CHARINDEX(@separator, @addressSearch) > 0
    BEGIN
        SELECT @pos  = CHARINDEX(@separator, @addressSearch)  
        SELECT @name = SUBSTRING(@addressSearch, 1, @pos-1)

        IF CHARINDEX(@name, @addressIn) > 0
        BEGIN
            INSERT INTO @returnList 
            SELECT @name

        END

        SELECT @addressSearch = SUBSTRING(@addressSearch, @pos+1, LEN(@addressSearch)-@pos)

    END

    INSERT INTO @returnList
    SELECT ISNULL(@addressSearch, 0)

    RETURN
END

The problem is when I look for something that does not contain in endereço_B and it returns me the last value even if it is not found,. What would be the ideal way to solve this? It will be necessary to keep redefining the variable for each same condition?

Example of the Problem:

SELECT * FROM dbo.splitSearchAddress('RUA JOÃO E MARIA', 'AVENIDA JOÃO')

He will return to me

JOÃO
E
MARIA

When I should return only

JOÃO

If the substring of endereço_A is the last position and is located in the endereço_B, works properly:

SELECT * FROM dbo.splitSearchAddress('RUA MARIA JOÃO', 'AVENIDA JOÃO')

2 answers

3


The CHARINDEX does not work for your problem, because it looks for occurrences in String. Words like "Rua Da", "RUA A" would fall into the same situation as the word "E".

When "E" is assigned there is variable @name the IF CHARINDEX('E', 'AVENIDA JOÃO') is returned 3 due to occurrence of "E" in "AVENUE".

One solution I found was to break the text "RUA JOÃO E MARIA" and "AVENIDA JOÃO" into separate tables using a function Split(' ') with space delimiter, after that you can compare what is common using a INNER JOIN. My code went like this.

Funcão Split :

CREATE FUNCTION [dbo].[Split] ( @text VARCHAR(MAX),
                                      @delimiter VARCHAR(20) = ' ' ) RETURNS @Strings TABLE ( [position] INT IDENTITY PRIMARY KEY,
                                                                                                                              [value] VARCHAR(100) ) AS BEGIN DECLARE @index INT
SET @index = -1 WHILE (LEN(@text) > 0) BEGIN -- Find the first delimiter

SET @index = CHARINDEX(@delimiter, @text) IF (@index = 0)
AND (LEN(@text) > 0) BEGIN
INSERT INTO @Strings
VALUES (CAST(@text AS VARCHAR(100))) BREAK END

                IF (@index > 1) BEGIN
INSERT INTO @Strings
VALUES (CAST(LEFT(@text, @index - 1) AS VARCHAR(100)))
SET @text = RIGHT(@text, (LEN(@text) - @index)) END --Delimiter is 1st position = no @text to insert
 ELSE
SET @text = CAST(RIGHT(@text, (LEN(@text) - @index)) AS VARCHAR(100)) END RETURN END

SplitSearchAddress function:(adapted)

CREATE FUNCTION dbo.splitSearchAddress (@addressSearch VARCHAR(MAX), @addressIn VARCHAR(MAX)) RETURNS @returnList TABLE ([Name] [nvarchar] (500)) AS BEGIN DECLARE @Temp1 TABLE (NAME varchar(MAX)) DECLARE @Temp2 TABLE (NAME varchar(MAX))
INSERT INTO @Temp1
SELECT value
FROM dbo.Split(@addressSearch, ' ');


INSERT INTO @Temp2
SELECT value
FROM dbo.Split(@addressIn, ' ');


INSERT INTO @returnList
SELECT T1.NAME
FROM @Temp1 T1
    INNER JOIN @Temp2 T2 ON T1.NAME = T2.NAME RETURN 
END

Query:

SELECT * FROM dbo.splitSearchAddress('RUA JOÃO E MARIA', 'AVENIDA JOÃO')

Upshot:

inserir a descrição da imagem aqui

1

Based on a function to separate words by space:

if object_id('fc_split', 'TF') is null
begin
  exec('create function fc_split() returns @resultado table(item varchar(max)) as begin return end');
end
go

alter function dbo.fc_split(@frase       varchar(max),
                            @delimitador varchar(max))
returns @resultado table(item      varchar(max),
                      sequencia integer)
begin
  declare @parte     varchar(max),
          @sequencia integer;

  set @sequencia = 0;

  while charindex(@delimitador, @frase, 0) <> 0
  begin
    set @parte = substring(@frase, 1, charindex(@delimitador, @frase, 0) - 1);
    set @frase = substring(@frase, charindex(@delimitador, @frase, 0) + len(replace(@delimitador, ' ', '.')), len(@frase));

    if len(@parte) > 0
    begin
      insert into @resultado(item, sequencia) values(@parte, @sequencia + 1);

      set @sequencia = @sequencia + 1;
    end;
  end;

  if len(@frase) > 0
  begin
    insert into @resultado(item, sequencia) values(@frase, @sequencia+ 1);
  end

  return;
end
go

You only need to make one INTERSECT with the results of dividing the two sentences, resulting in the correction of their function:

if object_id('splitSearchAddress', 'TF') is null
begin
  exec('create function splitSearchAddress() returns @resultado table(item varchar(max)) as begin return end');
end
go

alter function dbo.splitSearchAddress(@addressSearch varchar(max),
                                      @addressIn     varchar(max))
returns @returnList table(name nvarchar (500))
as
begin
  insert into @returnList(name)
  select item
    from fc_split(@addressSearch, ' ')
  intersect
  select item
    from fc_split(@addressIn, ' ');

  return
end

INTERSECT operator

The operator INTERSECT is used to return the result of 2 or more SELECT. On entering will only return the lines that are selected by all queries or data sets. If a record exists in a query and not in another, it will be omitted from the result.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.