Convert rows to columns (Pivot?)

Asked

Viewed 830 times

9

I have a chart with 33 million phone records with the structure below:

ccpf_cnpj   ddd telefone    tipo    origem
11111111111 83  81021111    M   SERASA
11111111111 83  87472222    M   SERASA
11111111111 83  81023333    M   TRANSUNION
11111111111 83  88724444    M   TRANSUNION
11111111111 83  87475555    M   TRANSUNION
22222222222 43  36271111    F   SERASA
22222222222 44  36272222    F   SERASA
22222222222 43  36273333    F   TRANSUNION
22222222222 43  36284444    F   TRANSUNION
33333333333 51  51811111    F   SERASA
33333333333 51  56212222    F   SERASA
33333333333 51  96213333    M   SERASA

I need to "denormalize" this table so that each CPF has a single record, it should look like this:

CCPF_CNPJ   DDD_1   TELEFONE_1  TIPO_1  ORIGEM_1    DDD_2   TELEFONE_2  TIPO_2  ORIGEM_2    DDD_3   TELEFONE_3  TIPO_3  ORIGEM_3    DDD_4   TELEFONE_4  TIPO_4  ORIGEM_4    DDD_5   TELEFONE_5  TIPO_5  ORIGEM_5
11111111111 83  81021111    M   SERASA  83  87472222    M   SERASA  83  81023333    M   TRANSUNION  83  88724444    M   TRANSUNION  83  87475555    M   TRANSUNION
22222222222 43  36271111    F   SERASA  44  36272222    F   SERASA  43  36273333    F   TRANSUNION  43  36274444    F   TRANSUNION  NULL    NULL    NULL    NULL
33333333333 51  51811111    F   SERASA  51  56212222    F   SERASA  51  96213333    M   SERASA  NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL

A colleague here suggested using a Pivot + Unpivot, but I could not adapt to the need, or maybe I do not know how to use Pivot for this case, usually when I use a field that defines me columns, which is not the case.

I was able to do it in a sampling (100 records) using joins with the table itself, but the solution became unviable due to the volume of data (remembering: 33 million records).

Someone’s been through something similar?

The SQL Server version here is 2014.

Thank you in advance.

  • @Gypsy Morrison Mendez I managed to execute the code correcting two details, but while running me returned the message 'An error occurred while executing batch. Error message is: Exception of type 'System.Outofmemoryexception' was thrown. ' after just over 5 minutes of execution. I think the volume of data is really impacting. I will try to "break" the table into smaller parts to see if I can perform. Thanks for the help.

  • I would do for CNPJ tracks. You tried to edit my answer, right? With the change the answer works for you?

  • 1

    @Gypsy Morrison Mendez Thank you so much for your help! I was working to divide the table into smaller parts, but they answered my question in Stack overflow in English, I tested the solution and it worked (you’re right, he used tracks, but it was not necessary to use Pivot): http://stackoverflow.com/questions/35340101/rows-to-Columns-pivot

3 answers

2

I adapted this answer for your need. Use not only PIVOT, but also dynamic query construction because the number of columns will be undefined:

DECLARE 
  @MaxCount INT, 
  @coluna_ddd CHAR(3) = 'ddd', @ddd NVARCHAR(MAX), 
  @coluna_telefone CHAR(8) = 'telefone', @telefone NVARCHAR(MAX), 
  @coluna_tipo CHAR(4) = 'tipo', @tipo NVARCHAR(MAX), 
  @coluna_origem CHAR(6) = 'origem', @origem NVARCHAR(MAX), 
  @colunas_dinamicas VARCHAR(MAX), @sentenca_pivot VARCHAR(MAX),
  @i INT, @j NVARCHAR(10)

SELECT @MaxCount = MAX(r), @i = 1 FROM (SELECT CCPF_CNPJ, row_number() OVER (PARTITION BY CCPF_CNPJORDER BY CCPF_CNPJ) r FROM Tabela) T
WHILE @i <= @MaxCount
    SELECT @j = cast(@i AS NVARCHAR(10)), 
           @ddd = COALESCE(@ddd + ', ', '') + '['+@coluna_ddd + '_' + @j + ']', 
           @telefone = COALESCE(@telefone + ', ', '') + '[' + @coluna_telefone + '_' + @j + ']', 
           @tipo = COALESCE(@tipo + ', ', '') + '[' + @tipo + '_' + @j + ']', 
           @origem = COALESCE(@origem + ', ', '') + '[' + @origem + '_' + @j + ']', 
           @colunas_dinamicas = COALESCE(@colunas_dinamicas + ', ', '') + '[' + @coluna_ddd + '_' + @j + '], [' + @coluna_telefone + '_' + @j + '], [' + @coluna_tipo + '_' + @j + '], [' + @coluna_origem + '_' + @j + ']', 
           @i = @i + 1

SELECT 
    @sentenca_pivot = 'SELECT * FROM (SELECT CCPF_CNPJ, ''@2_'' + CAST(row_number() OVER (PARTITION BY CCPF_CNPJ ORDER BY CCPF_CNPJ) AS VARCHAR) A, @2 FROM Tabela) T PIVOT (MAX(@2) FOR A IN (@1)) PVT',
    @colunas_dinamicas = 'SELECT A1.CCPF_CNPJ, ' + @colunas_dinamicas +' FROM (' + REPLACE(REPLACE(@sentenca_pivot, '@1', @ddd), '@2', @coluna_ddd)+') A1
    LEFT OUTER JOIN ('+REPLACE(REPLACE(@sentenca_pivot, '@1', @telefone), '@2', @coluna_telefone)+') A2 ON A1.CCPF_CNPJ = A2.CCPF_CNPJ
    LEFT OUTER JOIN ('+REPLACE(REPLACE(@sentenca_pivot, '@1', @tipo), '@2', @coluna_tipo)+') A3 ON A1.CCPF_CNPJ = A3.CCPF_CNPJ
    LEFT OUTER JOIN ('+REPLACE(REPLACE(@sentenca_pivot, '@1', @origem), '@2', @coluna_origem)+') A4 ON A1.CCPF_CNPJ = A4.CCPF_CNPJ'

EXEC(@colunas_dinamicas)
  • @Cassioveras the edition you made changes, even if a small detail, the response of Gypsy Morrison. For these cases, it is recommended that you warn via comments and ask the author to make the change, as edits cannot change the meaning of the answer/question, or add information that causes this.

  • @Diego Felipe Ok. The code he put up doesn’t work as it is published, it’s really just two details. I’m new here, I’ll comment on his response then. Thank you.

1

SELECT Campo1,Exemplo1,Exemplo2,Exemplo3
FROM
(
select Campo1,Campo2,'Exemplo' +convert(varchar(10),ROW_NUMBER () over (partition by Campo1 Order By Campo2)) as Sequencia from Tabela1

)Apelido1
PIVOT
(
MAX(Campo2)
for Sequencia in (Exemplo1,Exemplo2,Exemplo3)
)Apelido2

0

Look, this thing works. In this case I’m assuming that the number that has the highest number phone will stay, if you have another column that makes more sense just change:

FILTRO = ROW_NUMBER() OVER(PARTITION BY CCPF_CNPJ order by **TELEFONE** desc),  

for:

FILTRO = ROW_NUMBER() OVER(PARTITION BY CCPF_CNPJ order by **ID** desc),  

or

FILTRO = ROW_NUMBER() OVER(PARTITION BY CCPF_CNPJ order by **DATA** desc),  

etc....

WITH DADOS AS(  
 SELECT   
     FILTRO = ROW_NUMBER() OVER(PARTITION BY CCPF_CNPJ order by TELEFONE desc),   
     CCPF_CNPJ,   
     DDD,   
     TELEFONE,   
     TIPO,   
     ORIGEM  
 FROM TABELA
)  
SELECT * FROM DADOS
WHERE FILTRO = 1  

Browser other questions tagged

You are not signed in. Login or sign up in order to post.