Normalize comma-separated values into a new table

6

The idea is to stop storing comma-separated values in a single column and move them to an intermediate table:

Source table

Assuming a table named press with the following fields:

press_id, tag_id

Containing records such as:

┌───────────┬──────────────┐
│ press_id  │  tag_id      │
├───────────┼──────────────┤
│  1        │  1,2,3       │
├───────────┼──────────────┤
│  2        │  2,6,5       │
├───────────┼──────────────┤
│  3        │  10,450      │
└───────────┴──────────────┘

Destination table

The goal is to run a query that reads them and writes the values to a new table, press_tags, which establishes the relationship between each press and its tags:

┌──────┬────────────┬──────────────┐
│  id  │  press_id  │  tag_id      │
├──────┼────────────┼──────────────┤
│  1   │  1         │  1           │
├──────┼────────────┼──────────────┤
│  2   │  1         │  2           │
├──────┼────────────┼──────────────┤
│  3   │  1         │  3           │
├──────┼────────────┼──────────────┤
│  4   │  2         │  2           │
├──────┼────────────┼──────────────┤
│  5   │  2         │  6           │
├──────┼────────────┼──────────────┤
│  6   │  2         │  5           │
├──────┼────────────┼──────────────┤
│  7   │  3         │  10          │
├──────┼────────────┼──────────────┤
│  8   │  3         │  450         │
└──────┴────────────┴──────────────┘
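
For reference, a destination table matching the layout above could be declared roughly as follows. This is only a sketch: the column types and the AUTO_INCREMENT key are assumptions, since the real definition is not shown here.

```sql
-- Hypothetical definition: the column types are assumed, not taken from the question
CREATE TABLE press_tags (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- generated automatically on insert
    press_id INT UNSIGNED NOT NULL,                 -- refers to press.press_id
    tag_id   INT UNSIGNED NOT NULL,                 -- one tag per row
    PRIMARY KEY (id)
);
```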

Question

How can I select all records from the source table and, for each comma-separated tag_id, insert one row into the destination table?

2 answers

6


MySQL has no built-in function that splits a string into multiple rows, so the work becomes a little complex:

INSERT INTO press_tags (press_id, tag_id)
SELECT
    press.press_id,
    -- inner call keeps the first n elements; outer call keeps the last of those
    SUBSTRING_INDEX(SUBSTRING_INDEX(press.tag_id, ',', n.n), ',', -1) tag_id
FROM press
CROSS JOIN 
(
    -- numbers 1 to 100: a supplies the units digit, b the tens digit
    SELECT a.N + b.N * 10 + 1 n
    FROM 
    (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a,
    (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
    ORDER BY n
) n
 -- keep only as many numbers as the row has elements (comma count + 1)
 WHERE n.n <= 1 + (LENGTH(press.tag_id) - LENGTH(REPLACE(press.tag_id, ',', '')))
 ORDER BY press_id, tag_id;

Explanation

  1. The sub-query aliased as n generates, on the fly, a sequence of numbers from 1 to 100 (in this particular case), using UNION ALL and CROSS JOIN.

  2. In the outer SELECT, the inner SUBSTRING_INDEX() returns everything up to the nth element of the list.

    The outer SUBSTRING_INDEX() then extracts the rightmost portion, after the last delimiter, which yields the nth element itself.

  3. CROSS JOIN produces a set of rows that is a Cartesian product (the 100 rows in n combined with every row in the press table).

  4. The condition in the WHERE clause filters all the unneeded rows out of the result set.
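
The steps above can be checked in isolation on a literal such as '1,2,3'. The literal is only an illustration; any row of press behaves the same way:

```sql
-- Step 1: keep everything up to the 2nd element
SELECT SUBSTRING_INDEX('1,2,3', ',', 2);                            -- '1,2'

-- Step 2: keep what follows the last delimiter of that result
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('1,2,3', ',', 2), ',', -1);  -- '2'

-- Number of elements = number of commas + 1, as used in the WHERE clause
SELECT 1 + (LENGTH('1,2,3') - LENGTH(REPLACE('1,2,3', ',', '')));   -- 3
```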

This query splits up to 100 tag_id values per row of the source table. That is enough for the case at hand, but you can adjust the sub-queries if necessary.
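
As a sketch of one possible adjustment, a third digits table raises the limit from 100 to 1000. The alias c is an assumption introduced here for illustration; the rest follows the same pattern as the query above:

```sql
-- Numbers 1 to 1000: a = units, b = tens, c = hundreds (c is a hypothetical extra alias)
SELECT a.N + b.N * 10 + c.N * 100 + 1 AS n
FROM
(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a,
(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b,
(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) c;
```

The Cartesian product grows with the size of the sequence, so it is probably best to keep the limit close to the largest number of tags a row can actually hold.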


Common scenario

A common scenario with delimiter-separated values in a column is that each value is inserted as value + delimiter, resulting in something like:

value;value;value;

With the solution above, the trailing delimiter generates a blank entry in the destination table:

┌──────┬────────────┬──────────────┐
│  id  │  press_id  │  tag_id      │
├──────┼────────────┼──────────────┤
│  1   │  1         │              │
├──────┼────────────┼──────────────┤
│  2   │  1         │  2           │
├──────┼────────────┼──────────────┤
│  3   │  1         │  3           │
└──────┴────────────┴──────────────┘

The problem can also surface as an error of this kind:

Incorrect integer value: '' for column 'tag_id'

To avoid this, we can wrap the query in another SELECT that keeps only the rows whose tag_id is neither '' nor NULL:

INSERT INTO press_tags (press_id, tag_id)
SELECT
  result.press_id,
  result.tag_id
FROM (
  # query goes here
) AS result
WHERE result.tag_id > ''

Which results in:

INSERT INTO press_tags (press_id, tag_id)
SELECT
  result.press_id,
  result.tag_id
FROM (
  SELECT
      press.press_id,
      SUBSTRING_INDEX(SUBSTRING_INDEX(press.tag_id, ',', n.n), ',', -1) tag_id
  FROM press
  CROSS JOIN 
  (
      SELECT a.N + b.N * 10 + 1 n
      FROM 
      (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a,
      (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
      ORDER BY n
  ) n
  WHERE n.n <= 1 + (LENGTH(press.tag_id) - LENGTH(REPLACE(press.tag_id, ',', '')))
  ORDER BY press_id, tag_id
) AS result
WHERE result.tag_id > ''
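
To see why a trailing delimiter produces the blank entry and why the > '' filter removes it, the pieces can be evaluated on a literal. The value '1,2,3,' is only an illustration:

```sql
-- '1,2,3,' has 3 commas, so the inner WHERE clause lets n run up to 4
SELECT 1 + (LENGTH('1,2,3,') - LENGTH(REPLACE('1,2,3,', ',', '')));   -- 4

-- The 4th "element" is the empty string that follows the last comma
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('1,2,3,', ',', 4), ',', -1);   -- ''

-- The outer filter is true only for non-empty values: '' gives 0 and NULL gives NULL
SELECT '3' > '', '' > '', NULL > '';                                   -- 1, 0, NULL
```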

Much of this solution comes from the answers to this question on SOEN.

2

Congratulations on your answer, @Zuul; it's clear that you have a lot of experience with MySQL.

I can achieve the same result in a humbler way. It may even be considered a hack, but it's what I've always used as a solution for cases of this kind.

PREPARE DATA

  1. I write an SQL statement that generates a "script":

select concat('insert into press_tags (press_id,tag_id) select ',press_id,',tag_id from (select NULL tag_id union select ', replace(tag_id,',',' union select '),') A where tag_id IS NOT NULL;') 'query' from press;

This statement generates output like this:

insert into press_tags (press_id,tag_id) select 1,tag_id from (select NULL tag_id union select 1 union select 2 union select 3) A where tag_id IS NOT NULL;
insert into press_tags (press_id,tag_id) select 2,tag_id from (select NULL tag_id union select 2 union select 6 union select 5) A where tag_id IS NOT NULL;
insert into press_tags (press_id,tag_id) select 3,tag_id from (select NULL tag_id union select 10 union select 450) A where tag_id IS NOT NULL;
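
The core of the trick is the REPLACE() call, which can be tried on its own. The literal '1,2,3' is only an illustration:

```sql
-- REPLACE swaps each comma for ' union select ', yielding one SELECT per tag
SELECT REPLACE('1,2,3', ',', ' union select ');
-- '1 union select 2 union select 3'
```

Because the generated statements use UNION rather than UNION ALL, a tag repeated within the same row is inserted only once.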

SCRIPT

  2. I take this output and run it as SQL statements:

mysql> insert into press_tags (press_id,tag_id) select 1,tag_id from (select NULL tag_id union select 1 union select 2 union select 3) A where tag_id IS NOT NULL;
Query OK, 3 rows affected (0.04 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> insert into press_tags (press_id,tag_id) select 2,tag_id from (select NULL tag_id union select 2 union select 6 union select 5) A where tag_id IS NOT NULL;
Query OK, 3 rows affected (0.04 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> insert into press_tags (press_id,tag_id) select 3,tag_id from (select NULL tag_id union select 10 union select 450) A where tag_id IS NOT NULL;
Query OK, 2 rows affected (0.03 sec)
Records: 2  Duplicates: 0  Warnings: 0

As I said, it is a simpler approach that uses only basic commands, but it is an alternative that works ;)

  • +1 This method is simple to implement and flexible for many scenarios.
