Remove an undetermined number of " " in a column in the database

Asked

Viewed 1,639 times

8

The field description on the table product contains HTML generated by Ckeditor (English) that tries to escape the content:

<table cellpadding=\"\\&quot;\\\\&quot;\\\\\\\\&quot;\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\&quot;0\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\&quot;\\\\\\\\&quot;\\\\&quot;\\&quot;\" cellspacing=\"\\&quot;\\\\&quot;\\\\\\\\&quot;\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\&quot;0\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\&quot;\\\\\\\\&quot;\\\\&quot;\\&quot;\">
<thead>
    <tr>
        <td>
            <p>&nbsp;</p>
        </td>
    </tr>
</thead>
...

Problem

The problem here is that there was a series of copy/paste actions by the user, where Ckeditor eventually applied the escape character \ to the escape character \ and so forth giving rise to what can be seen above.
With each new copy the number of bars was increasing...

Given the number of \ be different in each place and in each record in the table, and given the amount of records, manual resolution of this subject is impracticable.

Question

How to apply a regular expression to a database update query by updating the field description with the value it contains, but cleaning all those escape bars, leaving only one where more than one is found?

Theoretical example:

UPDATE product
SET description = (description - milhões de barras)
WHERE 1

Not relevant, but the question of copying and pasting by cleaning previously applied escape characters is already solved on the PHP side. New records won’t cause this problem again. The issue seeks a Mysql solution to deal with old records already present in the database.

6 answers

3

Mysql does not support replacing regular expressions natively (one of the reasons I prefer Posgresql). So I recommend that you install the lib of Udfs mysql-udf-regexp. Once this is done, simply use regular expression substitution:

UPDATE product
SET description = PREG_REPLACE('\\\\+', '\\', description);

The regular expression \\\\+ box with any number of backslashes (first function parameter). How \ is escape character in Mysql and regular expression, it is necessary to use four \ to count as one in the regular expression. The second parameter is the substitution string, which in this case will be \. Finally, the last parameter is the column in which the operation will be performed.

2

You can run the following query several times until it no longer changes any record:

UPDATE product
SET description = REPLACE(description, '\\\\', '\\')
WHERE description LIKE '%\\\\\\\\%'

Mysql should indicate that no record was changed with a message like "Query OK, 0 Rows affected". You can also check that the overwrite has been completed when the following query does not return any record:

SELECT description FROM product
WHERE description LIKE '%\\\\\\\\%'

Explanation

The query changes each pair of inverted bars, \\, by a single, \. Thus, in a first execution of the consultation, \\\\ is replaced by \\, which in a second execution is replaced by \.

Note that in the expression REPLACE(description, '\\\\', '\\'), the backslash is used as an escape character; thus, '\\\\' represents two bars.

The expression description LIKE '%\\\\\\\\%', search columns containing two consecutive inverted bars, \\. This is because, in the expression LIKE, the inverted bars need to be "leaked" twice. According to the mysql documentation:

To search for " ", specify it as " "; this is because the backslashes are Stripped Once by the parser and Again when the Pattern match is made, Leaving a single backslash to be Matched Against.

0

You could maybe try a while in PHP or some other language with code similar to this:

WHILE (variable.contains('%\\%')
   variable.replace('\\','\');

0

where there are two or more bars, exchange for one:

$var = '\\&quot;\\\\&quot;\\\\\\\\&quot;\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\&quot;\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\';
echo preg_replace("/\\\+/",'\\',$var);
// retorna: \&quot;\&quot;\&quot;\&quot;\&quot;\

0

Since you want to "save" old results, then let’s give a update in the database.

UPDATE table
SET field = REPLACE(field, '\\\'', '\'')
WHERE field LIKE '%\\\''

0

Since this problem is quite simple, if you do not want or can not install an external library for substitutions with regular expressions, here is an adaptation of that function of Narcodes.

The only thing I did with the function was replace the fields VARCHAR(1000) for TEXT to handle strings of any size and remove the call to TRIM. Replace the fields VARCHAR for TEXT has a cost in terms of performance, but increases the flexibility of the solution:

DELIMITER //
CREATE FUNCTION `regex_replace`(pattern TEXT, replacement TEXT, original TEXT)

RETURNS TEXT
DETERMINISTIC
BEGIN
 DECLARE temp TEXT;
 DECLARE ch VARCHAR(1);
 DECLARE i INT;
 DECLARE j INT;
 DECLARE qbTemp TEXT;

 SET i = 1;
 SET j = 1;
 SET temp = '';
 SET qbTemp = '';

 IF original REGEXP pattern THEN
  loop_label: LOOP
   IF i>CHAR_LENGTH(original) THEN
    LEAVE loop_label; 
   END IF;
   SET ch = SUBSTRING(original,i,1);
   IF NOT ch REGEXP pattern THEN
    SET temp = CONCAT(temp,ch);
   ELSE
    SET temp = CONCAT(temp,replacement);
   END IF;
   SET i=i+1;
  END LOOP;
 ELSE
  SET temp = original;
 END IF;
 SET temp = REPLACE(REPLACE(REPLACE(temp , CONCAT(replacement,replacement),
   CONCAT(replacement,'#')),CONCAT('#',replacement),''),'#','');
 RETURN temp;
END// 

So we can use @André’s regular expression:

UPDATE product
SET description = regex_replace('\\\\+', '\\', description);

Example with SQL Fiddle

Browser other questions tagged

You are not signed in. Login or sign up in order to post.