As I said in the comments, if the JSON arrives with the quotes exactly like that, then it is not valid JSON. In that case, the best fix is in the backend, so that it generates the JSON correctly, in this case with the inner quotes escaped (\"):
{
"descrição": "Eu faço trabalhos \"fáceis\" porém cansativos"
}
Fixing the problem at its source means that whoever receives this data does not have to tidy it up, which is not a simple task.
Since you said the backend is PHP, make sure you are using the json_encode function correctly (it is the simplest way to generate JSON in PHP). Or, if you are using another engine/API/framework, check that everything is configured correctly, that the parameters are right, etc., because the problem is most likely there.
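As a rough illustration of the difference (sketched in JavaScript rather than PHP, with variable names invented for the example), a parser rejects the text with bare inner quotes, while a serializer such as JSON.stringify produces the escapes on its own:

```javascript
// Hypothetical payload for illustration: bare inner quotes make it invalid.
const broken = '{ "descrição": "Eu faço trabalhos "fáceis" porém cansativos" }';

let parseFailed = false;
try {
  JSON.parse(broken);
} catch (e) {
  // The parser gives up at the first unescaped inner quote.
  parseFailed = e instanceof SyntaxError;
}
console.log(parseFailed); // true

// Letting the serializer build the text escapes the inner quotes for us.
const generated = JSON.stringify({
  "descrição": 'Eu faço trabalhos "fáceis" porém cansativos',
});
console.log(generated);

// The generated text round-trips without any manual clean-up.
const roundTrip = JSON.parse(generated);
console.log(roundTrip["descrição"]);
```

The same idea applies to any backend: build the data structure and let the JSON library write the text, instead of concatenating strings by hand.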
The rest of this answer only shows how regex can be a bad, more complicated, and even unnecessary solution once you fix the problem at its source.
See, for example, what a regex for your case would look like (it will not work in all browsers):
let s = `{
"descrição": "Eu faço trabalhos "fáceis" porém cansativos",
"teste": "Aqui não tem aspas a mais",
"teste2": "Aqui " tem várias " aspas a mais""
}`;
let r = /(?<!"\s*:\s*)"(?![\n\r,:]|[^"]+":)/g;
console.log(s.replace(r, '\\"'));
Basically, it matches the quotes that need escaping, taking several factors into account:
The negative lookbehind (?<!"\s*:\s*) checks that something does not exist before the quote. In this case, that something is a quote, followed by \s* (zero or more whitespace characters), a colon, and zero or more whitespace characters. That is, the match cannot be the opening quote right after the :.
Note: when this answer was written, lookbehind only worked in Chrome, so check support in your target browsers. But even if you use another language (other than JavaScript) that supports this feature, it is still worth reading the rest of the answer.
The negative lookahead (?![\n\r,:]|[^"]+":) checks that something does not exist after the quote. In this case, it is [\n\r,:] (a line break, a comma, or a colon), so the closing quotes are not matched. But the | (which means "or") allows another possibility: [^"]+": that is, one or more characters other than a quote, followed by a quote and a colon (without this, the regex would also match the first quote of each line, the one that opens the key).
Basically, all these rules are to disregard legitimate opening and closing quotes. But this regex does not cover all cases.
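To check the effect in practice, the sketch below applies the regex with the escaped quote \" as the replacement and feeds the result to JSON.parse. It assumes an engine with lookbehind support (for example a recent Node.js or Chrome); the names fixed and obj exist only for this illustration.

```javascript
const s = `{
"descrição": "Eu faço trabalhos "fáceis" porém cansativos",
"teste": "Aqui não tem aspas a mais",
"teste2": "Aqui " tem várias " aspas a mais""
}`;
const r = /(?<!"\s*:\s*)"(?![\n\r,:]|[^"]+":)/g;

// Replace each matched quote with backslash + quote.
const fixed = s.replace(r, '\\"');
console.log(fixed);

// For this particular input the repaired text is parseable.
const obj = JSON.parse(fixed);
console.log(obj["descrição"]); // Eu faço trabalhos "fáceis" porém cansativos
console.log(obj["teste2"]);    // Aqui " tem várias " aspas a mais"
```

That it works here says nothing about other inputs, which is exactly the weakness of this approach.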
For example, if we have an array, it no longer works:
let s = `{
"lista": [ "Eu faço trabalhos "fáceis" porém cansativos" ]
}`;
let r = /(?<!"\s*:\s*)"(?![\n\r,:]|[^"]+":)/g;
console.log(s.replace(r, '\\"'));
In this case, it matches every quote inside the array, including the legitimate opening and closing ones. So we need to add more conditions to the regex to handle the new case. For example, I could tell it to ignore a quote right after a [ (or a comma) or just before a ]:
let s = `{
"lista": [ "Eu faço trabalhos "fáceis" porém cansativos" ]
}`;
let r = /(?<!"\s*:\s*|[\[,]\s*)"(?![\n\r,:]|[^"]+":|\s*\])/g;
console.log(s.replace(r, '\\"'));
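The two regexes can be compared on the array input with a small check like the one below. This is only a sketch: parsesAfterFix is a hypothetical helper name, and it assumes an engine that supports lookbehind.

```javascript
const arr = `{
"lista": [ "Eu faço trabalhos "fáceis" porém cansativos" ]
}`;
const first = /(?<!"\s*:\s*)"(?![\n\r,:]|[^"]+":)/g;
const second = /(?<!"\s*:\s*|[\[,]\s*)"(?![\n\r,:]|[^"]+":|\s*\])/g;

// Hypothetical helper: escape the matched quotes, then report whether
// the resulting text is accepted by JSON.parse.
function parsesAfterFix(text, regex) {
  try {
    JSON.parse(text.replace(regex, '\\"'));
    return true;
  } catch (e) {
    return false;
  }
}

console.log(parsesAfterFix(arr, first));  // false: the array delimiters get escaped too
console.log(parsesAfterFix(arr, second)); // true
```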
But there are still cases where it can fail, for example when the string itself contains some of these characters, such as : or []:
let s = `{
"fad": "fasdfa "fa" : "fasdfasd"sdfa",
"xyz": [ "af [ "xyz " ad", " fasd "fasf" dfs"]
}`;
let r = /(?<!"\s*:\s*|[\[,]\s*)"(?![\n\r,:]|[^"]+":|\s*\])/g;
console.log(s.replace(r, '\\"'));
Now, since the strings contain the characters : and [] (which I had used as reference points to tell whether I am at the beginning or end of a string), the regex gets lost, because it does not check whether it is inside a string or not.
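A quick way to confirm the failure is to run the repaired text through JSON.parse. The sketch below does that, assuming lookbehind support; the variable names are made up for the example.

```javascript
const tricky = `{
"fad": "fasdfa "fa" : "fasdfasd"sdfa",
"xyz": [ "af [ "xyz " ad", " fasd "fasf" dfs"]
}`;
const r2 = /(?<!"\s*:\s*|[\[,]\s*)"(?![\n\r,:]|[^"]+":|\s*\])/g;

const attempt = tricky.replace(r2, '\\"');
console.log(attempt);

// The quote before fasdfasd comes after a quote, a colon and spaces, so the
// lookbehind takes it for the start of a value and leaves it unescaped.
let stillInvalid = false;
try {
  JSON.parse(attempt);
} catch (e) {
  stillInvalid = true;
}
console.log(stillInvalid); // true
```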
I believe it is possible to keep going and handle this case as well, but I think the regex is already complicated enough and it is no longer worth it.
All of this is to show that trying to fix JSON with regex may not be worth it. Fix the JSON where it is generated instead of creating a bigger problem by trying to solve it with regex.
If the JSON is arriving with quotes exactly like this, then the error is in the backend, which is generating invalid JSON, and that is where it should be fixed. Trying to fix it with regex is not simple (do not use regex to manipulate JSON: for valid JSON it is hard enough, for invalid JSON it is even worse). You will probably have to manipulate the string manually, since parsers usually raise an error when the JSON is invalid (or fix it where it is generated, which is the recommended approach).
– hkotsubo
Saul, it would be useful to add to the question the code that generates the JSON, or at least say which language you are using to generate it.
– fernandosavio