Using regex to match quotes after a delimiter

I need a regex that matches the double quotes that come after the colon (:) in a JSON output, i.e. the quotes surrounding each value.

JSON:

{
    "data":  [
                 {
                     "ServerName":  "server.name",
                     "Installed":  204,
                     "Downloaded":  60,
                     "Failed":  0,
                     "PendingReboot":  0,
                     "NotInstalled":  112,
                     "Needed":  172,
                     "LastUpdated":  "sexta-feira, 23 de outubro de 2020 06:09:09"
                 },

Ideally, the matches would be these (marked with ^):

{
    "data":  [
                 {
                     "ServerName":  "server.name",
...                                 ^           ^
...
                     "LastUpdated":  "sexta-feira, 23 de outubro de 2020 06:09:09"
                 },                  ^                                           ^

I tried /\b(\")+(?!:)/g, but it doesn’t work for me because it only matches the closing quotation marks:

 "data":  [
      ^            {
                     "ServerName":  "server.name",
...                                             ^
...
                    "LastUpdated":  "sexta-feira, 23 de outubro de 2020 06:09:09"
                                                                                ^ 
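A quick way to see why that attempt only catches closing quotes is to run it against one of the lines. This sketch uses Python’s `re` module purely for illustration (`finditer` plays the role of the `/g` flag); the constructs involved (`\b`, a capture group, a negative lookahead) behave the same way in the .NET engine behind PowerShell’s `-replace`.

```python
import re

# The pattern from the question: a quote that directly follows a word
# character (\b) and is not followed by ':'.
attempt = re.compile(r'\b(")+(?!:)')

line = '"ServerName":  "server.name",'
positions = [m.start(1) for m in attempt.finditer(line)]
print(positions)                   # → [27]
print(line[positions[0] - 4:])     # → name",
```

A `\b` before a quote only exists when the preceding character is a word character. That is true for closing quotes (after `server.name`), but not for opening quotes, which follow spaces. The quote after a key is then excluded by `(?!:)`, so only the closing quotes of string values are left.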

How can I do this?

  • What language are you using? Is there a reason you’re using regex instead of a JSON parser?

  • I’m actually using a PowerShell script. I have to remove these quotes, and the only approach I could think of was -replace with a regex.

2 answers


Don’t use regex, use a JSON parser

I know the regex "worked" for your specific case, but it won’t always work, because regex is not the best tool for working with JSON.

What the other answer proposed (:\s+(\")) works for your specific case, but all it takes is a JSON like this:

{ "data": [ { "abc":  "xyz: " } ] }

In that case, the regex will also match the closing quote of "xyz: " (since it comes after a : followed by one or more spaces).
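To see the problem concretely, here is a small Python sketch (Python is used only for illustration; this pattern behaves the same in the .NET engine that PowerShell’s `-replace` uses) that lists where the capture group of `:\s+(\")` lands in the JSON above:

```python
import re

# Pattern from the other answer: a colon, one or more whitespace
# characters, then a captured double quote.
naive = re.compile(r':\s+(")')

sample = '{ "data": [ { "abc":  "xyz: " } ] }'
positions = [m.start(1) for m in naive.finditer(sample)]
print(positions)  # → [22, 28]

# 22 is the opening quote of "xyz: " (wanted); 28 is its closing quote,
# which only matched because the string itself contains ": ".
print(positions[-1] == sample.rindex('"'))  # → True
```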

"Ah, but my JSON doesn’t have that."

Fine, but the idea of Stack Overflow is that answers should be useful not only to the person who asked but to any future visitor, so I think it’s worth adding to the discussion by explaining why regex is not always a good option.


Regex works with text in a "generic" way, without considering context or analyzing structure. So numerous problems can arise when you try to process structured text such as JSON or HTML (to name just two examples).

Of course, you could keep extending the regex to handle these cases. For example, changing it to:

\"[^\"]+\"\s*:\s*(\")

Here I require some content between quotation marks before the colon (\" matches a quotation mark, and [^\"]+ means "one or more characters that are not quotation marks"). I also changed \s+ (one or more whitespace characters) to \s* (zero or more), since "abc" : "xyz: " and "abc":"xyz: " are both valid. This handles the case above. But does it handle every case?
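A minimal Python check of the stricter pattern (illustration only; `\"` in the regex is just an escaped quote, so the sketch writes it as a plain `"` inside a raw string) shows that it now captures only the opening quote of the value, with or without whitespace around the colon:

```python
import re

# Stricter pattern: a quoted key, optional whitespace, a colon,
# optional whitespace, then the captured opening quote of the value.
stricter = re.compile(r'"[^"]+"\s*:\s*(")')

spaced = '{ "data": [ { "abc":  "xyz: " } ] }'
compact = '{"abc":"xyz: "}'

print([m.start(1) for m in stricter.finditer(spaced)])   # → [22]
print([m.start(1) for m in stricter.finditer(compact)])  # → [7]
```

In both cases the closing quote of `"xyz: "` is no longer captured, because the text before that colon (`xyz`) is not preceded by a complete quoted key.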


And what if the JSON has quotes escaped with \ inside a string? For example:

{ "data": [ "abc \"texto entre aspas\" : " ] }

The regex will treat "texto entre aspas\" as text between quotes (i.e. it matches the \"[^\"]+\" part of the expression), followed by a space, a colon, a space and ", so it will match the closing quote (and note that this string is inside an array; it doesn’t even have the key/value structure you wanted).
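The same Python sketch (illustration only) reproduces this: the stricter pattern starts its match at the escaped quote, while a JSON parser correctly sees a single string inside an array, with no key at all.

```python
import json
import re

stricter = re.compile(r'"[^"]+"\s*:\s*(")')

# Raw string, so the JSON text keeps its \" escapes exactly as written above.
sample = r'{ "data": [ "abc \"texto entre aspas\" : " ] }'

m = stricter.search(sample)
print(m.group(0))                        # → "texto entre aspas\" : "
print(m.start(1) == sample.rindex('"'))  # → True: the closing quote was captured

# The parser decodes the escapes and finds one plain string in the array.
print(json.loads(sample)["data"])        # → ['abc "texto entre aspas" : ']
```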

Handling every possible case becomes more and more complicated. You would have to add this special case (a quoted string may also contain \"), and you can see how the regex keeps growing. In the worst case, you may end up writing a complete parser, which really isn’t worth it.

It’s easier to use a dedicated parser (most programming languages have one, either built in or in an external library), look for the data you care about ("if it’s a string that is the value of an object property, do X", for example), modify it as needed and save the modified JSON. It’s much better and less error-prone than regex (and it’s not that complicated).
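As a sketch of that approach, here is a version using Python’s standard `json` module. The helper `string_properties` is a hypothetical name, and the "do X" step is assumed to be printing each string property as `key: value` without quotes; adapt it to whatever you actually need. In PowerShell, the built-in `ConvertFrom-Json` cmdlet gives you objects to work with in the same way.

```python
import json
from typing import Any, Iterator, Tuple


def string_properties(node: Any) -> Iterator[Tuple[str, str]]:
    """Hypothetical helper: yield (key, value) for every object property
    whose value is a JSON string, walking nested objects and arrays."""
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, str):
                yield key, value
            else:
                yield from string_properties(value)
    elif isinstance(node, list):
        for item in node:
            yield from string_properties(item)


text = '''{ "data": [ { "ServerName": "server.name", "Installed": 204,
            "LastUpdated": "sexta-feira, 23 de outubro de 2020 06:09:09" } ] }'''
doc = json.loads(text)

for key, value in string_properties(doc):
    # The parser has already removed the quotes (and decoded escapes
    # such as \"), so `value` is the bare text.
    print(f"{key}: {value}")
```

Because the parser knows which strings are property values, the tricky inputs need no special handling: `{ "data": [ { "abc":  "xyz: " } ] }` yields only `("abc", "xyz: ")`, and the escaped-quote example yields nothing, since that string is an array element rather than a property value.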

Although regex is "cool" (I personally like it a lot) and may even "work" in some cases, it is not always the best solution.

  • Thanks for the answer :) However, I really need to match only the quotes after ":", without the rest of the content.

  • I edited the answer: in this case, just put the capture group around the first quote.

  • I adapted what you gave me and it worked for what I needed. Thanks!
