How to select the logUser attribute only on the main JSON object using Regex

2

I need to select the logUser attribute only in the main (top-level) JSON object, whether it comes with a value or with null.

Ex: logUser: 100 | logUser: null

{
    "id": 1,
    "numeracao": "001",
    "logUser": 100,
    "permissionario": {
        "id": 3113715,
        "nome": "ARTHUR MATHEUS SÉRGIO DA SILVA",
        "dataDeNascimento": "2000-12-10",
        "endLogradouro": "RUA TERESINA",
        "endNumero": "89465468",
        "endBairro": "ROSA DOS VENTOS",
        "endCidade": "PARNAMIRIM",
        "endUf": "RN",
        "endCep": "59142125",
        "endComplemento": null,
        "telefoneFixo": null,
        "telefoneMovel": "(94) 949849849",
        "email": null,
        "logDate": "2019-01-16",
        "logUser": null,
        "sexo": "MASCULINO",
        "foto": null,
        "cpf": "10322314593",
        "estadoCivil": "CASADO",
        "cnhNumero": null,
        "cnhCategoria": null,
        "cnhValidade": null,
        "rgNumero": "64684654",
        "rgOrgaoExpeditor": "SSP",
        "rgDataEmissao": null,
        "rgUF": "RN",
        "status": true
    },
    "veiculo": {    
        "id": 3,
        "placa": "JUQ9196",
        "marca": "RENAULT",
        "modelo": "SC",
        "renavam": "54840252144",
        "anoDeFabricacao": "2006",
        "corPredominante": "BRANCO",
        "logCidadao": null,
        "status": true,
        "logUser": null,
        "logDate": "2019-01-16T11:05:10.425",
        "version": 1
    },
    "logDate": "2019-01-16T11:05:13.264",
    "status": true,
    "version": 0,
    "motorista": {
        "id": 3113717,
        "nome": "DAVI LEVI GALVÃO",
        "dataDeNascimento": "1996-10-20",
        "endLogradouro": "RUA FRANCISCO FERREIRA DA SILVA",
        "endNumero": "911",
        "endBairro": "VALE DO SOL",
        "endCidade": "PARNAMIRIM",
        "endUf": "RN",
        "endCep": "59143025",
        "endComplemento": null,
        "telefoneFixo": "8429884472",
        "telefoneMovel": "84995302167",
        "email": "[email protected]",
        "logDate": "2019-01-16",
        "logUser": null,
        "sexo": "MASCULINO",
        "foto": null,
        "cpf": "98842966428",
        "estadoCivil": "CASADO",
        "cnhNumero": null,
        "cnhCategoria": null,
        "cnhValidade": null,
        "rgNumero": "184962122",
        "rgOrgaoExpeditor": "SSP",
        "rgDataEmissao": null,
        "rgUF": "RN",
        "status": true
    }
}
  • 1

    Does it necessarily have to be done with regex?

  • Most languages (if not all) have libraries ready to read and manipulate JSON, so use the appropriate tool for each task. Manipulating the JSON object with a proper library will be much easier than using a regex, and less error-prone as well. For example, a regex does not check whether the JSON is well formed or has syntax errors (and checking this with regex is so complicated that it’s not worth it). Regex is nice, but it is not always the best solution.

  • It doesn’t necessarily have to be regex. I think I worded the question poorly. This JSON is intercepted in a filter in the middle of the request, and I treat it as a String in order to add the logUser attribute if the object does not contain it, or change it if it does. I opted for regex to avoid walking through the string in search of the attribute, but I ran into this problem.

2 answers

3


Don’t use regex for that

You may think regex is a good idea for this case, but believe me, it’s not.

You probably thought it would be something as simple as "logUser"\s*\:([^\,]+), (the word logUser in quotation marks, optionally followed by whitespace, followed by :, followed by one or more characters that are not commas, followed by a comma). And apparently "it works".

The problem is that this regex does not validate that the entire string is actually JSON, which is already a major drawback. You don’t just want to check whether there is a line whose characters are in a certain format. You want to check whether the data is actually valid JSON (which is a specific kind of data, with a well-defined format and types), and whether this JSON has a certain key. The regex does not do that.

It is even possible to validate JSON using regex, but honestly, it’s so complicated that in my opinion it is not worth it (except perhaps as a curiosity and/or to dig deeper into regex syntax).

Also, the regex I suggested is too naive. First, it accepts invalid JSON such as:

{ "logUser":::::::123, "etc" ... }
{ "logUser":  , }

This happens because of the [^,], which means "any character other than a comma": if the input has several spaces or several :, the regex still considers it valid. We can create more complicated expressions (such as the one at the link already mentioned) that correctly validate strings, numbers, etc., but there are still other problems.
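To make this concrete, here is a minimal Python sketch that runs the naive pattern against a malformed input modeled on the first example above, and compares the outcome with a real parser. The names NAIVE and broken, and the sample string itself, are only illustrative.

```python
import json
import re

# The naive pattern discussed above.
NAIVE = re.compile(r'"logUser"\s*\:([^\,]+),')

# Illustrative malformed input (several colons after the key).
broken = '{ "logUser":::::::123, "etc": 1 }'

# The regex happily matches the malformed text...
match = NAIVE.search(broken)
regex_accepts = match is not None

# ...while a JSON parser rejects it.
try:
    json.loads(broken)
    parser_accepts = True
except json.JSONDecodeError:
    parser_accepts = False

print("regex matched:", regex_accepts)
print("captured text:", match.group(1))
print("json.loads accepted:", parser_accepts)
```

The regex reports a match and captures the stray colons as part of the "value", while json.loads raises an error for the same text.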

For example, you only want the "logUser" of the first level. But regex cannot check which level the key is on:

{
    "abc": 123,
    "logUser": 123, <-- should match only this one
    "segundoNivel": {
        "logUser": 123, <-- should not match this one, since it is not at the first level (but it does)
    },
    ...
}

Of course, I could make regex only take the first occurrence of "logUser", but JSON could also be like this:

{
    "abc": 123,
    "segundoNivel": {
        "logUser": 123, <-- should not match this one, since it is not at the first level (but it does)
    },
    "logUser": 123, <-- should match only this one
    ...
}

Remember that a JSON object is an unordered set of key/value pairs, which means that the order of the keys is not guaranteed, and depending on the library/language used, "logUser" may well appear as above. The regex will not save you in these cases (unless you venture into recursive regex, which is complicated and not supported by all languages).
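A short Python sketch of the ordering problem, assuming a made-up document in which the nested key comes first (the variable names and values are illustrative):

```python
import json
import re

# The naive pattern discussed earlier in this answer.
NAIVE = re.compile(r'"logUser"\s*\:([^\,]+),')

# Illustrative document: the nested logUser appears before the top-level one.
texto = '''
{
    "abc": 123,
    "segundoNivel": {
        "logUser": 1,
        "etc": 2
    },
    "logUser": 999,
    "fim": true
}
'''

# The regex returns whichever occurrence comes first in the text.
regex_value = NAIVE.search(texto).group(1).strip()

# The parser only looks at the keys of the top-level object.
parsed_value = json.loads(texto)["logUser"]

print("regex found:", regex_value)
print("parser found:", parsed_value)
```

Here the regex picks up the nested value, while the parsed object gives the top-level one regardless of key order.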

Of course you could check if the line starts with 4 spaces (a way to "guarantee" that it is in the first level), but then you would have to ensure that the JSON will always be formatted this way (and this guarantee has to be made outside the regex). But since you are going to format JSON - preferably using some library - why not take advantage and use this library to validate and check/get the field you need?

Finally, at this link you can see the regex "working" for these cases. Of course you can improve it by replacing the comma with [,}] ("comma or }", because "logUser" may be the last element, in which case it is followed by } rather than a comma), or by using something like "\w+"|\d+ (a quoted word or a number) to further restrict the values, etc. But in the end you will end up with gigantic expressions like the one at this link, which in my opinion makes regex not worth using.


Use the right tool

Regex is a powerful and very cool tool, but is not always the best solution.

If your data is a JSON object, use a dedicated library. You did not specify which language you are using, but the vast majority of them (if not all) have some library to handle/read/validate/convert JSON.

Then just read the data and check whether the key "logUser" exists at the first level (and easily get its value to compare it with what you want, etc.), without having to worry about all the problems mentioned above, and without having to build a super-complicated regex (complicated not only to write, but also to maintain in the future).
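For the use case described in the comments (adding logUser when it is missing, or overwriting it when it is present), a minimal Python sketch could look like the following. The function name set_log_user and the sample values are hypothetical, and the same idea should carry over to the JSON library of whatever language you are using.

```python
import json

def set_log_user(texto, user_id):
    """Return the JSON text with the top-level logUser set to user_id."""
    obj = json.loads(texto)  # raises json.JSONDecodeError on malformed input
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object at the top level")
    # Adds the key if it is missing, overwrites it otherwise.
    # Nested objects are left untouched.
    obj["logUser"] = user_id
    return json.dumps(obj, ensure_ascii=False)

# Hypothetical input: logUser exists only inside a nested object.
original = '{"id": 1, "permissionario": {"logUser": null}}'
updated = json.loads(set_log_user(original, 100))
print(updated)
```

Parsing and re-serializing also means malformed input is rejected up front instead of being silently patched.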

Using the wrong tool will only - unnecessarily - bring you problems that you wouldn’t have if you used the right tool.


Here are a few examples so you can see how much simpler it is not to use regex. Since you did not specify which language you are using, I chose some "randomly". The JSON has been simplified to keep the code short, but the examples all work for your JSON as well.

In Python, just use the module json:

import json

def test(texto):
    # parse the text into a JSON object (a dict)
    json_obj = json.loads(texto)
    # check whether logUser exists at the first level
    if 'logUser' in json_obj:
        print('logUser is at the first level, value=', json_obj['logUser'])
    else:
        print('logUser is not at the first level')

# logUser at the first level
test("""
    {
      "abc": "cde",
      "logUser": "xyz",
      "segundoNivel": {
        "etc": 123
      }
    }""")

# logUser at the second level
test("""
    {
      "abc": "cde",
      "segundoNivel": {
        "logUser": "xyz",
        "etc": 123
      }
    }""")
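Since the question cares about both logUser: 100 and logUser: null, note that the in test is true in both cases: a JSON null is parsed to None, but the key still exists. A small sketch of the three situations (the helper name descreve is hypothetical):

```python
import json

def descreve(texto):
    """Classify the top-level logUser as missing, null, or holding a value."""
    obj = json.loads(texto)
    if "logUser" not in obj:
        return "missing"
    if obj["logUser"] is None:
        return "present, null"
    return f"present, value={obj['logUser']}"

com_valor = descreve('{"id": 1, "logUser": 100}')
com_null = descreve('{"id": 1, "logUser": null}')
so_aninhado = descreve('{"id": 1, "veiculo": {"logUser": null}}')

print(com_valor)
print(com_null)
print(so_aninhado)
```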

In Java, there are several libraries available, such as the org.json package, Google’s Gson, etc. They all work very similarly (they receive a string and convert it to a JSON object, which can be used to check whether the key exists and to get its value). An example with org.json.JSONObject:

import org.json.JSONObject;

public void verifica(String texto) {
    // parse the string into a JSON object
    JSONObject obj = new JSONObject(texto);
    // check whether logUser exists at the first level
    if (obj.has("logUser")) {
        // get() returns the value whatever its type (string, number or null);
        // getString() may throw if the value is not a string
        System.out.println("has logUser at the first level, value=" + obj.get("logUser"));
    } else {
        System.out.println("does not have logUser at the first level");
    }
}

// has logUser at the first level
verifica("{ \"abc\": \"cde\", \"logUser\": \"xyz\", \"segundoNivel\": { \"etc\": 123 } }");
// does not have logUser at the first level
verifica("{ \"abc\": \"cde\", \"segundoNivel\": { \"logUser\": \"xyz\", \"etc\": 123 } }");

In PHP, you can use json_decode:

function verifica($texto) {
    // decode the text into an associative array (second argument true),
    // since array_key_exists expects an array, not an object
    $json = json_decode($texto, true);
    // check whether logUser exists at the first level
    if (is_array($json) && array_key_exists('logUser', $json)) {
        echo "\nlogUser is at the first level, value=". $json['logUser'];
    } else{
        echo "\nlogUser is not at the first level";
    }
}

// logUser at the first level
verifica('{ "abc": "cde", "logUser": "xyz", "segundoNivel": { "etc": 123 } }');

// logUser at the second level
verifica('{ "abc": "cde", "segundoNivel": { "logUser": "xyz", "etc": 123 } }');

Since you said you want to check whether logUser exists even when its value is null, I used array_key_exists, which checks that the key exists regardless of its value. If you want to discard null values, you can replace it with isset.

Finally, in Javascript, use the object JSON:

function verifica(elemento){
  // get the text from the element
  let texto = document.querySelector(elemento).value;

  // parse the text into a JSON object
  let json = JSON.parse(texto);

  // check whether logUser exists at the first level
  // ('in' is also true when the value is null, 0 or an empty string)
  if ('logUser' in json) {
    console.log(`${elemento} has logUser at the first level, value=${json.logUser}`);
  } else {
    console.log(`${elemento} does not have logUser at the first level`);
  }
}

verifica('#logUserPrimeiroNivel');
verifica('#logUserSegundoNivel');
<textarea id="logUserPrimeiroNivel" rows="8">
{
  "abc": "cde",
  "logUser": "xyz",
  "segundoNivel": {
    "etc": 123
  }
}
</textarea>

<textarea id="logUserSegundoNivel" rows="8">
{
  "abc": "cde",
  "segundoNivel": {
    "logUser": "xyz",
    "etc": 123
  }
}
</textarea>

Anyway, most languages have some library to handle JSON, and they all usually work in a similar way (turn the string into a JSON object, check whether the key exists, get its value). And notice in the examples above that the code is very simple, much easier to understand and maintain than this:

$pcre_regex = '
  /
  (?(DEFINE)
     (?<number>   -? (?= [1-9]|0(?!\d) ) \d+ (\.\d+)? ([eE] [+-]? \d+)? )    
     (?<boolean>   true | false | null )
     (?<string>    " ([^"\\\\]* | \\\\ ["\\\\bfnrt\/] | \\\\ u [0-9a-f]{4} )* " )
     (?<array>     \[  (?:  (?&json)  (?: , (?&json)  )*  )?  \s* \] )
     (?<pair>      \s* (?&string) \s* : (?&json)  )
     (?<object>    \{  (?:  (?&pair)  (?: , (?&pair)  )*  )?  \s* \} )
     (?<json>   \s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) ) \s* )
  )
  \A (?&json) \Z
  /six   
';

And this is only to check whether the string is valid JSON. You would still need to modify it to match logUser only at the first level. Remember also that the regex above makes use of recursion, which is not supported by all languages.
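As one concrete case, Python’s built-in re module does not, as far as I know, accept the (?&name) subroutine syntax used above, so the pattern cannot even be compiled there. Validating with the json module, on the other hand, takes a few lines. A sketch (the function name is_valid_json is hypothetical; note that json.loads is slightly lenient by default, for example it accepts NaN):

```python
import json
import re

def is_valid_json(texto):
    """Return True if texto parses as JSON, False otherwise."""
    try:
        json.loads(texto)
        return True
    except json.JSONDecodeError:
        return False

# Try to compile a pattern that uses a subroutine call, as in the regex above.
try:
    re.compile(r'\A(?&json)\Z')
    recursion_supported = True
except re.error:
    recursion_supported = False

valido = is_valid_json('{"logUser": null}')
invalido = is_valid_json('{"logUser":  , }')

print("re accepts (?&name):", recursion_supported)
print(valido, invalido)
```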

Definitely, regex is not the best option for your case.

  • 1

    Thank you so much for the great explanation. Regex really wasn’t the best way out. I was able to solve it with a lib of the language (JAVA) I was using.

1

"- I need to select the logUser attribute only in the JSON Main Object"

Possible but not recommended!

@off-topic: There are JSON libraries for almost all (if not all) major programming languages. It is much better, and safer, to read the value of the property directly through a JSON library.

I will not go into much detail, since @hkotsubo’s answer already explains precisely the advantages of using a JSON library and the disadvantages of using a regular expression.

But if you still want to use a regular expression, you can do it like this:

/\"logUser\"(?=.*\:)([\s\:]+)([^\,}]+?)(\,|\})/

Explaining:

  1. Match "logUser";
  2. Match whitespace (\s) and colons (:) one or more times, with the positive lookahead requiring that a colon (:) be present;
  3. Match anything other than a comma (,) or a closing brace (}), which is the value you need;
  4. Match the value delimiter, which can be a comma (,) or a closing brace (}).

See this working example.

Note that I have not used the global modifier, since you only want the main object. The value you need is captured in the second group: $2.

Now you just need to capture the "match" in your project. I won’t illustrate that, since you didn’t mention which language you are using, etc...
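Purely as an illustration, and picking Python arbitrarily since the question names no language, capturing the second group could look like the sketch below. The sample strings and variable names are made up, and the second call shows the limitation: when a nested logUser comes first, that is the one captured.

```python
import re

# Pattern from this answer, as a Python raw string (without the /.../ delimiters).
PADRAO = re.compile(r'\"logUser\"(?=.*\:)([\s\:]+)([^\,}]+?)(\,|\})')

# Illustrative input: the top-level logUser comes first.
texto = '{ "id": 1, "logUser": 100, "veiculo": { "logUser": null } }'
# search() returns only the first match (no "global" behavior).
valor = PADRAO.search(texto).group(2)

# Illustrative input: a nested logUser comes first, so that one is captured.
texto_aninhado = '{ "veiculo": { "logUser": null }, "logUser": 100 }'
valor_aninhado = PADRAO.search(texto_aninhado).group(2).strip()

print(valor)
print(valor_aninhado)
```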

I must reiterate that the best solution is to use a JSON library!

But if you really want to use regular expression, make sure that this JSON will always be valid and follow such a structure.

If you always have that indentation, you can add a match for 4 spaces (\s) at the beginning of the line to make sure the key really is at the "first level" of the JSON. See this link.

I am no expert in regular expressions. I wrote this answer as an exercise and to "demonstrate possibilities". Consider following the recommendations of the answer mentioned above, instead of "reinventing the wheel"...

Special thanks to @hkotsubo.

  • Using [\s\:]+ makes it accept only spaces (with no :) between the key and the value, which is invalid in JSON. There are other problems too, see here - in my answer I used a similar regex, but in the end I concluded that it has more problems than advantages. And making a more accurate regex is so complicated that it’s not worth it. It’s better not to use regex at all, but a JSON lib :-)

  • 1

    Fair. As I mentioned in the off-topic note, a lib is literally the best option... So, as soon as I can, I’ll rework this expression following your remarks. I’m not doing it now because I’m not near the PC, and on mobile it just doesn’t work! Haha

  • 1

    @hkotsubo there wasn’t much left to do; your answer already covered almost everything. But still, I reworked mine, correcting some things and emphasizing the point about the library... And thank you!
