Don’t use regex for that
You may think regex is a good idea for this case, but believe me, it’s not.
You probably thought it would be something as simple as "logUser"\s*\:([^\,]+), (the word logUser between quotation marks, optionally followed by whitespace, followed by :, followed by one or more characters that are not commas, followed by a comma). And apparently "it works".
The problem is that this regex does not validate whether the entire string is actually JSON, which is already a major drawback. You don't just want to check whether there is a line whose characters are in a certain format. You want to check whether the data is actually valid JSON (which is a specific kind of data, with a well-defined format and types), and whether this JSON has a certain key. The regex does neither.
It is even possible to validate JSON using regex, but honestly it is so complicated that in my opinion it is not worth it (except perhaps as a curiosity and/or to dig deeper into regex syntax).
Also, the regex I suggested is too naive. First, it accepts invalid JSON such as:
{ "logUser":::::::123, "etc" ... }
{ "logUser": , }
This is because of the [^,], which means "any character other than a comma": if the value consists of nothing but spaces, or of several :, the regex still considers it valid. We could create more complicated expressions (such as the one in the link already mentioned) that correctly validate strings, numbers, etc., but there are still other problems.
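As a quick illustration of the point above, here is a minimal Python sketch comparing the naive pattern with a real parser. The two sample strings are hypothetical, modeled on the invalid examples above (with the elided part filled by a made-up pair), and is_valid_json is just a helper name chosen for this sketch.

```python
import json
import re

# The naive pattern discussed above.
NAIVE = re.compile(r'"logUser"\s*\:([^\,]+),')

# Hypothetical inputs, modeled on the two invalid examples above.
samples = [
    '{ "logUser":::::::123, "etc": 1 }',
    '{ "logUser": , }',
]

def is_valid_json(text):
    """Return True only if a real JSON parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

for text in samples:
    regex_matches = NAIVE.search(text) is not None
    print(f"{text!r}: regex matches={regex_matches}, valid JSON={is_valid_json(text)}")
```

Both strings are matched by the regex and rejected by the parser, which is exactly the gap described above.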
For example, you only want the "logUser" at the first level. But the regex cannot check which level the key is on:
{
    "abc": 123,
    "logUser": 123, <-- should match only this one
    "segundoNivel": {
        "logUser": 123, <-- should not match this one, since it is not at the first level (but it does)
    },
    ...
}
Of course, I could make the regex take only the first occurrence of "logUser", but the JSON could also look like this:
{
    "abc": 123,
    "segundoNivel": {
        "logUser": 123, <-- should not match this one, since it is not at the first level (but it does)
    },
    "logUser": 123, <-- should match only this one
    ...
}
Remember that a JSON object is an unordered set of key/value pairs, which means that the order of the keys is not guaranteed, and depending on the library/language used, "logUser" may well appear as above. Regex will not save you in these cases (unless you venture into recursive regex, which besides being complicated is not supported by all languages).
Of course you could check whether the line starts with 4 spaces (a way to "guarantee" that it is at the first level), but then you would have to ensure that the JSON is always formatted this way (and this guarantee has to be made outside the regex). But since you are going to format the JSON anyway - preferably using some library - why not use that same library to validate it and check/get the field you need?
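To make the key-order problem concrete, here is a small Python sketch. The document below is hypothetical (the values 123 and 456 and the key "fim" are made up so the two occurrences can be told apart); it only assumes the naive pattern from the beginning of this answer.

```python
import json
import re

NAIVE = re.compile(r'"logUser"\s*\:([^\,]+),')

# Hypothetical document: the nested key appears before the top-level one.
text = """
{
    "abc": 123,
    "segundoNivel": {
        "logUser": 123,
        "etc": 1
    },
    "logUser": 456,
    "fim": true
}
"""

# The regex has no notion of nesting, so it stops at the first occurrence.
regex_value = NAIVE.search(text).group(1).strip()

# The parser builds the structure, so the lookup is scoped to the first level.
parsed_value = json.loads(text)["logUser"]

print("regex captured:", regex_value)
print("parser returned:", parsed_value)
```

The regex captures the nested value, while the parsed object returns the first-level one.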
Finally, at this link you can see the regex "working" for these cases. Of course you can improve it by replacing the comma with [,}] ("comma or }", because "logUser" may be the last element, in which case there is no comma but a } instead), or by adding something like "\w+"|\d+ (a quoted word or a number) to further restrict the values, etc. But in the end you will end up with gigantic expressions like the one in this link, which in my opinion makes regex not worth using.
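A rough Python sketch of why each such patch only moves the problem. RELAXED is my reading of the "replace the comma with [,}]" suggestion above, and both input strings are hypothetical.

```python
import json
import re

ORIGINAL = re.compile(r'"logUser"\s*\:([^\,]+),')
# Assumed form of the patched pattern: the trailing comma becomes [,}].
RELAXED = re.compile(r'"logUser"\s*\:([^\,]+)[,}]')

last_key = '{"abc": 1, "logUser": "xyz"}'    # logUser is the last element
comma_in_value = '{"logUser": "Silva, Ana"}'  # the value itself has a comma

# The original pattern needs a trailing comma, so it misses the last element.
print(ORIGINAL.search(last_key))

# The patched pattern handles that case...
print(RELAXED.search(last_key).group(1).strip())

# ...but it cuts a string value at its first comma.
truncated = RELAXED.search(comma_in_value).group(1).strip()
print(truncated)

# A parser returns the whole value in both cases.
print(json.loads(comma_in_value)["logUser"])
```

Fixing the truncated string would require teaching the pattern about quoted strings and escapes, which is how the gigantic expressions mentioned above come about.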
Use the right tool
Regex is a powerful and very cool tool, but it is not always the best solution.
If your data is a JSON object, use a dedicated library. You did not specify which language you are using, but the vast majority of them (if not all) have some library to handle/read/validate/convert JSON.
Then just read the data and check whether the key "logUser" exists at the first level (and easily get its value to compare it with what you want, etc.), without having to worry about all the problems mentioned above, and without having to build a super-complicated regex (hard not only to write, but also to maintain in the future).
Using the wrong tool will only - unnecessarily - bring you problems that you wouldn’t have if you used the right tool.
Here are a few examples so you can see how much simpler it is not to use regex. Since you did not specify which language you are using, I chose some "randomly". The JSON has been simplified to keep the code short, but the examples all work for your JSON as well.
In Python, just use the json module:
import json

def test(texto):
    # parse the text into a Python object (a dict for a JSON object)
    json_obj = json.loads(texto)
    # check whether logUser exists at the first level
    if 'logUser' in json_obj:
        print('logUser is at the first level, value=', json_obj['logUser'])
    else:
        print('logUser is not at the first level')

# logUser at the first level
test("""
{
    "abc": "cde",
    "logUser": "xyz",
    "segundoNivel": {
        "etc": 123
    }
}""")

# logUser at the second level
test("""
{
    "abc": "cde",
    "segundoNivel": {
        "logUser": "xyz",
        "etc": 123
    }
}""")
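Since the whole point is to validate the JSON as well as check the key, here is a slightly more defensive sketch of the same idea. The function name top_level_log_user and the choice to raise ValueError are assumptions made for this illustration, not something from the question.

```python
import json

def top_level_log_user(text):
    """Return (found, value) for a first-level "logUser" key.

    Raises ValueError if the text is not valid JSON or is not a JSON object.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(f"invalid JSON: {err}") from err
    if not isinstance(data, dict):
        raise ValueError("top-level JSON value is not an object")
    if "logUser" in data:
        # The key counts as present even when its value is null (None).
        return True, data["logUser"]
    return False, None

print(top_level_log_user('{"abc": "cde", "logUser": null}'))
print(top_level_log_user('{"abc": "cde", "segundoNivel": {"logUser": "xyz"}}'))
```

Returning a pair keeps "key present with a null value" distinguishable from "key absent", which matters if null is a legitimate value for logUser.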
In Java, there are several libraries available, such as the org.json package, Google's Gson, etc. They all work very similarly (they receive a string and convert it to a JSON object, which can be used to check whether the key exists and to get its value). An example with org.json.JSONObject:
import org.json.JSONObject;

public void verifica(String texto) {
    // parse the string into a JSON object
    JSONObject obj = new JSONObject(texto);
    // check whether logUser exists at the first level
    if (obj.has("logUser")) {
        System.out.println("has logUser at the first level, value=" + obj.getString("logUser"));
    } else {
        System.out.println("does not have logUser at the first level");
    }
}

// has logUser at the first level
verifica("{ \"abc\": \"cde\", \"logUser\": \"xyz\", \"segundoNivel\": { \"etc\": 123 } }");
// does not have logUser at the first level
verifica("{ \"abc\": \"cde\", \"segundoNivel\": { \"logUser\": \"xyz\", \"etc\": 123 } }");
In PHP, you can use json_decode:
function verifica($texto) {
    // decode the text into an associative array (second argument true),
    // since array_key_exists expects an array rather than an object
    $json = json_decode($texto, true);
    // check whether logUser is at the first level
    if (array_key_exists('logUser', $json)) {
        echo "\nlogUser is at the first level, value=". $json['logUser'];
    } else {
        echo "\nlogUser is not at the first level";
    }
}

// logUser at the first level
verifica('{ "abc": "cde", "logUser": "xyz", "segundoNivel": { "etc": 123 } }');
// logUser at the second level
verifica('{ "abc": "cde", "segundoNivel": { "logUser": "xyz", "etc": 123 } }');
Since you said you want to check whether logUser exists even if the value is null, I used array_key_exists, which checks whether the key exists regardless of its value. If you want to discard null values, you can replace it with isset.
Finally, in JavaScript, use the JSON object:
function verifica(elemento) {
    // get the text from the element
    let texto = document.querySelector(elemento).value;
    // parse the text into a JSON object
    let json = JSON.parse(texto);
    // check whether logUser exists at the first level
    // (a plain truthiness test would miss null, 0 or "" values)
    if (Object.prototype.hasOwnProperty.call(json, 'logUser')) {
        console.log(`${elemento} has logUser at the first level, value=${json.logUser}`);
    } else {
        console.log(`${elemento} does not have logUser at the first level`);
    }
}

verifica('#logUserPrimeiroNivel');
verifica('#logUserSegundoNivel');
<textarea id="logUserPrimeiroNivel" rows="8">
{
"abc": "cde",
"logUser": "xyz",
"segundoNivel": {
"etc": 123
}
}
</textarea>
<textarea id="logUserSegundoNivel" rows="8">
{
"abc": "cde",
"segundoNivel": {
"logUser": "xyz",
"etc": 123
}
}
</textarea>
Anyway, most languages have some library to handle JSON, and they all usually work in a similar way (turn the string into a JSON object, check whether the key exists, get its value). And notice in the examples above that the code is very simple, much easier to understand and maintain than this:
$pcre_regex = '
/
(?(DEFINE)
(?<number> -? (?= [1-9]|0(?!\d) ) \d+ (\.\d+)? ([eE] [+-]? \d+)? )
(?<boolean> true | false | null )
(?<string> " ([^"\\\\]* | \\\\ ["\\\\bfnrt\/] | \\\\ u [0-9a-f]{4} )* " )
(?<array> \[ (?: (?&json) (?: , (?&json) )* )? \s* \] )
(?<pair> \s* (?&string) \s* : (?&json) )
(?<object> \{ (?: (?&pair) (?: , (?&pair) )* )? \s* \} )
(?<json> \s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) ) \s* )
)
\A (?&json) \Z
/six
';
And this is only to check whether the string is valid JSON. You would still need to modify it to match logUser only at the first level. Keep in mind that the regex above uses recursion, which is not supported by all languages.
Regex is definitely not the best option for your case.
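If the end goal is to add logUser when it is missing or overwrite it when it is present (an assumption about the use case), a parser makes that a one-line assignment. A minimal Python sketch; set_log_user is a hypothetical helper name:

```python
import json

def set_log_user(text, value):
    """Return the JSON text with a first-level "logUser" set to value.

    The key is added if missing and overwritten if present; nested
    "logUser" keys are left untouched. Raises ValueError (including
    json.JSONDecodeError, a subclass of it) for unusable input.
    """
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("top-level JSON value is not an object")
    data["logUser"] = value
    return json.dumps(data)

print(set_log_user('{"abc": "cde"}', "xyz"))
print(set_log_user('{"logUser": "old", "segundoNivel": {"logUser": "keep"}}', "xyz"))
```

No scanning of the string is needed: the assignment covers both the "add" and the "change" case.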
Does it necessarily have to be regex?
– Marciano Machado
Most languages (if not all) have ready-made libs to read and manipulate JSON, so use the appropriate tool for each task. Manipulating the JSON object with the proper lib will be much easier than using a regex, and less error-prone as well - for example, a regex does not check whether the JSON is well formed, whether there are syntax errors, etc. (and checking this with regex is so complicated that it is not worth it). Regex is nice, but it is not always the best solution.
– hkotsubo
It doesn't necessarily have to be regex. I think I expressed the question badly. This JSON is intercepted by a filter in the middle of the request, and I treat it as a String in order to add the logUser attribute if the object does not contain it, or to change it if it does. I opted for regex to avoid scanning the string in search of the attribute, but I ran into this problem.
– THIAGO TIERRE