For this we can use a regex that, instead of splitting on the date, splits on a comma, provided that the comma is followed by "date, time - code". Assuming the code is always numeric, one solution is:
let str = "07/03/2019, 15:43 - 104. PETIÇÃO PROTOCOLADA JUNTADA - Refer. aos Eventos: 96, 99 e 100 - CIÊNCIA, COM RENÚNCIA AO PRAZO ,07/03/2019, 15:43 - 103. Intimação Eletrônica - Confirmada - Refer. ao Evento: 100 ,07/03/2019, 15:43 - 102. Intimação Eletrônica - Confirmada - Refer. ao Evento: 99 ,07/03/2019, 15:43 - 101. Intimação Eletrônica - Confirmada - Refer. ao Evento: 96 ,01/03/2019, 19:20 - 100. Intimação Eletrônica - Expedida/Certificada - Julgamento (APELADO - SILVANO SOUZA) Prazo: 15 dias Data final: ,29/03/2019, 23:59:59";
let result = str.split(/,(?=\d{2}\/\d{2}\/\d{4}, \d{2}:\d{2}\s+-\s+\d+)/).map(s => s.trim());
console.log(result);
Notice that I used \d{2} and \d{4} instead of \d+. The quantifier + means "one or more occurrences", so it accepts any number of digits. By using {2} and {4} instead, I guarantee that there are exactly that many digits (\d{2} is "exactly 2 digits" and \d{4} is "exactly 4 digits"). If you have dates like 1/2/2019, for example, you can use \d{1,2} (at least 1 and at most 2 digits).
I used \d+ only for the code, since I am assuming that it is always numeric and that its length may vary. But you can also use other quantifiers if you want to be more specific about the length. Examples:
\d{3}: exactly 3 digits
\d{1,4}: between 1 and 4 digits
\d{3,}: at least 3 digits
Use what’s best for your case.
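To make the difference between these quantifiers concrete, here is a small sketch that tests a few sample values. The sample strings and variable names are illustrative assumptions, not data from the question, and the patterns are anchored with ^ and $ only so that the whole string has to match.

```javascript
// The sample values below are illustrative, not taken from the question.
// ^ and $ anchor the pattern so the whole string has to match.
const exactDate = /^\d{2}\/\d{2}\/\d{4}$/;        // exactly 2, 2 and 4 digits
const flexibleDate = /^\d{1,2}\/\d{1,2}\/\d{4}$/; // 1 or 2 digits for day and month

console.log(exactDate.test("07/03/2019"));  // true
console.log(exactDate.test("1/2/2019"));    // false
console.log(flexibleDate.test("1/2/2019")); // true

// Variations for the code.
const exactlyThree = /^\d{3}$/;
const oneToFour = /^\d{1,4}$/;
const atLeastThree = /^\d{3,}$/;

for (const code of ["7", "104", "12345"]) {
  console.log(code, exactlyThree.test(code), oneToFour.test(code), atLeastThree.test(code));
}
// 7 false true false
// 104 true true true
// 12345 false false true
```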
The result is:
[
"07/03/2019, 15:43 - 104. PETIÇÃO PROTOCOLADA JUNTADA - Refer. aos Eventos: 96, 99 e 100 - CIÊNCIA, COM RENÚNCIA AO PRAZO",
"07/03/2019, 15:43 - 103. Intimação Eletrônica - Confirmada - Refer. ao Evento: 100",
"07/03/2019, 15:43 - 102. Intimação Eletrônica - Confirmada - Refer. ao Evento: 99",
"07/03/2019, 15:43 - 101. Intimação Eletrônica - Confirmada - Refer. ao Evento: 96",
"01/03/2019, 19:20 - 100. Intimação Eletrônica - Expedida/Certificada - Julgamento (APELADO - SILVANO SOUZA) Prazo: 15 dias Data final: ,29/03/2019, 23:59:59"
]
The trick here is the lookahead, indicated by (?=...). It checks whether something exists after the current position. In this case, I am checking whether everything inside the lookahead comes right after the comma. Inside it I have the date, followed by a comma, a space, the time, one or more spaces (\s+), a hyphen, one or more spaces and one or more digits (the code, which I am assuming is always numeric).
The great advantage of the lookahead is that it only checks whether these things exist; they are not part of the match, and so they are not removed by split. The split therefore happens only on commas, and only on those that have a date, time and code right after them. Other commas are ignored.
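The effect of the lookahead is easier to see by comparing it with the same pattern written without it. The sketch below uses a short made-up string (an assumption for illustration, not the string from the question):

```javascript
// Short illustrative string, not the one from the question.
const sample = "01/01/2020, 10:00 - 1. First, with a comma ,02/01/2020, 11:30 - 2. Second";

// Without lookahead: the date, time and code are part of the match,
// so split removes them from the result.
const consumed = sample.split(/,\d{2}\/\d{2}\/\d{4}, \d{2}:\d{2}\s+-\s+\d+/);
console.log(consumed);
// [ "01/01/2020, 10:00 - 1. First, with a comma ", ". Second" ]

// With lookahead: only the comma is matched,
// so the date, time and code stay in the result.
const kept = sample.split(/,(?=\d{2}\/\d{2}\/\d{4}, \d{2}:\d{2}\s+-\s+\d+)/);
console.log(kept);
// [ "01/01/2020, 10:00 - 1. First, with a comma ", "02/01/2020, 11:30 - 2. Second" ]
```

In both cases the comma after "First" is left alone, because it is not followed by a date, time and code.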
Finally, I use trim() only to remove the spaces at the end of each string.
But it is also possible to avoid trim() by including the spaces in the pattern passed to split:
let str = "07/03/2019, 15:43 - 104. PETIÇÃO PROTOCOLADA JUNTADA - Refer. aos Eventos: 96, 99 e 100 - CIÊNCIA, COM RENÚNCIA AO PRAZO ,07/03/2019, 15:43 - 103. Intimação Eletrônica - Confirmada - Refer. ao Evento: 100 ,07/03/2019, 15:43 - 102. Intimação Eletrônica - Confirmada - Refer. ao Evento: 99 ,07/03/2019, 15:43 - 101. Intimação Eletrônica - Confirmada - Refer. ao Evento: 96 ,01/03/2019, 19:20 - 100. Intimação Eletrônica - Expedida/Certificada - Julgamento (APELADO - SILVANO SOUZA) Prazo: 15 dias Data final: ,29/03/2019, 23:59:59";
let result = str.split(/\s*,(?=\d{2}\/\d{2}\/\d{4}, \d{2}:\d{2}\s+-\s+\d+)/);
console.log(result);
Now the regex also matches zero or more spaces (\s*) before the comma, so they are removed by split as well, and trim() is no longer necessary.
About the regex of dates
I go into much more detail in this answer, but to summarize: \d{2} accepts values between "00" and "99", which obviously can end up matching values that are not dates, and the regex also accepts values such as 29/02/2019 (2019 is not a leap year, so February of that year has only 28 days).
If this string comes from a trusted/controlled source and you know it always contains valid dates, the regex above is enough. But if you want to make it more precise, you can use the suggestions from the answer I mentioned. The date and time part would look something like:
(?:0[1-9]|[12]\d|3[01])\/(?:0[1-9]|1[0-2])\/(?:19|20)\d{2}, (?:[01]\d|2[0-3]):(?:[0-5]\d)
Then the code would be:
let str = "07/03/2019, 15:43 - 104. PETIÇÃO PROTOCOLADA JUNTADA - Refer. aos Eventos: 96, 99 e 100 - CIÊNCIA, COM RENÚNCIA AO PRAZO ,07/03/2019, 15:43 - 103. Intimação Eletrônica - Confirmada - Refer. ao Evento: 100 ,07/03/2019, 15:43 - 102. Intimação Eletrônica - Confirmada - Refer. ao Evento: 99 ,07/03/2019, 15:43 - 101. Intimação Eletrônica - Confirmada - Refer. ao Evento: 96 ,01/03/2019, 19:20 - 100. Intimação Eletrônica - Expedida/Certificada - Julgamento (APELADO - SILVANO SOUZA) Prazo: 15 dias Data final: ,29/03/2019, 23:59:59";
let result = str.split(/\s*,(?=(?:0[1-9]|[12]\d|3[01])\/(?:0[1-9]|1[0-2])\/(?:19|20)\d{2}, (?:[01]\d|2[0-3]):(?:[0-5]\d)\s+-\s+\d+)/);
console.log(result);
This still does not handle leap years, but it already rules out days greater than 31, months greater than 12, minutes greater than 59, etc. In the end, adjust the regex according to what you need.
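If leap years matter, one option is to leave that check out of the regex and validate each date separately after the split. The sketch below is only one possible approach and is not part of the original solution: the isRealDate function is a hypothetical helper, and it assumes dates in the dd/mm/yyyy format with a four-digit year.

```javascript
// Hypothetical helper (not part of the original answer): checks whether a
// "dd/mm/yyyy" string is a real calendar date, including leap years.
function isRealDate(text) {
  const match = /^(\d{2})\/(\d{2})\/(\d{4})$/.exec(text);
  if (!match) return false;
  const day = Number(match[1]);
  const month = Number(match[2]);
  const year = Number(match[3]);
  // Date rolls invalid values over (29/02/2019 becomes 01/03/2019),
  // so the date is real only if every field survives the round trip.
  const date = new Date(year, month - 1, day);
  return date.getFullYear() === year &&
    date.getMonth() === month - 1 &&
    date.getDate() === day;
}

console.log(isRealDate("07/03/2019")); // true
console.log(isRealDate("29/02/2019")); // false (2019 is not a leap year)
console.log(isRealDate("29/02/2020")); // true (2020 is a leap year)
console.log(isRealDate("31/04/2019")); // false (April has 30 days)
```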
Your question is a little vague. Is there a pattern that determines where the string should be split? Have you tried writing any code for it?
– Luiz Felipe
Luiz, I added the requested information.
– Edson Guido
The problem is that nothing guarantees that there will be a specific pattern for the separation of this text. This is one of the problems when working with strings... We could even try to separate by date, but see the last item, for example, which has a date (29/03/2019) that does not indicate a separation in itself, but something like an observation, right?
– Luiz Felipe
Your observation is correct, Luiz, so I want to split the string on the date-time and code pattern, e.g. 07/03/2019, 15:43 - 104, because it repeats and is unique.
– Edson Guido