Splitting a string into an array of strings from the occurrence of a date

1

I need to split a string like the example below into an array of strings, where each item starts at an event's date, time and code (e.g. 07/03/2019, 15:43 - 104) and runs until the next such occurrence.

"07/03/2019, 15:43  - 104. PETIÇÃO PROTOCOLADA JUNTADA - Refer. aos Eventos: 96, 99 e 100 - CIÊNCIA, COM RENÚNCIA AO PRAZO   ,07/03/2019, 15:43  - 103. Intimação Eletrônica - Confirmada - Refer. ao Evento: 100   ,07/03/2019, 15:43  - 102. Intimação Eletrônica - Confirmada - Refer. ao Evento: 99   ,07/03/2019, 15:43  - 101. Intimação Eletrônica - Confirmada - Refer. ao Evento: 96   ,01/03/2019, 19:20  - 100. Intimação Eletrônica - Expedida/Certificada - Julgamento (APELADO -  SILVANO SOUZA)  Prazo: 15 dias  Data final: ,29/03/2019, 23:59:59"

Expected Result:

1. "07/03/2019, 15:43  - 104. PETIÇÃO PROTOCOLADA JUNTADA - Refer. aos Eventos: 96, 99 e 100 - CIÊNCIA, COM RENÚNCIA AO PRAZO"
2. "07/03/2019, 15:43  - 103. Intimação Eletrônica - Confirmada - Refer. ao Evento: 100"
3. "07/03/2019, 15:43  - 102. Intimação Eletrônica - Confirmada - Refer. ao Evento: 99"
4. "07/03/2019, 15:43  - 101. Intimação Eletrônica - Confirmada - Refer. ao Evento: 96"
5. "01/03/2019, 19:20  - 100. Intimação Eletrônica - Expedida/Certificada - Julgamento (APELADO -  SILVANO SOUZA)  Prazo: 15 dias  Data final: ,29/03/2019, 23:59:59"

I tried using the following code with regular expression:

let eventos = text.split(/\b(\d+\/\d+\/\d+)\b/g);

But it splits only on the date, so when a date appears in the middle of an event, that event is split in two.
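
A reduced illustration of the problem, using a shortened, made-up sample instead of the full string. Because the regex has a capture group, split also returns each date as a separate item, and the date after "Data final:" causes an extra cut:

```javascript
// Shortened, hypothetical sample: two events, the second one with a date inside its text.
const sample =
  "07/03/2019, 15:43  - 101. Confirmada   ,01/03/2019, 19:20  - 100. Data final: ,29/03/2019, 23:59:59";

// Same regex as in the attempt above.
const parts = sample.split(/\b(\d+\/\d+\/\d+)\b/g);

console.log(parts);
// [
//   "",
//   "07/03/2019",
//   ", 15:43  - 101. Confirmada   ,",
//   "01/03/2019",
//   ", 19:20  - 100. Data final: ,",
//   "29/03/2019",
//   ", 23:59:59"
// ]
```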

  • Your question is a little vague. Is there a pattern for this separation to be made? Have you tried to make some code for it?

  • Luiz, I added the requested information.

  • The problem is that nothing guarantees that there will be a specific pattern for the separation of this text. This is one of the problems when working with strings... We could even try to separate by date, but see the last item, for example, which has a date (29/03/2019) that does not indicate a separation in itself, but something like an observation, right?

  • Your observation is correct, Luiz. That is why I want to split the string on the date, time and code pattern, e.g. 07/03/2019, 15:43 - 104, because it repeats for every event and is unique to it.

2 answers

2


For this we can use a regex that, instead of splitting on the date, splits on a comma, but only when that comma is followed by "date, time - code". Assuming the code is always numeric, one solution would be:

let str = "07/03/2019, 15:43  - 104. PETIÇÃO PROTOCOLADA JUNTADA - Refer. aos Eventos: 96, 99 e 100 - CIÊNCIA, COM RENÚNCIA AO PRAZO   ,07/03/2019, 15:43  - 103. Intimação Eletrônica - Confirmada - Refer. ao Evento: 100   ,07/03/2019, 15:43  - 102. Intimação Eletrônica - Confirmada - Refer. ao Evento: 99   ,07/03/2019, 15:43  - 101. Intimação Eletrônica - Confirmada - Refer. ao Evento: 96   ,01/03/2019, 19:20  - 100. Intimação Eletrônica - Expedida/Certificada - Julgamento (APELADO -  SILVANO SOUZA)  Prazo: 15 dias  Data final: ,29/03/2019, 23:59:59";

let result = str.split(/,(?=\d{2}\/\d{2}\/\d{4}, \d{2}:\d{2}\s+-\s+\d+)/).map(s => s.trim());
console.log(result);

Notice that I used \d{2} and \d{4} instead of \d+. The quantifier + means "one or more occurrences", so it accepts any number of digits. With {2} and {4}, on the other hand, I require exactly those quantities (\d{2} is "exactly 2 digits" and \d{4} is "exactly 4 digits"). If you have dates like 1/2/2019, for example, you can use \d{1,2} (at least 1 and at most 2 digits).

I used \d+ only for the code, since I am assuming it is always numeric and its length may vary. But you can use other quantifiers to restrict the length if you want to be more specific. Examples:

  • \d{3}: exactly 3 digits
  • \d{1,4}: between 1 and 4 digits
  • \d{3,}: at least 3 digits

Use what’s best for your case.
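
A small sketch of how these quantifiers behave on isolated date strings. The ^ and $ anchors and the variable names are additions made only for this illustration, so that each test checks the whole string:

```javascript
// Exactly 2 digits / 2 digits / 4 digits.
const exact = /^\d{2}\/\d{2}\/\d{4}$/;
// 1 or 2 digits for day and month.
const flexible = /^\d{1,2}\/\d{1,2}\/\d{4}$/;
// Any number of digits in each part.
const loose = /^\d+\/\d+\/\d+$/;

console.log(exact.test("07/03/2019"));    // true
console.log(exact.test("1/2/2019"));      // false
console.log(flexible.test("1/2/2019"));   // true
console.log(loose.test("123/4567/8"));    // true
console.log(exact.test("123/4567/8"));    // false
```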

The result is:

[
  "07/03/2019, 15:43  - 104. PETIÇÃO PROTOCOLADA JUNTADA - Refer. aos Eventos: 96, 99 e 100 - CIÊNCIA, COM RENÚNCIA AO PRAZO",
  "07/03/2019, 15:43  - 103. Intimação Eletrônica - Confirmada - Refer. ao Evento: 100",
  "07/03/2019, 15:43  - 102. Intimação Eletrônica - Confirmada - Refer. ao Evento: 99",
  "07/03/2019, 15:43  - 101. Intimação Eletrônica - Confirmada - Refer. ao Evento: 96",
  "01/03/2019, 19:20  - 100. Intimação Eletrônica - Expedida/Certificada - Julgamento (APELADO -  SILVANO SOUZA)  Prazo: 15 dias  Data final: ,29/03/2019, 23:59:59"
]

The trick here is the lookahead, indicated by (?=...). It checks whether something exists right after the current position. In this case, I check that everything inside the lookahead comes right after the comma: the date, followed by a comma, a space, the time, one or more spaces (\s+), a hyphen, one or more spaces, and one or more digits (the code, which I am assuming is always numeric).

The big advantage of the lookahead is that it only checks whether these things exist; they are not part of the match, so they are not removed by the split. The split therefore happens only on commas, and only on those that have a date, time and code right after them. Other commas are ignored.
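
To make the difference concrete, here is a minimal sketch on a made-up string (not the data from the question), comparing a plain comma, a comma with a lookahead, and a pattern that consumes the character after the comma:

```javascript
const text = "a,1 b,c ,2 d";

// Plain comma: every comma is a split point.
const everyComma = text.split(/,/);
console.log(everyComma);        // ["a", "1 b", "c ", "2 d"]

// Comma followed by a digit (lookahead): the digit is checked but kept.
const withLookahead = text.split(/,(?=\d)/);
console.log(withLookahead);     // ["a", "1 b,c ", "2 d"]

// Without the lookahead the digit is part of the match, so split removes it.
const withoutLookahead = text.split(/,\d/);
console.log(withoutLookahead);  // ["a", " b,c ", " d"]
```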

Finally, I use trim() only to remove the spaces left at the end of each string.


But it is also possible to eliminate the use of trim if we include the spaces in the split:

let str = "07/03/2019, 15:43  - 104. PETIÇÃO PROTOCOLADA JUNTADA - Refer. aos Eventos: 96, 99 e 100 - CIÊNCIA, COM RENÚNCIA AO PRAZO   ,07/03/2019, 15:43  - 103. Intimação Eletrônica - Confirmada - Refer. ao Evento: 100   ,07/03/2019, 15:43  - 102. Intimação Eletrônica - Confirmada - Refer. ao Evento: 99   ,07/03/2019, 15:43  - 101. Intimação Eletrônica - Confirmada - Refer. ao Evento: 96   ,01/03/2019, 19:20  - 100. Intimação Eletrônica - Expedida/Certificada - Julgamento (APELADO -  SILVANO SOUZA)  Prazo: 15 dias  Data final: ,29/03/2019, 23:59:59";

let result = str.split(/\s*,(?=\d{2}\/\d{2}\/\d{4}, \d{2}:\d{2}\s+-\s+\d+)/);
console.log(result);

Now the regex also matches zero or more spaces (\s*) before the comma, so they are removed by the split as well, and trim() is no longer needed.


About the regex of dates

I go into much more detail in this answer, but to summarize: \d{2} accepts any value between "00" and "99", so it can end up matching values that are not dates. It also accepts invalid dates such as 29/02/2019 (2019 is not a leap year, so February of that year has only 28 days).

If this string comes from a trusted/controlled source and you know it always contains valid dates, the regex above is enough. But if you want to make it more precise, you can use the suggestions from the answer I mentioned. The date and time part would look something like:

(?:0[1-9]|[12]\d|3[01])\/(?:0[1-9]|1[0-2])\/(?:19|20)\d{2}, (?:[01]\d|2[0-3]):(?:[0-5]\d)

Then the code would be:

let str = "07/03/2019, 15:43  - 104. PETIÇÃO PROTOCOLADA JUNTADA - Refer. aos Eventos: 96, 99 e 100 - CIÊNCIA, COM RENÚNCIA AO PRAZO   ,07/03/2019, 15:43  - 103. Intimação Eletrônica - Confirmada - Refer. ao Evento: 100   ,07/03/2019, 15:43  - 102. Intimação Eletrônica - Confirmada - Refer. ao Evento: 99   ,07/03/2019, 15:43  - 101. Intimação Eletrônica - Confirmada - Refer. ao Evento: 96   ,01/03/2019, 19:20  - 100. Intimação Eletrônica - Expedida/Certificada - Julgamento (APELADO -  SILVANO SOUZA)  Prazo: 15 dias  Data final: ,29/03/2019, 23:59:59";

let result = str.split(/\s*,(?=(?:0[1-9]|[12]\d|3[01])\/(?:0[1-9]|1[0-2])\/(?:19|20)\d{2}, (?:[01]\d|2[0-3]):(?:[0-5]\d)\s+-\s+\d+)/);
console.log(result);

This still does not handle leap years, but it already rejects days greater than 31, months greater than 12, hours greater than 23, minutes greater than 59, and so on. In the end, adjust the regex to what you need.
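
If leap years matter, one option is to leave that check out of the regex and validate the date afterwards. A minimal sketch, assuming dates always come as dd/mm/yyyy; the helper name isRealDate is hypothetical and not part of the solution above:

```javascript
// Returns true only if the text is a dd/mm/yyyy date that exists in the calendar.
function isRealDate(text) {
  const m = /^(\d{2})\/(\d{2})\/(\d{4})$/.exec(text);
  if (!m) return false;

  const day = Number(m[1]);
  const month = Number(m[2]);
  const year = Number(m[3]);

  // Date rolls invalid values over (29/02/2019 becomes 01/03/2019),
  // so a date is valid only if its parts survive the round trip unchanged.
  const d = new Date(Date.UTC(year, month - 1, day));
  return (
    d.getUTCFullYear() === year &&
    d.getUTCMonth() === month - 1 &&
    d.getUTCDate() === day
  );
}

console.log(isRealDate("29/02/2019")); // false
console.log(isRealDate("29/02/2020")); // true
console.log(isRealDate("31/04/2019")); // false
```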

1

You can use a regular expression to get the expected result, although it is not guaranteed to work if the input string does not follow a consistent pattern.

Analyzing the pattern of the string provided in the question, I was able to create the following regular expression:

/\d{2}\/\d{2}\/\d{4}, \d{2}:\d{2}(?= -)/g

Breaking it down:

  • \d{2} and \d{4}: two and four digits in a row, respectively;
  • \/: a slash (/);
  • (?= -): a lookahead, which makes the expression match only if the time is immediately followed by a space and a hyphen ( -).
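
A quick check of what this expression matches, on a shortened, made-up sample. Note that the lookahead requires exactly one space before the hyphen, so a string with two spaces there does not match; something like (?=\s+-) would be one way to tolerate that, if needed:

```javascript
const regex = /\d{2}\/\d{2}\/\d{4}, \d{2}:\d{2}(?= -)/g;

// One space before each hyphen: only the dates that start an event match.
const sample =
  "07/03/2019, 15:43 - 101. Confirmada   ,01/03/2019, 19:20 - 100. Data final: ,29/03/2019, 23:59:59";
const found = sample.match(regex);
console.log(found); // ["07/03/2019, 15:43", "01/03/2019, 19:20"]

// Two spaces before the hyphen: the lookahead fails and nothing matches.
const notFound = "07/03/2019, 15:43  - 101. Confirmada".match(regex);
console.log(notFound); // null
```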

However, split alone is not enough to solve the problem, because it removes the matched dates that divide the string. Since we want to keep those initial dates (they only mark where to split), we also need the match method to capture them and join them back afterwards.

Something like this:

function splitString(input) {
  const regex = /\d{2}\/\d{2}\/\d{4}, \d{2}:\d{2}(?= -)/g
  // match returns null when nothing is found, so fall back to an empty array.
  const matches = input.match(regex) || []

  return input
    // Split the string on the regular expression defined above.
    .split(regex)
    // Remove empty strings (the input starts with a match):
    .filter((s) => !!s)
    // Rejoin each match with its piece, since split removes the match and we
    // don't want that. The replace drops the spaces and the separator comma
    // left at the end of every piece except the last.
    .map((value, i) => `${matches[i]} ${value.replace(/\s*,\s*$/, '').trim()}`)
}

console.log(
  splitString(
    '07/03/2019, 15:43 - 104. PETIÇÃO PROTOCOLADA JUNTADA - Refer. aos Eventos: 96, 99 e 100 - CIÊNCIA, COM RENÚNCIA AO PRAZO   ,07/03/2019, 15:43 - 103. Intimação Eletrônica - Confirmada - Refer. ao Evento: 100   ,07/03/2019, 15:43 - 102. Intimação Eletrônica - Confirmada - Refer. ao Evento: 99   ,07/03/2019, 15:43 - 101. Intimação Eletrônica - Confirmada - Refer. ao Evento: 96   ,01/03/2019, 19:20 - 100. Intimação Eletrônica - Expedida/Certificada - Julgamento (APELADO - SILVANO SOUZA) Prazo: 15 dias Data final: ,29/03/2019, 23:59:59'
  )
)
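
As a variation on the same idea, here is a minimal sketch that avoids pairing two arrays by index: it records where each event starts and slices the string between consecutive starts. The function name splitByEventStart is hypothetical, and the lookahead is loosened to (?=\s+-) on the assumption that one or more spaces may precede the hyphen; String.prototype.matchAll requires an ES2020 environment:

```javascript
function splitByEventStart(input) {
  // Date and time, only when followed by whitespace and a hyphen.
  const regex = /\d{2}\/\d{2}\/\d{4}, \d{2}:\d{2}(?=\s+-)/g;

  // Index of every event start in the input.
  const starts = [...input.matchAll(regex)].map((m) => m.index);

  // Slice from one start to the next (the last slice runs to the end),
  // then drop the spaces and separator comma left at the end of each piece.
  return starts.map((start, i) =>
    input.slice(start, starts[i + 1]).replace(/\s*,\s*$/, "").trim()
  );
}

// Shortened, made-up sample with the same shape as the string in the question.
const events = splitByEventStart(
  "07/03/2019, 15:43  - 104. A, B   ,07/03/2019, 15:43  - 103. C   ,01/03/2019, 19:20  - 100. D Data final: ,29/03/2019, 23:59:59"
);
console.log(events);
// [
//   "07/03/2019, 15:43  - 104. A, B",
//   "07/03/2019, 15:43  - 103. C",
//   "01/03/2019, 19:20  - 100. D Data final: ,29/03/2019, 23:59:59"
// ]
```

Text before the first event start, if any, is discarded by this version.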
