JavaScript regex for class schedules


Viewed 85 times


I'm writing a regular expression to validate the following pattern:

2M1-2M2-6M1-6M2 Physics 2

Time slots (separated by -) + a space + the name of the subject.

The first digit is the day of the week (2 = Monday, 3 = Tuesday, 4 = Wednesday, ...) and the second and third characters identify a slot in the timetable:

  • M1 -> Morning from 07:30 to 08:20
  • M2 -> Morning from 08:20 to 09:10
  • T1 -> Afternoon from 13:00 to 13:50
  • N4 -> Night from 21:20 to 22:10

And so on.

The problem is that I couldn't find a way to get rid of the last hyphen in my regex. The way I wrote it, it accepts times like:

2M5-2N5- Física 2

What it should accept is 2M5-2N5 Física 2 (without the hyphen at the end).

The rule should accept either a single time slot, such as 5T1 Física 2, or multiple slots, such as 1M2-1M3-1M4-5T1-5T2-5T3 Física 2.

Regex:

/^(([2-6]{1})([N]{1}[1-5]{1}|[MT]{1}[1-6]{1})([-]{1})){1,}([' ']){1}(.{1,})$/gim

Any idea how I can solve the problem of the hyphen at the end?

I don't know whether regex has some kind of group reference that would help here, because I don't know how many times the time slots will repeat. I only know that once the time slots end and a space follows, there can't be a hyphen right before it. But I couldn't turn that into code.
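A quick way to see the problem is to run the regex from the question against the examples above. This is only a sketch: the helper name originalRegex is hypothetical, and it builds a fresh RegExp on every call because, with the g flag, test() remembers lastIndex between calls on the same object, which would muddy the comparison.

```javascript
// The regex exactly as written in the question (flags g, i, m included).
// Evaluating the literal inside a function yields a new RegExp object each
// time, so no lastIndex state leaks from one test() call to the next.
const originalRegex = () =>
  /^(([2-6]{1})([N]{1}[1-5]{1}|[MT]{1}[1-6]{1})([-]{1})){1,}([' ']){1}(.{1,})$/gim;

const samples = [
  "2M5-2N5- Física 2", // trailing hyphen: should be rejected
  "2M5-2N5 Física 2",  // should be accepted
  "5T1 Física 2",      // single slot: should be accepted
];

for (const s of samples) {
  console.log(JSON.stringify(s), originalRegex().test(s));
}
```

Because every repetition of the outer group demands its own hyphen, the last slot must also be followed by one: only the first sample (the invalid one) matches, and both valid inputs are rejected.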

2 answers


I think you focused so much on the regex and the separators that you lost sight of the actual problem, which is interpreting the text.

In fact, the problem doesn't even need to be solved with a regex; the String.prototype.split() method would be enough. I used a regex only to make the point that a computational tool should not be the goal in itself, but a means of achieving a result.

The algorithm is trivial: it takes input in a specific compact format, breaks it into smaller pieces of information and makes them human-readable.
For this I created an array diasDaSemana, which contains the names of the days of the week in Portuguese.
I also created the function periodo(), which works according to the table presented in the question.

I separated the subject name from the class-schedule information, which I called turnos.
Then, from turnos, I extracted each code with String.prototype.match(), ignoring the separators, so that each unit of information can be handled individually.

const entrada = "2M1-2M2-6M1-6M2 Física 2";

const diasDaSemana = ["Domingo", "Segunda", "Terça", "Quarta", "Quinta", "Sexta", "Sábado"];

function periodo(p) {
  return (p === "M1") ? "de manhã das 07:30am às 08:20am" :
    (p === "M2") ? "de manhã das 08:20am às 09:10am" :
    (p === "T1") ? "à tarde das 01:00pm às 01:50pm" :
    (p === "N4") ? "à noite das 09:20pm às 10:10pm" :
    "na madrugada dos mortos";
}

let [turnos, ...matéria] = entrada.split(" ");     // Separate the subject name from the shift (turnos) information.
matéria = matéria.join(" ");                       // Work around split: rejoin the subject name, which was broken at every space.

turnos = turnos.match(/\d[MTN]\d/g);               // List each code, ignoring the "-" separators.

console.log(`Horário das aulas de ${matéria}:`);
for (let t of turnos) {
  let texto = `${diasDaSemana[parseInt(t[0])-1]} ${periodo(t.slice(1))}`;
  console.log(texto);
}


No need for regex. If the format is always this one and you have already validated it (or you are "sure" you will always receive a valid string), just separate the parts with split and then take the first three characters of each part.

const diasDaSemana = ["Domingo", "Segunda", "Terça", "Quarta", "Quinta", "Sexta", "Sábado"];
const periodos = {
    "M1" : "manhã das 07:30 às 08:20",
    "M2" : "manhã das 08:20 às 09:10",
    "T1" : "tarde das 13:00 às 13:50",
    "N4" : "noite das 21:20 às 22:10"
    // add all the other options here
};

const entrada = "2M1-2M2-6T1-6N4 Física 2";
const i = entrada.indexOf(' '); // index of the first space
const horarios = entrada.slice(0, i); // everything before the first space
const materia = entrada.slice(i + 1); // everything after the first space
console.log(`Horários de ${materia}`);
// get the time codes (split on the hyphen)
for (const sigla of horarios.split('-')) {
    // take the first digit and get the corresponding day of the week
    const dia = diasDaSemana[parseInt(sigla[0]) - 1];
    // get the period code (or a default message if there is no matching time slot)
    const periodo = periodos[sigla.slice(1, 3)] || 'Não há horário cadastrado';
    console.log(`${dia} - ${periodo}`);
}

First I separate the time codes from the name of the subject (basically, one is "everything before the first space" and the other is "everything after the first space").

Then I split the codes on the hyphen, obtaining each time code. From there it's just a matter of slicing each code: the first character (obtained with sigla[0]) is converted to a number with parseInt, and I subtract 1 to get its index in the diasDaSemana array (array indexes start at zero, so you have to subtract 1).

Then I take the rest of the code (sigla.slice(1, 3), which takes the second and third characters) and check whether it exists in the periodos object (that's where you register all the codes and their respective times). Note that there is also a fallback for when the code does not exist (I don't know whether that case can happen, so check whether it makes sense for you).
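If you would rather get the decoded schedule back as data than print it (for example, to test it or render it elsewhere), the same split-based idea can be wrapped in a function. This is only a sketch: the name decodeSchedule and the shape of the returned objects are hypothetical, and periodos is just the partial table from this answer, so any code missing from it hits the fallback message.

```javascript
const diasDaSemana = ["Domingo", "Segunda", "Terça", "Quarta", "Quinta", "Sexta", "Sábado"];
const periodos = {
  "M1": "manhã das 07:30 às 08:20",
  "M2": "manhã das 08:20 às 09:10",
  "T1": "tarde das 13:00 às 13:50",
  "N4": "noite das 21:20 às 22:10",
  // partial table: the remaining codes would go here
};

// Hypothetical helper: splits "codes subject" at the first space and
// decodes each hyphen-separated code into { dia, periodo }.
function decodeSchedule(entrada) {
  const i = entrada.indexOf(" ");
  const materia = entrada.slice(i + 1);
  const horarios = entrada.slice(0, i).split("-").map((sigla) => ({
    dia: diasDaSemana[parseInt(sigla[0], 10) - 1],
    // Unknown codes get the same default message as in the loop above.
    periodo: periodos[sigla.slice(1, 3)] || "Não há horário cadastrado",
  }));
  return { materia, horarios };
}

const resultado = decodeSchedule("2M1-3M3 Física 2");
console.log(resultado.materia);
console.log(resultado.horarios);
```

Here "3M3" decodes to "Terça" with the fallback message, because M3 is not in the partial table.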


But if you really want to use regex...

I find the solution above much simpler, and it's what I would use, but if you really want to use regex, here we go...

The first thing is to remove that pile of {1}, as it is redundant and unnecessary: (anything){1} is the same as (anything) (and to repeat something one or more times, you can use the quantifier + instead of {1,}).
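Applying that cleanup to the regex from the question also points at the fix for the trailing hyphen: instead of requiring a hyphen after every slot, match one slot and then zero or more repetitions of "hyphen + slot". The sketch below keeps the question's stricter classes ([2-6] for the day, N[1-5] or [MT][1-6] for the slot), assuming those are the real limits; note that the question's own example 1M2-1M3-... would be rejected by [2-6], so widen that class if day 1 is valid. It also replaces ([' ']): the character class [' '] matches either an apostrophe or a space, not only a space. The flags are dropped; without g, test() keeps no state between calls.

```javascript
// One time slot: a day digit 2-6, then N1-N5 or M1-M6/T1-T6
// (ranges taken from the regex in the question).
const slot = "[2-6](?:N[1-5]|[MT][1-6])";

// A first slot, then zero or more "-slot", then one space and the subject.
// The hyphen now comes before each extra slot, so a trailing "-" can't match.
const horario = new RegExp(`^${slot}(?:-${slot})* .+$`);

for (const s of ["2M5-2N5- Física 2", "2M5-2N5 Física 2", "5T1 Física 2"]) {
  console.log(JSON.stringify(s), horario.test(s));
}
```

With this pattern the trailing-hyphen input is rejected and both valid inputs are accepted.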

The idea is similar to the split version above (first separate the subject name, then go through the codes):

const diasDaSemana = ["Domingo", "Segunda", "Terça", "Quarta", "Quinta", "Sexta", "Sábado"];
const periodos = {
    "M1" : "manhã das 07:30 às 08:20",
    "M2" : "manhã das 08:20 às 09:10",
    "T1" : "tarde das 13:00 às 13:50",
    "N4" : "noite das 21:20 às 22:10"
    // add all the other options here
};

const entrada = "2M1-2M2-6T1-6N4 Física 2";
const i = entrada.indexOf(' ');
console.log(`Horários de ${entrada.slice(i + 1)}`);
const regex = /(\d)([MNT]\d)(-|$)/g;
for (const match of entrada.slice(0, i).matchAll(regex)) {
    const dia = diasDaSemana[parseInt(match[1]) - 1];
    const periodo = periodos[match[2]] || 'Não há horário cadastrado';
    console.log(`${dia} - ${periodo}`);
}

The regex is (\d)([MNT]\d)(-|$). Each pair of parentheses forms a capture group, whose content you can retrieve later. The first pair of parentheses (and therefore group 1) contains only \d (one digit, which in this case is the day of the week).

Then, in the second group, we have [MNT], which means "the letter M, or the letter N, or the letter T" (only one of them; you can add more letters here if there are other periods), followed by a digit.

There is also a third group, which checks for either a hyphen or $, the marker that indicates the end of the string. Remember that we first split the original string into two parts, the codes and the subject name, so I'm searching for matches only in the part that contains the codes. Nothing comes after the last code (hence the need to check for a hyphen or the end of the string).

Then we use matchAll (which requires the regex to have the g flag) to iterate over the matches. For each match, we take the groups: match[1] is the first group (the day of the week) and match[2] is the second group (the time code).
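To see exactly what each iteration receives, the matches can be collected into an array instead of being printed. A small sketch with the same regex; collecting with Array.from and destructuring the groups is just one way to inspect the iterator returned by matchAll.

```javascript
const regex = /(\d)([MNT]\d)(-|$)/g;
const codigos = "2M1-2M2-6T1-6N4"; // the part before the first space

// Each match is an array: [full match, group 1, group 2, group 3].
const pares = Array.from(codigos.matchAll(regex), ([, dia, periodo, sep]) => [dia, periodo, sep]);

console.log(JSON.stringify(pares));
// [["2","M1","-"],["2","M2","-"],["6","T1","-"],["6","N4",""]]
```

For the last code, group 3 is an empty string, because it was the $ alternative (end of string) that matched rather than a hyphen.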


You also used the flags i and m, but I don't think they're necessary. The i flag enables case-insensitive mode (no distinction between upper and lower case), but in your case the codes always seem to use upper-case letters, so i doesn't seem to make sense. The m flag enables multiline mode, in which the anchors ^ and $ (which indicate the beginning and end of the string) also match at the beginning and end of each line. But since your string is a single line, m makes no difference.
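A quick check of those two claims, using made-up strings for the demonstration: m only changes the result when the input contains a line break, and i only matters if lowercase codes should be accepted.

```javascript
const withoutM = /^6N4/;
const withM = /^6N4/m;

// Single line: ^ already matches at the start, so m changes nothing.
console.log(withoutM.test("6N4 Física 2"), withM.test("6N4 Física 2")); // true true

// With a line break, m lets ^ also match right after the "\n".
const duasLinhas = "2M1 Cálculo\n6N4 Física 2";
console.log(withoutM.test(duasLinhas), withM.test(duasLinhas)); // false true

// i only matters for lowercase input such as "6n4".
console.log(/^6N4/.test("6n4 Física 2"), /^6N4/i.test("6n4 Física 2")); // false true
```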

Another detail: since the number of codes varies, there is no way to capture each of them separately in a single match. For example:

const match = "2M1-2M2-6M1-6M2 Física 2".match(/^\d[MNT]\d(-\d[MNT]\d)* (.+)$/);
console.log(match[1]); // -6M2

I made the excerpt (-\d[MNT]\d) (hyphen, digit, letter, digit) repeat zero or more times (indicated by *), but note that the group only keeps the last occurrence found (in this case, -6M2). So there's no way to get the intermediate results separately with a single match (you can get them all together, but then you would have to separate them using one of the methods above).
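The "get them all together" option can be sketched like this: wrap the whole run of codes in one capture group (with the repeated part in a non-capturing (?:...) group), which also validates the format in the same step, and then split that group on the hyphen. The variable names are illustrative.

```javascript
const entrada = "2M1-2M2-6M1-6M2 Física 2";

// Group 1: all codes together; group 2: the subject name.
// (?:...) repeats "-code" without creating a group of its own.
const match = entrada.match(/^(\d[MNT]\d(?:-\d[MNT]\d)*) (.+)$/);

if (match) {
  const codigos = match[1].split("-"); // ["2M1", "2M2", "6M1", "6M2"]
  const materia = match[2];            // "Física 2"
  console.log(materia, codigos);
} else {
  // An input with a trailing hyphen, such as "2M1-2M2- Física 2", ends up here.
  console.log("invalid format");
}
```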
