A regular expression to detect acronyms of Brazilian highways


1

I’m trying to detect if a particular address corresponds to a Brazilian highway.

For example, br-101 does.

My initial plan was to list the state acronyms (mg, sp, rn, ...) plus the br acronym, and write something like /acronym1-[0-9]{3}|acronym2-[0-9]{3}.../.

But a Wikipedia search surprised me: there are prefixes other than the state acronyms (for example, prc; see https://pt.wikipedia.org/wiki/Rodovias_do_Paran%C3%A1).

So I ask: what is the most correct way to detect the highways?

We could use (two_or_three_letters)-(three_digits), for example. Does the part before the hyphen always have two or three letters? Can the part after the hyphen have fewer than three digits?

Would someone happen to have a list of possible acronyms that might come before the hyphen?

  • It seems that the pattern "2 or 3 letters - 3 digits" is valid. Would that work?

  • Yes, it would. My problem is not writing the regex, but knowing which criterion is the most reasonable. If there were a Stack Overflow for the DER (the state highway department), I would be asking there, not on a programming site...

2 answers

2

I found the question interesting and looked into how the naming of Brazilian highways works.

According to the federal government's highways website, there is a standard for naming federal highways. From my own research, this pattern is also adopted for state highways, but there are exceptions.

The first digit of the highway number (for example, the 3 in BR-307) has a meaning and ranges from 0 to 6. This also applies to state highways.

  • Radial highways: BR-0xx - highways that run from the federal capital towards the country's extremes
  • Longitudinal highways: BR-1xx - highways that cross the country in the north-south direction
  • Transverse highways: BR-2xx - highways that cross the country in the east-west direction
  • Diagonal highways: BR-3xx - highways that run in one of two orientations: northwest-southeast or northeast-southwest
  • Connecting highways: BR-4xx - highways that run in any direction

There are also highways numbered BR-6xx, but they are few and short.
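The first-digit categories listed above can be sketched as a small lookup in Python. This is only an illustration: the function name `highway_category` and the mapping are my own, the letter part is accepted in either case, and digits 5 and 6 are left unmapped because the list gives no category for them.

```python
import re

# Categories from the list above, keyed by the first digit of the number.
# Digits 5 and 6 are deliberately absent: no category is given for them.
CATEGORY_BY_FIRST_DIGIT = {
    "0": "radial",
    "1": "longitudinal",
    "2": "transverse",
    "3": "diagonal",
    "4": "connecting",
}


def highway_category(code):
    """Return the category for a code like 'BR-307', or None if unknown."""
    # Two or three letters, a hyphen, then exactly three digits;
    # the first digit is captured for the lookup.
    match = re.fullmatch(r"[A-Za-z]{2,3}-([0-9])[0-9]{2}", code.strip())
    if match is None:
        return None
    return CATEGORY_BY_FIRST_DIGIT.get(match.group(1))


print(highway_category("BR-307"))
print(highway_category("br-101"))
```

Since state highways follow the pattern only with exceptions, the result is better treated as a hint than as a fact.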

It would be worth confirming this information so that the regex can be more accurate, for example:

  • The name starts with two or three capital letters: [A-Z]{2,3}
  • There is a hyphen between the letters and the digits: -
  • The first digit ranges from 0 to 6: [0-6]
  • And it ends with two more digits: [0-9]{2}

Putting it together, your regex would look like this: [A-Z]{2,3}-[0-6][0-9]{2} (functional example).
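As a minimal sketch, this is how that regex could be applied in Python. The function name `looks_like_highway` is hypothetical, and the `re.IGNORECASE` flag is my own addition so that lowercase input such as br-101 from the question is accepted too.

```python
import re

# The regex from this answer; IGNORECASE is an assumption added here
# so that "br-101" is treated the same as "BR-101".
HIGHWAY_RE = re.compile(r"[A-Z]{2,3}-[0-6][0-9]{2}", re.IGNORECASE)


def looks_like_highway(text):
    """True if the whole string has the shape letters-hyphen-three digits."""
    # fullmatch requires the entire string to match, so "BR-1010" fails.
    return HIGHWAY_RE.fullmatch(text.strip()) is not None


for sample in ["BR-101", "br-101", "PRC-280", "BR-701", "BR-10"]:
    print(sample, looks_like_highway(sample))
```

Using `fullmatch` instead of `search` avoids accepting longer strings that merely contain a highway-like fragment.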

1

You can build two regexes: a more generic one that validates only the format of the highway name, and a more specialized one that makes it more likely that the highway actually exists.

According to the research I did, some highways have a C after the state acronym because they are coincident: a section of a federal highway runs along the same stretch as a state highway, and the state is responsible for maintaining it. I have not found any centralized list, though; each state maintains its own.

Not all states have coincident highways, so the second regex matches invalid values such as BRC-000 or ACC-999. It is therefore necessary to handle these cases in the application, for example with an exceptions list, or to find out which states have these highways and refine the regex further.

The generic one would be:

[A-Z]{2,3}-[0-9]{3}

Inputs:

BR-101 //OK
ABC-100 //OK
ZZ-000 //OK

Example - regex101
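Since the question is about addresses, here is a hedged sketch of how the generic pattern could be used to pull candidate codes out of free text in Python. The function name `find_highway_codes` and the sample addresses are made up for illustration; the `\b` word boundaries and the case-insensitive flag are my additions, not part of the regex above.

```python
import re

# Generic pattern from above, wrapped in word boundaries so that it does
# not match inside longer tokens such as "ABCD-1234" (an added assumption).
GENERIC_RE = re.compile(r"\b[A-Z]{2,3}-[0-9]{3}\b", re.IGNORECASE)


def find_highway_codes(address):
    """Return every highway-shaped token in the address, uppercased."""
    return [code.upper() for code in GENERIC_RE.findall(address)]


print(find_highway_codes("Rodovia BR-101, km 12"))
print(find_highway_codes("rod. sp-330 km 5"))
print(find_highway_codes("Rua ABCD-1234"))
```

Because this pattern only checks the shape, anything like ZZ-000 in an address is reported as well.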

The other would be the list of state acronyms, followed by an optional C (for coincident), then a hyphen and three digits.

(AC|AL|AP|AM|BA|CE|DF|ES|GO|MA|MT|MS|MG|PA|PB|PR|PE|PI|RJ|RN|RS|RO|RR|SC|SP|SE|TO|BR)C?-[0-9]{3}

Inputs:

BR-101 //OK
ABC-000 //does not match the pattern
ZZZ-999 //does not match the pattern
PRC-280 //OK
RSC-453 //OK
BRC-000 //OK, but invalid
ACC-999 //OK, but invalid

Example - regex101
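A possible way to combine this regex with the exceptions list mentioned earlier, sketched in Python. The names `STRICT_RE`, `INVALID_PREFIXES` and `is_known_highway` are hypothetical, and the exceptions set contains only the two invalid prefixes used as examples in this answer; it is not a verified list of states without coincident highways.

```python
import re

STATE_PREFIXES = [
    "AC", "AL", "AP", "AM", "BA", "CE", "DF", "ES", "GO", "MA", "MT", "MS",
    "MG", "PA", "PB", "PR", "PE", "PI", "RJ", "RN", "RS", "RO", "RR", "SC",
    "SP", "SE", "TO",
]
FEDERAL_PREFIX = "BR"

# Illustrative exceptions only (taken from the examples above); a real
# list would have to be compiled from each state's own records.
INVALID_PREFIXES = {"BRC", "ACC"}

# Same pattern as above, built from the list, with capture groups for
# the acronym and the optional C. IGNORECASE is an added assumption.
STRICT_RE = re.compile(
    r"(" + "|".join(STATE_PREFIXES + [FEDERAL_PREFIX]) + r")(C?)-([0-9]{3})",
    re.IGNORECASE,
)


def is_known_highway(code):
    """True if the code matches the strict pattern and is not an exception."""
    match = STRICT_RE.fullmatch(code.strip())
    if match is None:
        return False
    prefix = (match.group(1) + match.group(2)).upper()
    return prefix not in INVALID_PREFIXES


for sample in ["BR-101", "PRC-280", "RSC-453", "ABC-000", "BRC-000", "ACC-999"]:
    print(sample, is_known_highway(sample))
```

Keeping the exceptions in a set rather than in the regex itself makes them easier to update as more is learned about each state.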

  • And why not do it anyway?

  • @Andersoncarloswoss The first only validates the format, while the second is more specific and closer to reality. Did the way I wrote it imply that you need both? If so, I will revise it.

  • It is not that both seem required, but with the third character optional, the second one will match the generic names too, so I was questioning why use both rather than just one.

  • 1

    @Andersoncarloswoss I edited the answer; if you have any more remarks, let me know.
