In the State of SP, for example, ADA would be Adamantina, ADO Adolfo, AGU Aguai, AGD Agudos, ..., ADC Alvaro de Carvalho, ABR Americo Brasiliense, ... SJC Sao Jose dos Campos, ... but every "would be" here is a guess.
The three letters will appear on physical media (plates) and in digital form, so they cannot be arbitrary. We would prefer to follow a standard, as IATA (airports) or Anatel did, but not a federal standard (a namespace of ~5500 items for the 3 letters); rather a state-level standard, where, for example, SP has ~650 items to associate with codes of the same 3 letters.
TECHNICAL NOTE
The 3-letter code plays the role of a hash, so the classic collision-probability reasoning applies. With 26^3 = 17576 possible 3-letter combinations, keeping the chance of collision at acceptable levels means mapping no more than a fraction of them, say 1% to 5% of the 17576 combinations. The federal namespace of 5500 (31%!) goes far beyond that, and Anatel paid the price: it was forced to use Y instead of I, or meaningless abbreviations, because of the excess of collisions among the most mnemonic choices. The result looks more like car plates than mnemonic acronyms.
By my assessment, the 3% of SP, or at most 4% of MG, is reasonable and would give good results.
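To make the ratio argument concrete, here is a minimal sketch that computes the fill ratio of the namespace and, under the assumption that codes were assigned uniformly at random (like an ideal hash), the expected number of items that would land on an already-taken code. The function names are illustrative. Mnemonic codes are not uniform (many names share the same first letters), so the real figures would be worse than this estimate.

```python
ALPHABET_SIZE = 26
CODE_LENGTH = 3
NAMESPACE = ALPHABET_SIZE ** CODE_LENGTH  # 26^3 = 17576 possible codes


def fill_ratio(n_items, namespace=NAMESPACE):
    """Fraction of the namespace consumed by n_items distinct codes."""
    return n_items / namespace


def expected_collisions(n_items, namespace=NAMESPACE):
    """Expected number of items hitting an already-used code.

    Assumes each item picks a code uniformly at random (an idealization).
    Expected distinct codes used = m * (1 - (1 - 1/m)^n).
    """
    distinct = namespace * (1 - (1 - 1 / namespace) ** n_items)
    return n_items - distinct


if __name__ == "__main__":
    # Item counts are the approximate figures quoted in the question.
    for label, n in [("SP (~650)", 650), ("federal (~5500)", 5500)]:
        print(label, f"fill={fill_ratio(n):.1%}",
              f"expected collisions={expected_collisions(n):.0f}")
```

Even in this idealized model, the federal namespace produces collisions by the hundreds, while a state-sized set stays around a dozen.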
An existing, standardized algorithm would also be a solution. Metaphone, for example, is a standard, but it does not preserve the original alphabet, nor does it generate mnemonic codes from initials for compound names of two or more words (e.g. São José dos Campos should result in SJC). I believe we do not need to "reinvent the wheel" and that the ideal algorithm for Portuguese already exists... Roughly, the ideal algorithm is simple; it does this:
1. Simple name: try the first 3 letters. Itu and Jaú are resolved automatically; Campinas and Marilia would get CAM and MAR.
1.1. If there is a collision (e.g. Marinopolis can't use MAR), adopt the first letter followed only by consonants (e.g. MRN).
2. Names composed of several words: first try the initials, skipping prepositions; then generate combinations by applying the previous item to the first or last words of the name. Example: Santa Rita do Passa Quatro can be SRP, SPQ, ... or STQ, SQT, etc.
Ideally, the algorithm would be well founded and already adopted as a standard elsewhere.
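The steps outlined above can be sketched in Python as follows. This is only one possible reading of the outline, not an established standard: the list of connectives, the extra fallbacks (in-order letter pairs, initials that include connectives, first initial plus the start of the last word) and all function names are assumptions made for illustration.

```python
import unicodedata
from itertools import combinations

# Assumed list of Portuguese connectives to skip when taking initials.
CONNECTIVES = {"DE", "DO", "DA", "DOS", "DAS", "E", "D"}
VOWELS = set("AEIOU")


def normalize(name):
    """Uppercase, strip accents, and split into words made of A-Z letters."""
    decomposed = unicodedata.normalize("NFKD", name.upper())
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch if ch.isalpha() else " " for ch in ascii_only).split()


def single_word_codes(word):
    """Candidates for a simple name, in order of preference."""
    codes = [word[:3]]                                    # first 3 letters
    consonants = [c for c in word[1:] if c not in VOWELS]
    codes.append(word[0] + "".join(consonants[:2]))       # first letter + consonants
    # Assumed last resort: first letter plus any two later letters, in order.
    codes += [word[0] + word[i] + word[j]
              for i, j in combinations(range(1, len(word)), 2)]
    return codes


def multi_word_codes(words, content):
    """Candidates for a compound name: initials first, then mixed forms."""
    codes = []
    for group in (content, words):  # initials without, then with, connectives
        initials = [w[0] for w in group]
        codes += [initials[0] + a + b for a, b in combinations(initials[1:], 2)]
    first, last = content[0], content[-1]
    codes += [first[0] + last[:2], first[:2] + last[0]]   # assumed fallbacks
    return codes


def candidates(name):
    """Ordered, de-duplicated list of 3-letter candidates for a name."""
    words = normalize(name)
    if not words:
        return []
    content = [w for w in words if w not in CONNECTIVES] or words
    if len(content) == 1:
        raw = single_word_codes(content[0])
    else:
        raw = multi_word_codes(words, content)
    return list(dict.fromkeys(code for code in raw if len(code) == 3))


def assign(names):
    """Give each name, in list order, its first candidate not yet taken."""
    taken = {}
    for name in names:
        for code in candidates(name):
            if code not in taken:
                taken[code] = name
                break
    return {name: code for code, name in taken.items()}
```

With this sketch, `assign(["Aguaí", "Agudos", "Marília", "Marinópolis"])` gives AGU, AGD, MAR and MRN respectively, and `candidates("São José dos Campos")` starts with SJC. A name whose candidates are all taken is simply left out of the result, which a real implementation would have to handle.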
Another problem the algorithm has to solve, so that it is fair to the future users of the acronyms it creates: who deserves the "prettiest" acronym?
For example, does BOR go to Borborema (15790 inhabitants and founded in ) or to Borá (890 inhabitants and founded in 1965)? The oldest or the most populous?
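One way to make this choice explicit is to treat it as a priority function passed to a greedy allocator: cities are visited in priority order and each takes its first free candidate. The sketch below is hypothetical; `simple_candidates` is a deliberately reduced generator for single-word names only, and the only data used are the population figures quoted above.

```python
import unicodedata
from itertools import combinations


def simple_candidates(name):
    """Reduced, hypothetical generator for single-word names:
    first 3 letters, then first letter plus any two later letters in order."""
    word = (unicodedata.normalize("NFKD", name.upper())
            .encode("ascii", "ignore").decode("ascii"))
    codes = [word[:3]]
    codes += [word[0] + word[i] + word[j]
              for i, j in combinations(range(1, len(word)), 2)]
    return list(dict.fromkeys(codes))  # de-duplicate, keep order


def allocate(cities, priority):
    """Visit cities sorted by `priority` (lowest key first) and give each
    the first candidate code that is still free."""
    taken = {}
    for city in sorted(cities, key=priority):
        for code in simple_candidates(city["name"]):
            if code not in taken:
                taken[code] = city["name"]
                break
    return {name: code for code, name in taken.items()}


cities = [
    {"name": "Borborema", "population": 15790},
    {"name": "Borá", "population": 890},
]

# Most populous first: Borborema keeps BOR, Borá falls back to BOA.
most_populous_first = allocate(cities, priority=lambda c: -c["population"])

# Least populous first: Borá keeps BOR, Borborema falls back to BOB.
least_populous_first = allocate(cities, priority=lambda c: c["population"])
```

Swapping the priority function (population, founding year, or any other agreed criterion) changes who gets the preferred code without touching the rest of the procedure, so the fairness rule can be debated separately from the code generation.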
References
Brazilian cities with airports: apparently all of them should have IATA abbreviations, but I couldn't find a systematic list of city codes for Brazil (not to be confused with airport codes)... A few examples, though I don't know if they're official: ANS Anápolis-GO, PTM Patos de Minas-MG, PTS Patos-PB, URA Uberaba-MG, UDI Uberlândia-MG... and the most exotic, BSB Brasília-DF, GYN Goiânia-GO, ...
Anatel abbreviations: in PDF, apparently from 2013 (according to the Date HTTP header).
CDHU/SP abbreviations: they do not have 3 letters, but they are all simple, short words, which greatly simplifies the algorithm's work. They would help as an "official reference" if there is no other starting point... Only available in PDF, apparently from 2005.
City data, by state, etc.: Wikidata. Since writing SPARQL queries can be tedious, some tables, like the one for São Paulo, are already prepared, and a validated dataset can also be used.
How about taking Anatel's list and splitting it by state?
– Jefferson Quesado
@Jeffersonquesado I edited the question to explain it better; check the "TECHNICAL NOTE" and tell me if it needs revising.
– Peter Krauss
@Bacco, thank you for explaining your vote against, but note that I'm just giving the problem its context; I don't care whether you answer in C# or Perl or anything else... And this is precisely a problem I made a point of bringing here rather than to the main Stack Overflow, because I want to emphasize the context of the Portuguese language first (!). These are numerical questions; I could remove everything I wrote about Brazil and SP, but I would lose the context... How about waiting for the community to weigh in? There have already been 2 votes, and I believe someone will answer within a week... Or do you think it is more important to make it look more like a "programming problem"?
– Peter Krauss
My vote is not binding; it would take 4 more close votes. The community may well not close it. I actually think the question is quite elaborate, but I can't see a real programming problem in it, at least not one that isn't too broad. The need itself I find really out of scope (but that's how I see it, not necessarily how others do).
– Bacco
Anyway, I withdrew the comment and the vote so as not to influence anyone, and I'll delete these comments later.
– Bacco
@Peterkrauss if I understand correctly, you want an algorithm that defines a 3-letter code for each municipality in Brazil? We have 5564 municipalities (according to this source: https://ww2.ibge.gov.br/home/statica/populacao/count2007/popmunic2007layoutTCU14112007.xls)!
– Diogo Lindoso
I initially thought @Bacco was being too orthodox about the scope. Because the question is interesting, even though it is strictly out of scope, I thought we could perhaps stretch the rules a little. Seeing how the question evolved, I even considered closing it as unclear, or even as "opinion-based". I decided to go with Bacco's suggestion because, deep down, this is a question for a geographer to answer.
– bfavaretto
Hi @Diogolindoso, what I tried to express is that abbreviations for the ~5600 municipalities already exist; they were made by Anatel, and that is not what we want ("... but not a federal standard"). We are looking for abbreviations for smaller sets, by state: SP, MG, etc., which never exceed ~800. As for the algorithm, what I outlined is a guess; it would be nice to build, but the hard part is finding out whether there is any "standard algorithm" that has already been used or has a good foundation.
– Peter Krauss
If there is no standard (and your previous search seems to indicate so), any path will be arbitrary. You're looking for the least arbitrary one possible, right? "Possible" within conditions that you are setting yourself. Those conditions would need to be defined more clearly. For example, should population be taken into account? The answer that appeared disregarded it. The collision rules are not clear either. Can acronyms repeat across different states? Once the conditions are defined, the algorithm is practically given. In that case, what would be left to ask?
– bfavaretto
Well, you already answered one of my questions in the comment posted almost at the same time as mine. :) I reiterate that I find the subject interesting, but I do not know how to make the question less problematic.
– bfavaretto
@bfavaretto, yes, parameters and foundations are missing. I don't know whether to edit the question again or start a GitHub repository ;-) Since I have already started answering, I will run some tests and also keep question and answer reasonably compatible. The answer sounds good; maybe we're done here.
– Peter Krauss
Made any progress, @Peterkrauss?
– Rovann Linhalis
Hi @Rovannlinhalis, I'm waiting for the green light on a project to resume the subject here, with supporting material on GitHub... Actually, I had already published a stopgap algorithm in PHP, like the current answer (which was great!), before I came to ask the question... The project does not target the ZIP code itself, but it is best understood as a replacement of the numeric ZIP code by a code of letters and numbers, starting with the 2 letters of the state, then the 3 letters of the municipality. See https://github.com/OSMBrasil/CRP and http://datasets.ok.org.br/city-codes
– Peter Krauss