(THIS IS NOT AN ANSWER!)
The following notes are supporting material for the answers, and also a reply to the comments posted on the question. This is a Wiki text: you can collaborate by reviewing and expanding it!
SUPPORTING NOTES
This is an open text (Wiki) to support the general question of free access to the VOLP (or to the VOP), a heritage of the Portuguese-speaking nations.
Since most of us are unfamiliar with the subject, we need to start with some review of laws, open vocabularies and trustworthy dictionaries. The comments show that the question is not merely technical. The choice made in these notes was to follow the direction taken on similar issues, where more than one answer is discussed. All readers and respondents are invited to edit the text of these notes as well.
Copyright and duration
(in response to @Bacco) On the obligation and when it takes effect. According to a consolidated reading of several sources on Wikipedia and the official laws on Lexml:
The "Orthographic Agreement of 1990" was promulgated by the National Congress on April 18, 1995;
To implement the aforementioned Orthographic Agreement here in Brazil, Federal Decrees 6.583, 6.584 and 6.585 were issued, and the Amending Protocol to that Agreement was approved.
Decree nº 6.583 of 2008 has as an annex the "ORTHOGRAPHIC AGREEMENT OF THE PORTUGUESE LANGUAGE", which states in its Article 2 that "the signatory States shall take (...) the necessary steps to develop (...) a common spelling vocabulary for the Portuguese language". It further establishes that "authorised vocabularies shall record admissible alternative spellings (...) it is clear that only the consultation of vocabularies or dictionaries may indicate".
Decree 6.584 has as an annex the "AMENDING PROTOCOL TO THE ORTHOGRAPHIC AGREEMENT OF THE PORTUGUESE LANGUAGE": it gives new wording only to Articles 2 and 3, fixing as valid the vocabularies elaborated "until 1 January 1993".
The VOLP is edited by the Brazilian Academy of Letters (ABL), which allegedly has the legal responsibility to edit it. This assumption, like many others about official Portuguese spelling, pronunciation, etc., does not appear in the Decreto Eduardo Ramos, n. 726, of 8/12/1900, the only norm cited for this purpose. According to crumbs.com.br, "... this position has thus been recognised without challenge for decades"; that is, there is no written law, only a "tradition" filling this gap.
The MEC was the agency that pushed hardest for the obligation (and almost achieved it in Brazilian textbooks) starting in January 2013... Not by chance, Brazil was represented in 1990 by its Minister of Education. But in 2012 the federal government, by Decree 7875, postponed the obligation to 2016.
(in response to @mgibsonbr) On contradictions between the right to charge for copyright (not offering a download) and Brazilian constitutional law. It appears that selling the VOLP is, in theory, unconstitutional:
The principles behind Lexml can be extended to the VOLP: the VOLP is cited in the Law, so it is part of it. The government has an obligation to make it public and cannot charge citizens for access to the Law.
Brazilian citizens cannot claim "ignorance of the Law": the Federal Constitution (CF) would guarantee "mandatory publication" (art. 37), "right of access" (art. 5º, item XIV) and "obligation to grant access" (art. 216, § 2º).
Precedent: the standard ABNT NBR 9050:2005 - Accessibility of buildings and, it seems, also NBR-15575-5 are the only open ones (the text is offered for download). Complaints about ABNT abuses are old (see 1, 2, and dozens of others)... The complaints seem to have triggered a first opening initiative. The impact of the VOLP on every citizen, however, is much greater than that of restricting access to an ABNT standard, so it deserves greater attention.
Contextualization
There are dozens of "unofficial vocabularies", some of them reliable and even better structured than the VOLP, but they are not suitable for certification (none of them carries a flag indicating which spellings are in the VOLP):
Unitex Project: probably the most rigorous and solid "framework for dictionaries". The ideal would be to work with it... See the Unitex3.0.zip download, which contains all the dictionaries; as of 2013 it had not yet been updated to the 1990 Orthographic Agreement.
VERO project of LibreOffice: perhaps not as strict as Unitex, but certainly the most complete (voluminous) today, and the one that has received the most collaboration, review, checking, etc. Downloading it means downloading the latest version of the file with the "oxt" extension, which itself contains the dictionary's source data and everything else. The VERO project was built around Hunspell, the software used by LibreOffice, Firefox, Chrome and many other applications. VERO is the initiative that created the linguistic data (affixes, inflections, etc.) processed by Hunspell: the most current VERO file, for example Veroptbrv320aoc.oxt, carries all the data. To access it, simply rename the file to Veroptbrv320aoc.zip and unzip it. Running (tip from Raimundo and his collaborators)
./unmunch pt_BR.dic pt_BR.aff | sort -u > listaCompleta.txt
we obtain a complete list of all Portuguese words (more complete and coherent even than the VOLP). According to R.S.Moura,
"... VERO never received support from ABL. Our lexicon is the result of the voluntary work of many selfless people who took up the cause of this Project, making available their academic materials, research and lists of terms, pointing out flaws and suggesting new words, during the eight years of VERO activity" (personal email of April 2014, reproduction authorized).
- Unix "words" resource: on Ubuntu it is known as Wordlist and lives in
/usr/share/dict/brazilian
(list with more
). It can help check words and be consolidated with other open dictionaries, but it does not seem very reliable or very active. It is installed with sudo apt-get install wbrazilian
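As a rough illustration of how the word lists above could be used, here is a minimal Python sketch that flags the words of a text that do not appear in a one-word-per-line list (such as listaCompleta.txt or /usr/share/dict/brazilian). The function names, the UTF-8 default encoding and the tokenization rule are assumptions made for this example, not part of VERO or Hunspell; and since these lists are unofficial, the result is only a screening aid, not a certification against the VOLP.

```python
import re
import unicodedata
from pathlib import Path

# Assumption: a "word" is a run of letters, optionally joined by internal hyphens
# (so that compounds such as "guarda-chuva" are looked up as a single entry).
WORD_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def load_wordlist(path, encoding="utf-8"):
    """Read a one-word-per-line file into a set of lowercase, NFC-normalized words.

    The encoding is an assumption: some Hunspell-derived files use ISO-8859-1,
    in which case pass encoding="iso-8859-1".
    """
    lines = Path(path).read_text(encoding=encoding).splitlines()
    return {
        unicodedata.normalize("NFC", line.strip()).lower()
        for line in lines
        if line.strip()
    }


def unlisted_words(text, wordlist):
    """Return the words of `text` absent from `wordlist`, in order of first appearance."""
    seen = set()
    missing = []
    for match in WORD_RE.finditer(unicodedata.normalize("NFC", text)):
        word = match.group(0).lower()
        if word not in wordlist and word not in seen:
            seen.add(word)
            missing.append(word)
    return missing
```

A typical use would be to call load_wordlist("listaCompleta.txt") once and then pass the resulting set to unlisted_words for each text. Keeping the list in a set makes each lookup cheap even with hundreds of thousands of entries.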
Dictionaries proper: they have the character of an "ontology" (semantic description of words) rather than a "vocabulary"... Since they generally cover the vocabulary as well, they can be as useful as vocabularies if they are reliable and complete:
Portuguese wiktionary.org: collaborative source... It can be evaluated by downloading it (ptwiktionary-latest-all-titles.gz), but it is still incomplete and has no "VOLP flag".
pt-PT, Dicionário de Candido de Figueiredo, of 1913: a good starting point for creating a dictionary in the public domain... "since this edition of the dictionary dates from 1913, under current copyright legislation the rights to its contents have already expired, placing it entirely in the Public Domain" (3).
Peter Krauss, this is not directly related to your question, but if your correction is not based on the VOLP, maybe this answer that I gave to another question will be useful somehow.
– Rael Gugelmin Cunha
"I heard around" that people have managed to extract the list from the site mentioned (and that it is not all that much trouble), using scripts, but I do not know where that stands legally. Maybe it is true, because the ajax url http://www.academia.org.br/sistema_busca_palavras_portuguesas/volta_voca_org.asp?palavra=a should be very simple to parse, and a smart script would get around the limit of 200 entries per page without problems (if A exceeds 200, use AA, AB; in turn, if IN exceeds 200, use INA .. INZ, and so on). Of course this is only hypothetical; the ideal is to buy the dictionary itself ;)
– Bacco
@Bacco Unfortunately, here in Brazil it is common that, by law, you need to follow a certain norm or technical standard, but the information on that standard has to be purchased, since it is subject to copyright and cannot be copied. The same goes for various official data, such as postal codes. This is a huge obstacle to progress, because many computerized solutions become unviable and/or prohibitively expensive: even if the technical means are readily available, you are barred by the legal aspects.
– mgibsonbr
@mgibsonbr my comment was inspired precisely by "disagreement" with these absurd things, as is also the case with the Zip Codes in Brazil (which were free and suddenly cost a fortune). Now, of course I won’t download anything illegally, right... it should be evident that my answer is merely hypothetical, if you know what I mean ;)
– Bacco
@Bacco OK, I was just agreeing with you ("but I don’t know how the legal aspects look"). It didn’t seem that you were encouraging anything illegal, no; sorry if I gave that impression!
– mgibsonbr
@Raelgugelmincunha, my problem is not merely correction; it is much broader, and I know of several other applications that require the VOLP to "certify in accordance with the Law". I think your suggestion fits, and is much more closely related to, the discussion we had about Metaphone.
– Peter Krauss
@Bacco and mgibsonbr: I am editing the question... See if with the notes I can get some answers from our readers, even if only to discuss the subject.
– Peter Krauss
There are some corpora of Portuguese text, extracted from various sources such as newspapers and magazines, that scientists use for various purposes (such as building spam analyzers, for example). Of course they are not official like the VOLP, but perhaps they are a useful alternative for needs other than dictionaries. Examples: CETEM (PT), CETEM/Folha (BR) and LAEL (BR)
– Luiz Vieira
@Luizvieira, selecting and organizing linguistic corpora is very important for establishing relevance (frequency of use), which the VOLP still neglects: in it we do not find relevant terms such as "environment" (which appears in federal laws, scientific and journalistic works, etc.), but we do find archaic terms of no relevance such as "half-gun". The focus of my question, however, is certification (sorry that I only pointed that out in the Notes/Contextualization): I need to certify which terms of a text are official and which are not.
– Peter Krauss
@Peterkrauss Ah, ok. : ) I think your question was actually quite clear. I just wondered whether linguistic corpora (this is the correct plural, right? sorry) could help with other kinds of needs and, if so, whether they would be worth mentioning here. But since I’m not a real expert on the subject, I just wanted to comment.
– Luiz Vieira
I tried to get in touch with ABL through the website and by e-mail, without result. We still have to try by phone...
– Miguel Angelo
@Miguelangelo, I tried a long time ago; it doesn’t hurt to insist... Perhaps, taking advantage of the fact that you are in Rio, it is best to inquire by phone, asking them to at least confirm the conclusions we wrote here.
– Peter Krauss