The main advantage of case sensitivity is a larger set of possible symbols (names). Its main impact on traditional languages is that it allows an implicit relationship between a type and an instance of that type. Another, less common use (Prolog, Erlang) is giving names a different semantic treatment depending on their capitalization. Finally, there are practical issues involving compilation and runtime management, where a case-sensitive language is simpler to implement.
The main advantage of case insensitivity is that the language's symbols are easier to memorize, which reduces errors caused by incorrect capitalization.
Types vs. instances
In natural language, a name is supposed to represent a single concept. There are collisions (e.g., in Portuguese, "manga" the fruit vs. "manga" the shirt sleeve), but they are rare and context usually removes the ambiguity. In formal languages (e.g., controlled vocabularies, taxonomies, thesauri, ontologies) names must be unique, and this applies especially to computer programs, where ambiguity would hinder the automated processing of the text.
However, ambiguity is not always bad: when context is available, using the same name for different but related things not only does little harm but is actually useful. If I say "dog", I am referring to a type of animal. If I say "a dog", I mean a specimen of that animal. If I say "the dog", it is not just any specimen but a specific one, which may have been mentioned before or is unique in the context (e.g., if there is only one dog in a set of animals). This makes communication easier.
Likewise, programming strives for concision, since it is a measure of a language's expressiveness. However, concision must always be weighed against the clarity of the code ("programs are made for humans to read and only incidentally for computers to run"). In the absence of these semantic "shortcuts", one looks for other ways to make the name of something refer to a concept:
- sigils ($foo is a scalar, @foo is an array);
- Hungarian notation (iSize is an integer, szLastName is a null-terminated string);
- naming conventions (IFoo is an interface, _foo is a private field);
- etc.
Such conventions are generally implicit, but they ultimately help communicate the semantics of the code to human readers. A similar convention, much like the natural-language case mentioned above, is to use the same name for a type and for an instance of that type:
cachorro = new Cachorro();
In a case-insensitive language you would need another name, and the programmer would most likely choose something like:
oCachorro = new Cachorro();
cachorro1 = new Cachorro();
This makes the code less readable. Incidentally, this tendency of programmers to use the same name for the class and for the object becomes evident when the class name coincides with a reserved word. See if this looks familiar:
clazz = new Class();
Choosing names is hard. Someone might insist "well, just give the variable a different name, like rex = new Cachorro()", but in practice this makes the code harder to understand. Even if the programmer sticks to a particular coding style (classes start with uppercase, variables with lowercase, constants are all uppercase), in a case-insensitive language they lose this implicit semantic association between type and instance, or are at least forced to adopt a different convention, as in the examples above.
Differentiated semantics
In most languages, any difference between capitalized and uncapitalized symbols is purely conventional. But nothing prevents the compiler from enforcing this convention (as suggested in Victor Stafusa's answer), and in my opinion that would be a great advance. This not only creates consistency but can also increase expressiveness, as happens in Prolog, for example.
Prolog (and languages inspired by it, such as Erlang) distinguishes upper from lower case and, specifically for the first letter of a name, treats the two differently: foo is an "atom" (a constant) and Foo is a variable. This seems like a silly detail, but in my experience with the language the gain in expressiveness is huge.
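To make the Prolog convention concrete, here is a minimal sketch of one-way matching in Python. It is not how Prolog is implemented; the helper names `is_variable` and `unify`, and the choice of tuples to stand in for Prolog terms, are assumptions made for illustration. The only rule it borrows from Prolog is that a name starting with an uppercase letter is a variable and anything else is a constant.

```python
def is_variable(term):
    """Prolog-style rule: a name starting with an uppercase letter is a variable."""
    return isinstance(term, str) and term[:1].isupper()


def unify(pattern, value, bindings=None):
    """Match `pattern` against a ground `value`; return the bindings, or None on failure."""
    bindings = dict(bindings or {})
    if is_variable(pattern):
        if pattern in bindings:
            # A variable that is already bound must agree with the new value.
            return bindings if bindings[pattern] == value else None
        bindings[pattern] = value
        return bindings
    if isinstance(pattern, tuple):
        if not isinstance(value, tuple) or len(pattern) != len(value):
            return None
        for sub_pattern, sub_value in zip(pattern, value):
            bindings = unify(sub_pattern, sub_value, bindings)
            if bindings is None:
                return None
        return bindings
    # Anything else (a lowercase atom, a number) must match exactly.
    return bindings if pattern == value else None


dog = ("cachorro", "rex", "maltes", 2)
print(unify(("cachorro", "Nome", "maltes", 2), dog))  # {'Nome': 'rex'}
print(unify(("cachorro", "Nome", "poodle", 2), dog))  # None
```

The pattern needs no sigils or quoting: the capitalization of "Nome" alone says "bind this", while "maltes" says "compare this".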
For example, some modern languages have a feature called destructuring assignment (or destructuring bind), such as Python:
for chave,valor in dicionario.items():
If Python supported unification as Prolog does, a single expression could perform a destructuring bind, a type check and even a filter at the same time:
# Assuming a data structure Cachorro(nome, raca, idade)
# (hypothetical syntax, not valid Python)
for chave, Cachorro(nome, MALTES, sqrt(4)) in dicionario.items():
    print(nome)

# which would be equivalent to:
for chave, valor in dicionario.items():
    if isinstance(valor, Cachorro):
        nome = valor.nome
        if valor.raca == MALTES:  # Assuming MALTES is a global constant
            if valor.idade == sqrt(4):
                print(nome)
If the language were case insensitive, on the other hand, this unification would not be possible. How would the compiler know that:
- Cachorro(...) should not be called as a function, but sqrt(...) should;
- nome is a variable, which must receive the first field of the instance being iterated as its value;
- MALTES, in turn, is a constant, which must be compared with the second field of the instance being iterated?
Note that all of this could be done with sigils, quoting (e.g., Lisp), etc., but notice how "clean" the code is without the visual pollution of special symbols everywhere.
(P.S. A more relatable example is regular expressions: \w matches a character class, and \W negates that class. If regexes were case insensitive, they would have to "spend" one more symbol, making the language a little less dense.)
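A quick illustration with Python's `re` module (the sample string is arbitrary): the two escapes differ only in case, yet select complementary sets of characters.

```python
import re

texto = "foo_bar, baz!"

palavras = re.findall(r"\w+", texto)      # runs of word characters
separadores = re.findall(r"\W+", texto)   # runs of everything else

print(palavras)     # ['foo_bar', 'baz']
print(separadores)  # [', ', '!']

# re.IGNORECASE changes how the text is compared, not the meaning of the escapes.
print(re.findall(r"\W+", texto, flags=re.IGNORECASE))  # [', ', '!']
```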
Memorization and incorrect capitalization
It is hard enough to memorize the names of all the classes, functions, etc. of an API well enough to be productive without going back to the reference all the time. If capitalization is not used very consistently, you end up having to memorize it too, and this should not be underestimated: our brains are good at memorizing concepts (and East Asians are even better at it than Westerners), but not so good at memorizing symbols and spellings:
- If you try to memorize "dona de casa" (housewife), your brain will probably store it as a sequence of sounds (ˈdɔ nɑ dɪ ˈkɑ sə).
- An East Asian reader could memorize it as an image, which is even easier (婦).
- A programmer would also have to worry about how to write it:
  - dona de casa? No, since identifiers cannot contain more than one word;
  - dona-de-casa? No, since - is an operator;
  - dona_de_casa? Could be... or would it be donaDeCasa?
  - DonaDeCasa? DONA_DE_CASA? Let's see, am I dealing with a variable, a class or a constant?
If a language follows a strict capitalization convention, and especially if the compiler enforces it (as already discussed in Victor's answer), the problem is not so great. But once acronyms come into play, things get more complicated:
- IDCachorro or IdCachorro?
- UTF8Regex, Utf8Regex, UTF8RegEx or Utf8RegEx?
Unless the compiler knows what an ID is, that there is something called UTF8, that regex means REGular EXpression, etc., nothing prevents one programmer from defining a name that is perfectly valid under the language's coding style while another programmer has no idea how to spell it...
Frustration with this situation has been one of the most common arguments in favor of case insensitivity. For consistency, some people advocate case-preserving, case-insensitive languages, which in my understanding means "capitalization does not matter, but once a symbol has been declared it must always be spelled the same way". Personally, I see this as the worst of both worlds: none of the advantages of case sensitivity, yet the programmer is still forced to memorize how the name is spelled... (or is my interpretation of this concept incorrect?)
In any case, case insensitivity has at least one unequivocal advantage: if the API defined RegEx and the programmer wrote Regex, the compiler will not treat the two as separate things. If the language requires every variable/type/etc. to be declared, and does not allow one to be declared more than once, there is no problem (otherwise it would simply be trading false negatives for false positives). Learning the language and its APIs becomes easier, in exchange for a smaller set of usable names.
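A minimal sketch of that combination, assuming a hypothetical `CaseInsensitiveScope` class (not taken from any real compiler): names are looked up by their case-folded form, and a second declaration that differs only in case is rejected instead of silently creating a new symbol.

```python
class CaseInsensitiveScope:
    """Hypothetical symbol table for a case-insensitive language."""

    def __init__(self):
        self._symbols = {}  # folded name -> (spelling as declared, value)

    def declare(self, name, value):
        key = name.casefold()
        if key in self._symbols:
            declared, _ = self._symbols[key]
            raise NameError(f"{name!r} collides with existing {declared!r}")
        self._symbols[key] = (name, value)

    def lookup(self, name):
        try:
            return self._symbols[name.casefold()][1]
        except KeyError:
            raise NameError(f"{name!r} is not declared") from None


scope = CaseInsensitiveScope()
scope.declare("RegEx", "regex-class")
print(scope.lookup("Regex"))  # regex-class

try:
    scope.declare("REGEX", "other")
except NameError as error:
    print(error)  # 'REGEX' collides with existing 'RegEx'
```

Without the mandatory, unique declaration, the second `declare` would quietly overwrite the first: the false negative of a case-sensitive language (Regex not found) would simply become a false positive (two unrelated names merged).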
Practical issues
Finally, there are the practical implications of adopting one approach or the other. A good number of them have already been described both in Victor's answer and in my answer to the related question: the greater effort required of the compiler/interpreter to normalize names (with respect to Unicode), convert them to a single form and intern them. But more important than this implementation complexity (after all, making human lives easier is what computers are for, even if the compiler designer has to work harder) is the issue of localization: the same program may be interpreted differently if case conversion is an integral part of its compilation.
For example, in a case-insensitive language, how would a compiler running in the Turkish locale treat names like MAİL and maıl? And how would an "international" compiler treat them? Would the Turkish programmer's expectation be met or frustrated by each of these compilers, and in what way? As a user of systems that do not always handle Unicode well, I know how irritating these details can be. If I also had to worry about them while developing (when all my attention is "spent" on the problem at hand), I think I would eventually reject a language that does not handle this very well...
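Python's own string methods give a feel for what a locale-unaware ("international") implementation does with these two names. `str.lower()` and `str.upper()` ignore the current locale and apply Unicode's default case mappings, whereas a Turkish reader expects İ to pair with i and I to pair with ı. The variable names below are only illustrative.

```python
dotted = "MA\u0130L"   # "MAİL": U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
dotless = "ma\u0131l"  # "maıl": U+0131 LATIN SMALL LETTER DOTLESS I

# İ lowercases to "i" plus a combining dot above, so the result is not "mail".
print(dotted.lower() == "mail")       # False
print(len(dotted.lower()))            # 5

# ı uppercases to a plain "I", but the round trip does not come back to "maıl".
print(dotless.upper() == "MAIL")      # True
print(dotless.upper().lower() == dotless)  # False
```

So under the default mappings MAİL does not fold to mail, and maıl becomes indistinguishable from mail once uppercased; a compiler that follows the Turkish locale would decide both cases differently.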
Conclusion
The trade-offs seem to me to be basically the following:
- Easy to remember (case insensitive) vs. semantic expressiveness (case sensitive);
- False positives (case insensitive) vs. false negatives (case sensitive) in determining whether two names refer to the same thing.
The other characteristics of the language also influence these factors, sometimes improving one side and worsening the other. E.g., if Python became case insensitive, the number of variable-name collisions would increase, and if variable declarations were made mandatory to compensate (e.g., var RegEx), concision would suffer.
I keep trying to write an answer, and I keep deciding it is just opinion. I thought that with the site's new status, what counts as on-topic would become tighter.
– Bill Woodger
I had not yet heard this double: "sensitive languages".
– viana