Are there objective advantages to a language being "case sensitive" or not?

Asked 10 years ago

31

Or is this just taste?

I don’t want to know whether you like one more than the other. I don’t care why people like one or the other. I don’t want low-quality answers. I don’t want to know the historical reasons.

I want to know in a reasoned way what is gained or lost in each of the approaches.

Examples of case-sensitive languages: C, C++, C#, Java, JavaScript, Python, Ruby, Objective-C.

Examples of case-insensitive languages: SQL, COBOL, BASIC (I think all dialects), PHP (well, only in part, as far as I know), and Clipper and its dialects.

This question is related to this other one.

Just to make it clear to those who don’t understand the subject, I’m talking about the syntax of the language. I’m talking about the keywords, the identifiers.

I keep trying to write an answer, and keep deciding it is just opinion. I thought that with the site’s new status, the on-topic rules would become tighter.

– Bill Woodger

2015/07/16 at 09:25
I had never heard this pairing before: "sensitive languages".

– viana

2016/12/17 at 20:00

6 answers

30

The main advantage of case sensitivity is that it enlarges the set of possible symbols (names). Its main impact in traditional languages is that it allows an implicit relationship between a type and an instance of that type. Another, less common use (Prolog, Erlang) is to give names a different semantic treatment depending on their capitalization. Finally, there are practical issues involving compilation and runtime management, where implementing a case-sensitive language is simpler.

The main advantage of case insensitivity is that the language’s symbols are easier to memorize, with a consequent reduction in errors caused by incorrect capitalization.

Types vs. instances

In natural language, a name serves to represent a single concept. There are collisions (e.g., in Portuguese, manga means both the fruit mango and a shirt sleeve), but they are rare, and the ambiguity is usually resolved by context. In formal languages (e.g., controlled vocabularies, taxonomies, thesauri, ontologies) uniqueness of names is mandatory, and this applies especially to computer programs, where ambiguity would hinder automated processing of the text.

However, ambiguity is not always bad: given context, using the same name to represent different but related things not only causes little harm but is actually useful. If I say "dog", I am referring to a kind of animal. If I say "a dog", I mean one specimen of that animal. If I say "the dog", I mean not just any specimen but a specific one, which may have been mentioned before or is unique in the context (e.g., if there is only one dog in a set of animals). This facilitates communication.

Likewise, programming places a high value on concision, since it is a measure of a language’s expressiveness. However, concision must always be weighed against the clarity of the code ("programs are made for humans to read and only incidentally for computers to run"). In the absence of such semantic "shortcuts", one looks for other ways to tie the name of something to a concept:

Sigils ($foo is a scalar, @foo is an array);
Hungarian notation (iSize is an integer, szLastName is a null-terminated string);
Naming conventions (IFoo is an interface, _foo is a private field);
etc.

Such conventions are generally implicit, but ultimately they serve to communicate the semantics of the code to human readers. One such convention, quite similar to the natural-language case mentioned above, is to use the same name for a type and for an instance of that type:

cachorro = new Cachorro();

In a case-insensitive language, another name would be needed, and the programmer would most likely choose something like:

oCachorro = new Cachorro();
cachorro1 = new Cachorro();

Which makes the code less readable. And by the way, this tendency of programmers to use the same name for the class and for the object is evident when the class has the same name as a reserved word. See if this looks familiar:

clazz = new Class();

Choosing names is hard. Someone might insist "well, just give the variable a different name, like rex = new Cachorro()", but in practice that makes the code harder to understand. Even if the programmer sticks to a particular coding style (classes start with uppercase, variables with lowercase, constants are all uppercase), in a case-insensitive language they lose this implicit semantic association between type and instance, or at least are forced to adopt a different convention, as in the examples above.
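
As a minimal sketch of the point above, the Python snippet below first shows the type/instance pair coexisting in a case-sensitive language, and then simulates what a case-insensitive language with a single shared namespace would do. The helper bind_case_insensitive and the dictionary namespace are hypothetical, introduced only for illustration; a language that keeps types and variables in separate namespaces would not collide this way.

```python
class Cachorro:
    """A trivial type used only for illustration."""

    def __init__(self, nome):
        self.nome = nome


# Case-sensitive Python: the type and the instance differ only in case.
cachorro = Cachorro("Rex")


def bind_case_insensitive(namespace, name, value):
    """Hypothetical binding rule for a case-insensitive language with a single
    namespace: names are folded to one canonical form before being stored."""
    namespace[name.casefold()] = value


namespace = {}
bind_case_insensitive(namespace, "Cachorro", Cachorro)  # the type
bind_case_insensitive(namespace, "cachorro", cachorro)  # the instance replaces the type

# Only one binding survives, and it is no longer the class.
type_was_shadowed = namespace["cachorro"] is not Cachorro
```

Under these assumptions the second binding silently replaces the first, which is why a programmer in such a language reaches for oCachorro or cachorro1 instead.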

Differentiated semantics

In most languages, any distinction between capitalized and non-capitalized symbols is purely conventional. But nothing prevents the compiler from enforcing this convention (as suggested in Victor Stafusa’s answer), and in my opinion that would be a great advance. This not only creates consistency but can also increase expressiveness, as happens, for example, in Prolog.

Prolog (and languages inspired by it, such as Erlang) distinguishes between upper and lower case and, specifically for the first letter of a name, gives each a different semantic treatment: foo is an "atom" (a constant) and Foo is a variable. This may seem a silly detail, but in my experience with the language the gain in expressiveness is huge.

By way of example, some modern languages have a feature called destructuring assignment (or destructuring bind), as in Python:

for chave,valor in dicionario.items():

If Python supported unification like Prolog, it would be possible to do at the same time a destructuring bind, a type check and even a filtering in a single expression:

# Assuming a data structure Cachorro(nome, raca, idade)

for chave,Cachorro(nome, MALTES, sqrt(4)) in dicionario.items():
    print(nome)

# which would be equivalent to:

for chave,valor in dicionario.items():
    if isinstance(valor, Cachorro):
        nome = valor.nome
        if valor.raca == MALTES: # Assuming MALTES is a global constant
            if valor.idade == sqrt(4):
                print(nome)

If the language were case insensitive, on the other hand, this unification would not be possible. How would the compiler know:

That Cachorro(...) should not be called as a function, but sqrt(...) should;
That nome is a variable, which must receive the first field of the instance being iterated;
That MALTES, on the other hand, is a constant, which must be compared with the second field of the instance being iterated?

Note that all of this could be done with sigils, quoting (e.g., Lisp), etc., but notice how "clean" the code is without the visual pollution of special symbols everywhere.

(P.S. A more relatable example is regular expressions: \w matches a character class, and \W negates that class. If regexes were case insensitive, one more symbol would have to be "spent", making the language a little less dense.)
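
A quick illustration of that pair using Python’s re module; the sample string is arbitrary.

```python
import re

text = "foo, bar!"

# \w+ matches runs of word characters.
word_runs = re.findall(r"\w+", text)

# \W+ matches runs of everything that is NOT a word character.
non_word_runs = re.findall(r"\W+", text)
```

The same upper/lower pairing is used for \d and \D, and for \s and \S.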

Memorization and incorrect capitalization

It is hard enough to memorize the names of all the classes, functions, etc. of an API well enough to be productive without going back to the reference all the time. If capitalization is not used consistently, you end up having to memorize it too, and this should not be underestimated: our brains are good at memorizing concepts (and Easterners are even better at it than Westerners), but not so good at memorizing symbols and spellings:

If you try to memorize "dona de casa" (housewife), your brain will probably store it as a sequence of sounds (ˈdɔ nɑ dɪ ˈkɑ sə).
An East Asian reader could memorize it as an image, which is even easier (婦).
A programmer would also have to worry about how to write it:
- dona de casa? No, because identifiers cannot contain more than one word;
- dona-de-casa? No, because - is an operator;
- dona_de_casa? Could be... or would it be donaDeCasa?
- DonaDeCasa? DONA_DE_CASA? Let’s see, am I dealing with a variable, a class or a constant?

If a language follows a strict capitalization convention, and especially if the compiler enforces these rules (as already addressed in Victor’s answer) then the problem is not so great. But when you start using acronyms it gets more complicated:

IDCachorro or IdCachorro?
UTF8Regex, Utf8Regex, UTF8RegEx or Utf8RegEx?

Unless the compiler knows what an ID is, that there is something called UTF8, that regex means REGular EXpression, etc., nothing prevents one programmer from defining a name that is perfectly valid within the language’s coding style while another programmer has no idea how to spell it...

Frustration with this situation has been one of the most common arguments in favor of case insensitivity. For consistency, some people advocate case-preserving, case-insensitive, which as I understand it means "capitalization does not matter, but once a symbol is first declared it must always be spelled the same way". Personally, I see that as the worst of both worlds: it gives up all the advantages of case sensitivity, yet still forces the programmer to memorize the spelling of the name... (or is my interpretation of this concept incorrect?)

In any case, case insensitivity has at least one unequivocal advantage: if the API defines RegEx and the programmer writes Regex, the compiler will not treat the two as separate things. If the language requires every variable, type, etc. to be declared, and does not allow it to be declared more than once, there is no problem (otherwise it would simply be trading false negatives for false positives). Learning the language and its APIs becomes easier, in exchange for a smaller set of usable names.

Practical issues

Finally, there are the practical implications of adopting one approach or the other. A good number of them have already been described both in Victor’s answer and in my answer to the related question: the greater effort required of the compiler/interpreter to normalize names (with respect to Unicode), convert them to a single form and intern them. But more important than this implementation complexity (making humans’ lives easier is, after all, the computer’s job, even if the compiler designer has to work harder for it) is the question of localization: the same program may be interpreted differently if case conversion is an integral part of its compilation process.

By way of example, in a case-insensitive language, how would a compiler running in the Turkish locale treat names like MAİL and maıl? And how would an "international" compiler treat them? Would the Turkish programmer’s expectations be met or frustrated by each of these compilers? As a user of systems that do not always handle Unicode well, I know how infuriating these details can be. If I also had to worry about them while developing (when all my attention is "spent" on the problem at hand), I think I would end up rejecting a language that does not handle this very well...
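
To make the question concrete, here is a small sketch comparing two ways a case-insensitive compiler might fold identifiers. default_fold uses Python’s locale-independent str.casefold(); turkish_fold is a hypothetical, simplified stand-in for Turkish rules (İ pairs with i, I pairs with ı) and is not a complete implementation of Turkish casing. The dotted and dotless letters are written as escapes so the code points are unambiguous.

```python
DOTTED_CAPITAL_I = "\u0130"  # İ
DOTLESS_SMALL_I = "\u0131"   # ı


def default_fold(name):
    """Locale-independent Unicode case folding."""
    return name.casefold()


def turkish_fold(name):
    """Simplified Turkish-style folding (an assumption for illustration):
    İ lowers to i, and I lowers to ı."""
    return name.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_SMALL_I).lower()


mail_dotted = "MA" + DOTTED_CAPITAL_I + "L"  # MAİL
mail_dotless = "ma" + DOTLESS_SMALL_I + "l"  # maıl

# "International" compiler: MAIL and mail are the same identifier,
# but MAİL and maıl each stand apart.
international_same = default_fold("MAIL") == default_fold("mail")
international_dotted_same = default_fold(mail_dotted) == default_fold("mail")

# Turkish compiler: MAİL pairs with mail, and MAIL pairs with maıl.
turkish_dotted_same = turkish_fold(mail_dotted) == turkish_fold("mail")
turkish_dotless_same = turkish_fold("MAIL") == turkish_fold(mail_dotless)
turkish_ascii_same = turkish_fold("MAIL") == turkish_fold("mail")
```

Under these two folding rules the same four spellings are grouped differently, so a program that compiles under one rule could fail or change meaning under the other.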

Conclusion

The tradeoffs seem to me to be basically the following:

Easy to remember (case insensitive) vs. semantic expressiveness (case sensitive);
False positives (case insensitive) vs. false negatives (case sensitive) in determining whether two names refer to the same thing.

Other characteristics of the language also influence these factors, sometimes improving one side and worsening the other. E.g., if Python became case insensitive, the number of variable-name collisions would increase, and if variable declarations were made mandatory to counterbalance this (e.g., var RegEx), concision would suffer.

1

That’s what I’ve been waiting for. Excellent.

– Maniero

2015/07/18 at 15:24
1

Nice, but one part confused me. You say: "cachorro = new Cachorro(); In a case-insensitive language, another name would be needed", but VB.Net, which is case insensitive, accepts Dim cachorro As Cachorro = New Cachorro(), and also accepts Dim Cachorro As Cachorro = New Cachorro(), while Java and C#, both case sensitive, also accept Cachorro Cachorro = new Cachorro();. These languages know from context that it is a variable name and do not confuse the variable name with the class name, even if the names are identical or differ only in letter case.

– Caffé

2015/07/20 at 17:06
I just remembered a language popular in Brazil that, if I remember correctly (I haven’t used it for a long time), does not allow declaring a variable with the same name as its class: Delphi. There this is not a problem because of the convention of prefixing class names with "T" for "type" (which I find very annoying). In any case, it seems to me that this is a limitation of the compiler and not an inherent shortcoming of case-insensitive languages.

– Caffé

2015/07/20 at 17:39
@Caffé Indeed, in a statically typed language, where the namespace of variables/objects and the namespace of types are disjoint, it is fine to have a type and a variable with the same name. Similarly, there are languages in which a variable can have the same name as a keyword, provided the syntax makes clear which is which. The conflict only appears when types are first-class members (i.e., the language allows you to store a type in a variable. The type itself, that is; an object of type Class does not count...).

– mgibsonbr

2015/07/21 at 09:43
Anyway, the basic premise remains valid: if in VB.Net you can have a type Cachorro and a variable cachorro, in Java you can have types Cachorro and cachorro and variables cachorro and Cachorro. The universe of possible names is drastically reduced when you adopt case insensitivity. It should be noted, however, that this is not necessarily a bad thing.

– mgibsonbr

2015/07/21 at 09:43


by Victor Stafusa • **63,338** points · Answer 1 · 2015-07-13T20:48:58+00:00

One advantage of a case-sensitive language is that it is easier to enforce naming rules for code. Despite this, none of the case-sensitive languages you cited actually enforces naming rules, and I do not know of any language that has ever seen the light of day that does.

For example, in a case-sensitive language that enforced naming rules similar to Java’s (just as an example), if I tried to declare a variable with a name beginning with an uppercase letter, I would get a compile error. The same would happen if I tried to declare a class with a name starting with a lowercase letter. However, because these languages do not enforce their naming rules, they lose this advantage: I can still declare a class name with a lowercase letter or a variable name with an uppercase letter, disobeying the language’s conventions.
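
A rough sketch of what such enforcement could look like, written as a tiny checker over Python source using the standard ast module. The function naming_violations and the rules it applies (classes start with an uppercase letter, variables do not, ALL_CAPS names are treated as constants) are assumptions made up for this illustration, not the behavior of any real compiler.

```python
import ast


def naming_violations(source):
    """Return messages for names that break a Java-like convention."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            # Class names must start with an uppercase letter.
            if not node.name[0].isupper():
                violations.append(
                    f"class {node.name!r} should start with an uppercase letter"
                )
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            # Assigned names must not start with an uppercase letter,
            # unless the whole name is uppercase (a constant).
            if node.id[0].isupper() and not node.id.isupper():
                violations.append(
                    f"variable {node.id!r} should start with a lowercase letter"
                )
    return violations


sample = (
    "class cachorro:\n"
    "    pass\n"
    "\n"
    "Rex = cachorro()\n"
    "MAX_IDADE = 20\n"
    "rex = cachorro()\n"
)
report = naming_violations(sample)
```

A check like this is only possible because the language distinguishes case in the first place; a compiler could turn each message into an error instead of leaving it to an optional linter.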

On the other hand, the disadvantages are many. Case-sensitive languages are harder to learn, because beginners struggle to get used to the idea that fileName, FileName, filename and FILENAME are different things. Even more experienced programmers sometimes get an uppercase or lowercase letter wrong, a mistake the compiler will catch... if the language is compiled!

If the language is interpreted, case sensitive and allows implicit variable declaration, as JavaScript does, and you accidentally use resultARray instead of resultArray, you will have a headache and waste a lot of time debugging. This is because case-sensitive languages, of course, allow identifiers that differ only in upper/lower case to coexist in the same scope, so resultARray and resultArray are different variables. Worse still is when you inherit tangled code written by some idiot who uses the variables xy, Xy, XY and xY in the same scope for completely different purposes.
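
The same trap can be reproduced in Python, which is also case sensitive and creates a variable on first assignment. The function names below are made up for the example.

```python
def sum_buggy(values):
    result_total = 0
    for value in values:
        # Typo: capital T. No error is raised; Python silently creates
        # a second variable and result_total is never updated.
        result_Total = result_total + value
    return result_total


def sum_fixed(values):
    result_total = 0
    for value in values:
        result_total = result_total + value
    return result_total
```

Both functions run without complaint; only the returned value reveals that the first one never accumulates anything.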

It is true that a case-sensitive language discourages the programmer from writing inconsistent code and forces attention to upper and lower case, which should make the code look more uniform. However, bad programmers will always find a way to write identifiers with inconsistent naming in case-sensitive languages, and good programmers will always find a way to write identifiers with consistent naming even in case-insensitive languages.

So let’s summarize what would be good to have:

Enforcing a naming rule that keeps the code uniform. Case-insensitive languages do not even try, and so fail. Case-sensitive languages try, but fail just the same.
Identifiers that differ only in upper/lower case should resolve to the same identifier. - Point for case insensitive.
Distinct identifiers that differ only in upper/lower case should be prohibited. - Point for case insensitive.

So the scoreboard shows two points in favor of case insensitive and zero for case sensitive (or half a point, if you consider that a failed attempt is better than nothing).

So is case insensitive better? Yes, it is better than case sensitive, because it is less confusing, less prone to mistakes and more natural. But that does not mean there cannot be something even better: would it be possible to create a programming language that scores all three points?

Yes, if the programming language aims to ensure the consistency of identifiers without creating confusion. For this, it would have to be case sensitive when checking how the source code is written, but case insensitive when resolving identifiers. For example, if the language dictates that variable names must be written in lowercase only, then declaring a variable named Minha_Variavel instead of minha_variavel should cause a compile error (or at least a warning). On the other hand, if I wrote minha_variavel = Minha_Variavel * 2, the compiler would understand that Minha_Variavel is just minha_variavel misspelled and would give a compile error or warning, but it would not think it was another variable. A programming language like this would score 3 and surpass both the purely case-sensitive and the purely case-insensitive ones. But I do not know of any programming language like this; the closest thing is the IDEs of case-insensitive languages that automatically convert identifiers to their canonical form as the programmer types (as with Visual Basic).
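
One way to picture the proposal is a symbol table that ignores case when looking a name up but still checks spelling. The SymbolTable class below is a hypothetical sketch of that idea, assuming the rule from the example above that variables must be declared in lowercase; it is not taken from any existing compiler.

```python
class SymbolTable:
    """Hypothetical symbol table: lookup ignores case, spelling is still checked."""

    def __init__(self):
        self._declared = {}  # case-folded name -> spelling used in the declaration

    def declare(self, name):
        # Assumed language rule: variable names are written in lowercase only.
        if name != name.lower():
            raise SyntaxError(f"variable {name!r} must be written in lowercase")
        key = name.casefold()
        if key in self._declared:
            raise NameError(f"{name!r} is already declared")
        self._declared[key] = name

    def resolve(self, name):
        """Return (declared spelling, warning message or None)."""
        key = name.casefold()
        if key not in self._declared:
            raise NameError(f"{name!r} is not declared")
        declared = self._declared[key]
        if name == declared:
            return declared, None
        # Same variable, wrong spelling: report it, but do not create a new symbol.
        return declared, f"{name!r} should be spelled {declared!r}"


table = SymbolTable()
table.declare("minha_variavel")
declared, warning = table.resolve("Minha_Variavel")
```

The misspelled reference resolves to the declared variable and carries a warning, so the programmer is corrected without a second variable ever coming into existence.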

There is also a disadvantage of case-insensitive languages that is often overlooked: Unicode characters outside the ANSI standard. An interesting example is Turkish, where the uppercase form of i is İ while the lowercase form of I is ı. Thus, if a language that allows international characters in identifiers is case insensitive, then when these characters are used it may be unclear which different identifiers are equivalent to each other for those who do not know the alphabet in question (as another example, few people would see ΓΩξνσς as equivalent to γωΞΝΣΣ). For scripts such as Chinese, which have no concept of upper/lower case, the distinction makes even less sense, since their characters are neither upper nor lower case. However, few programming languages accept identifiers containing Unicode characters, and in general, even where they are allowed, using them turns out to be bad programming practice. Because of this, at the end of the day, this disadvantage ends up weighing very little or being irrelevant in practice.
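
The Greek and Chinese cases can be checked with Python’s built-in Unicode case folding; this only shows what one particular folding algorithm (str.casefold) does, not what any given compiler would do.

```python
greek_a = "ΓΩξνσς"
greek_b = "γωΞΝΣΣ"

# Case folding maps both spellings to the same string,
# including the final sigma ς, which folds to σ.
same_identifier = greek_a.casefold() == greek_b.casefold()

han = "狗"

# A Chinese character is neither uppercase nor lowercase,
# and case conversion leaves it unchanged.
has_no_case = (
    not han.isupper()
    and not han.islower()
    and han.upper() == han
    and han.lower() == han
)
```

So a case-insensitive compiler built on this folding would treat the two Greek spellings as one identifier, whether or not the reader can tell.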

by zentrunix • **5,511** points · Answer 2 · 2016-12-17T19:10:27+00:00

In languages that require variables to be declared before use, the question is, in general, irrelevant.

In languages that do not require variables to be declared before use (typically scripting languages), being case sensitive is a shot in the foot, because a typo in a variable name is not a syntax error and so is not reported by the compiler/precompiler/interpreter/etc.

by Miro Eduardo • 1 point · Answer 3 · 2016-12-15T08:50:32+00:00

I’m going strictly by the definition. Case-sensitive is an anglicism from computing that refers to a kind of typographical analysis. In Portuguese, it means something like "sensitive to letter case" or "sensitive to upper and lower case". Software is said to be case-sensitive, or to have "case sensitivity", when it is able to analyze a string, detect upper and lower case, and behave differently depending on it.

Case-sensitive means that uppercase and lowercase characters are treated differently. For example, the words sum and SUM are considered different.

by Jonathan • **145** points · Answer 4 · 2016-12-15T10:09:36+00:00

Basically

Increases the number of symbols you can use
Standardizes the code
Makes reading easier

by Ivan Ferrer • **12,096** points · Answer 5 · 2015-07-13T18:59:20+00:00

Besides the cosmetic side, of course (there is an attempt to standardize the code, which should happen but often does not)... there are some significant advantages, for example in encodings such as base64_encode and base64_decode, data encryption, creating safer passwords, the use of hashes and unlimited combinations; SEO indexing of pages; behavioral differences between similarly named methods; and naming things from the same domain: for example, you may have a class called Animal and a constant called ANIMAL. But nothing prevents it from being disregarded, since many people dismiss its importance...