I will describe these terms as they are commonly understood. If you study type theory, you will see that some things are defined differently there.
Several terms in this area are often confused, and some of them have no universally accepted formal definition.
Languages are popularly classified by their typing: formally, static languages are those with static typing, and dynamic languages are those with dynamic typing.
Some languages are called dynamic because of other characteristics: they allow arbitrary code execution (eval) or the transformation of existing code at runtime. It is increasingly common for modern languages to allow this kind of flexibility. Since this definition ignores typing, there are languages that are dynamic by this definition and static by their typing.
Definition
Static
The basic definition of static typing, as a feature a programming language can have, is that the types used by data and variables are checked to ensure that a value of the expected type is used in every situation. This check is done on the source code during the build process. The analysis makes the program's use of data type safe, so the programmer has to worry less about this issue. Once the program passes this check, the compiler guarantees that certain problems cannot occur; that is, these errors are detected before the program actually runs.
A variable cannot change its type.
However, static typing can give a false sense of security: only some errors can be discovered in advance.
Example (in C#, but it could just as well be pseudocode):
var x = 1;
x = "1"; // compile error: the variable cannot change its type
Dynamic
In dynamic typing this check also occurs, but it is done on the data itself, since variables can contain any type of data. Of course, at any given moment a variable contains only one type of data, and this is checked. The main difference is that the check is done at runtime, through an auxiliary infrastructure (a virtual machine or a library usually called the runtime). The programmer often has to write their own checks in the program, or in separate test code, to ensure that all types are correct at the right times. So the programmer has to worry more about types, although in simple solutions this concern may seem unnecessary.
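A minimal JavaScript sketch of the kind of hand-written check described above, assuming a hypothetical function `doubleIt` that should only accept numbers. The language accepts the wrong call without complaint; the problem surfaces only when that line runs, and only because the programmer wrote the check (without it, JavaScript would silently coerce `"21" * 2` to `42`).

```javascript
// Hypothetical helper: the language accepts any argument,
// so the programmer has to check the type by hand, at runtime.
function doubleIt(x) {
  if (typeof x !== "number") {
    throw new TypeError("doubleIt expects a number, got " + typeof x);
  }
  return x * 2;
}

console.log(doubleIt(21)); // 42

// Nothing rejects this call before it actually executes.
try {
  doubleIt("21");
} catch (e) {
  console.log(e.message); // doubleIt expects a number, got string
}
```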
What is effectively dynamic is the type of the variable. Understanding and documenting types is still necessary. Some languages encourage you to document them in the code itself with Hungarian notation.
Data needs to be represented concretely on the computer. A variable is a design pattern, and therefore an abstraction. Data cannot take many forms; at most it can be interpreted in different ways in specific cases. Variables can refer to different types. Therefore, dynamic typing is usually a special case of static typing. According to Bob Harper, one of the creators of Standard ML, "a dynamically typed language is a statically typed language with only one static type".
In practice, dynamic typing is also an abstraction. There is an illusion that you can refer to a piece of data that has different types, but in practice there is only a marker indicating the type and a pointer to different static data.
The most common techniques for achieving this illusion are a union type (union in C/C++) and/or pointers with no type specification (void * in C/C++). All data will carry some memory overhead.
Example (in JavaScript, but it could be pseudocode; note that the syntax is identical to the previous example, but the semantics are different):
var x = 1;
x = "1"; // normal operation: the type, which was number, becomes string
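The illusion described earlier can be modeled in JavaScript by making the representation explicit. This is a hypothetical sketch (the names `box`, `tag`, and `payload` are illustrative; real engines use other techniques, such as tagged pointers): every value carries a tag next to its data, which is the memory overhead mentioned above, while the variable itself always holds the same single kind of thing, a reference to a box. That is the sense of Harper's "only one static type".

```javascript
// Hypothetical model of a dynamically typed value: a type tag stored
// next to the actual data. JavaScript's own typeof is used to pick the tag.
function box(payload) {
  return { tag: typeof payload, payload: payload };
}

// x always refers to a box; only the tag and payload inside it change.
let x = box(1);
console.log(x.tag, x.payload); // number 1

x = box("1");
console.log(x.tag, x.payload); // string 1
```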
Performance
Another big difference is performance. Although it is possible to optimize a program written in a "dynamic language" so that it approaches or even exceeds the performance of programs written in "static languages", this is hard to achieve and is usually not done effectively. In theory, a JIT compiler is better able to optimize code because it has accurate information about how the program will actually run, and in practice it can sometimes achieve better results on its own.
On the other hand, if you already know what you are going to deal with, the generated code needs nothing extra to help the program work, and the program itself does not need certain checks written by the programmer. Since the check is done before execution, a program with static typing does not need to do this work during execution.
It also does not help that most programs with dynamic typing run on a virtual machine, since this makes the language easier to develop and allows some extra flexibility that is usually desirable in "dynamic languages".
In practice, languages are rarely 100% anything. Pragmatic tools know when to move away from a concept to give the programmer a better advantage.
Hybrid languages
There are hybrid languages. In practice it is not possible to have both forms of typing at once, but it is possible to use part of one concept or the other.
Static language with dynamic types
When a language is considered static (formally, it has static typing), it can still have the infrastructure needed to store different kinds of data in one variable. But that data is stored differently from the other types. You will have a static type that can store data dynamically, but the language still has static typing at its core.
Roughly speaking, the language uses an extra data structure to store the type and indicate where the real data is. This is usually done in the library, with help from the compiler only to indicate that the usual check should be relaxed, since the check will be done to some extent by this library at runtime, and also by the programmer, to prevent certain errors from being raised at runtime by this library.
Dynamic language with static types
A language with dynamic typing cannot effectively be partially static. After all, a dynamic language (formally, one whose typing is dynamic) must always expect any type. If it starts expecting a specific type and checking it before execution, it becomes a "static language". It is not possible to reduce the level of abstraction.
It is even possible to provide an earlier type check as an additional feature of the language, but it does not make much sense unless it is accompanied by a change in the way data is managed in memory. If the program needs its types fixed and guaranteed in advance, it would be wasteful to use a structure that allows several types. This structure has a memory cost (a type tag, references to the type in all situations*) and a processing cost (extra indirection, selecting the specific data/method to be used).
Adding such a check without changing the memory representation does not make much sense, but languages are doing it: almost all mainstream dynamic languages are adopting something that approaches static typing without actually being static typing.
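The processing cost mentioned above can be sketched in the same spirit. In this hypothetical JavaScript model (the helpers `makeInt`, `makeStr`, and `add` are illustrative, not any runtime's real API), every call to `add` must inspect the tags of both operands before it can choose an implementation, and a mismatch is only detected at that moment. With static typing, the compiler makes this choice once, before execution.

```javascript
// Hypothetical tagged values, as a dynamic runtime might keep them.
const makeInt = (n) => ({ tag: "int", value: n });
const makeStr = (s) => ({ tag: "str", value: s });

// Every call pays for the tag checks and for selecting the implementation.
function add(a, b) {
  if (a.tag === "int" && b.tag === "int") {
    return makeInt(a.value + b.value);
  }
  if (a.tag === "str" && b.tag === "str") {
    return makeStr(a.value + b.value);
  }
  // The mismatch is only discovered now, at run time.
  throw new TypeError("cannot add " + a.tag + " and " + b.tag);
}

console.log(add(makeInt(1), makeInt(2)).value); // 3
console.log(add(makeStr("a"), makeStr("b")).value); // ab
```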
Manifest typing
It is possible for a "dynamic language" to use manifest typing (types written explicitly) without changing its dynamic typing. The advantage is small or even questionable.
But there are also languages that can compile parts of the code with static typing and other parts with dynamic typing. The interface between the parts needs to be normalized one way or the other. Interestingly, some of them prefer to define themselves as static or dynamic to instill a predominant culture, and treat the other mode as an exception. Purists, for better or worse, consider this a bad choice. In fact, in this case there are two very similar languages and not just one.
*Some static languages keep references to types when data is stored on the heap. Even for the stack, the information still exists in the code to allow reflection, but no stack memory is spent to hold the type.
Inference
Static typing does not mean that all types need to be declared explicitly. In most situations the compiler can infer the type of a variable from its assignment or even, less commonly, from its use.
The great advantage of static typing lies more in the fact that a variable cannot change its type. This really helps development a lot. Having to write the type explicitly helps little or not at all. Some find code more readable when the type is written; others think it just reflects a deficiency of the compiler.
So it is possible to have a "static language" that uses implicit typing, partially reducing the ceremony these languages usually have.
PHP can do a type check before execution, but this helps little because the language is usually interpreted and, mainly, because it cannot check all types in all situations, so you do not get real safety. Safety is defined by the weakest link. During execution, everything is dynamic.
I do not know the Hack language well enough to say how it works, but it seems to have static typing.
When needed, you can use the mixed type, which is the data structure that allows any data to be stored in it. The compiler probably treats this type differently. The program handles data dynamically only as an exception: the programmer declares that anything can happen there because that is what they want.
For several reasons, Hack seems to be a PHP that draws heavily on C#, and it seems to me that mixed behaves like C#'s dynamic.
Advantages and disadvantages
Advantages and disadvantages often lie in the eye of the beholder. Programmers disagree on what is really an advantage, because almost everything is a trade-off, so it is easier to talk about differences. Some of them are:
| Static typing | Dynamic typing |
|---|---|
| Finds errors earlier | Finds errors at the last moment |
| Code is usually more readable; types appear in the code or the IDE can show them | The programmer needs to understand and document the code to know the types, and nothing is guaranteed |
| Provides better semantics for the code | Typing works more like a mechanism |
| Provides better performance in most cases | Performance is inferior |
| Safety | Concision |
| Any change in the problem requires a change in the code | Changes can be absorbed without changing the specific code |
| Development tools have more information and can help the programmer more | It is difficult or impossible to create/use certain tools that help the programmer |
| Object rules are well defined and fixed | Rules vary with the state of the object |
| Facilitates formalization | Facilitates experimentation |
| Rigidity | Flexibility |
| Harder to learn and use | Harder to maintain |
| Reduces the need for testing | Requires tests on the types |
| Code reuse is more complicated | Code reuse is riskier |
| Metaprogramming is complicated | Metaprogramming is easier |
| Facilitates large programs | Facilitates small programs |
| Considered tyrannical (jokingly) | Considered subversive (jokingly) |
The main advantages of "static languages" are the safety guarantee, performance, and development-time help (refactoring, code completion, auxiliary information, code coverage, etc.).
"Dynamic languages" are flexible, quick for prototyping, and concise.
Keep in mind that the most complicated bugs remain equally complicated with either kind of typing.
There is certainly an advantage to "dynamic languages" when it comes to developing the language itself. Defining the language and creating a basic implementation is much simpler than for a "static language". But creating a powerful implementation that addresses some of its drawbacks is hard work. To this day, no one has solved all the disadvantages.
Strong and weak typing
These concepts are sometimes confused with strong and weak typing. This is partly because the strength of typing is not well defined or universally agreed upon.
Strong typing is usually the characteristic of not allowing the same data to be treated as if it were of another type. It is very common for static languages to have strong typing, but there are exceptions.
This makes code more robust.
C, for example, allows one piece of data to be accessed/interpreted as if it were another. It can, for example:
- Store an int and access it as if it were a pointer.
- Store a float and access it as if it were an int. Admittedly the result will be catastrophic in this case, but it is possible.
- Treat 0 as false, and any other number (whatever its type) as true, in operations that require a boolean.
- Store two shorts in sequence and read them as an int. Probably nothing useful will come of it, but it is possible.
- Store "Sopt" and read it as if it were an int, for whatever reason you might want to.
C is a weakly typed language. So is C++, although it tries to encourage a style where this is used less.
C/C++ compilers try to prevent this from being misused.
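JavaScript does not let ordinary values be reinterpreted like this, but its typed-array machinery can view the same bytes through different types, which gives a runnable analogy for two items in the list above. This is an analogy, not C semantics; `DataView` is used with an explicit little-endian flag so the results do not depend on the machine.

```javascript
// One 4-byte buffer, viewed through different types.
const buf = new ArrayBuffer(4);
const view = new DataView(buf);

// Store a float and read the same bits back as an integer.
view.setFloat32(0, 1.0, true);
const floatBitsAsInt = view.getInt32(0, true);
console.log(floatBitsAsInt); // 1065353216 (0x3F800000), not 1

// Store two 16-bit integers in sequence and read them as one 32-bit integer.
view.setInt16(0, 1, true);
view.setInt16(2, 1, true);
const twoShortsAsInt = view.getInt32(0, true);
console.log(twoShortsAsInt); // 65537, i.e. 1 + 1 * 65536
```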
Hence we conclude that type safety is not an inherent feature of so-called static languages; it can be broken by other factors. Type safety is yet another concept, which can be mistakenly confused with strong typing and static typing.
Many "dynamic languages" have strong typing, but others have weak typing, usually through implicit coercion. Coercion is common in some situations, under defined rules. Example: "1" == 1 is true and 1 + "1" gives "11".
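These coercions can be checked directly in JavaScript, one of the languages that follows such rules. The strict operator `===` is included for contrast, since it does not coerce, and `-` shows that a different operator applies a different conversion rule to the same operands.

```javascript
// Each entry evaluates one expression under JavaScript's coercion rules.
const results = {
  looseEqual: "1" == 1,   // true: the string is converted to a number
  strictEqual: "1" === 1, // false: === also compares the types
  plus: 1 + "1",          // "11": with a string operand, + concatenates
  minus: 1 - "1",         // 0: - always converts its operands to numbers
};

console.log(results);
```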
Implicit coercion has the advantage of making code slightly shorter. This is usually characteristic of scripts, where code size matters. So languages made for developing applications, as opposed to scripts, should not have this characteristic (hello, PHP).
The definitions of these terms do not help much, and saying that a language is simply weakly or strongly typed is usually not accurate either.
In general, we can say something like this:
- Variable has a type => static typing
- Variable has no type => dynamic typing
- Value has a type => strong typing
- Value has no type => weak typing
Scripting languages
It is increasingly difficult to define scripting languages. They are usually dynamic, but nothing prevents them from being static.
Dynamic typing gives you more flexibility and helps you write small applications quickly. These languages usually require little ceremony, and dynamic typing brings little ceremony to typing. But it is more a matter of suitability than a requirement.
Dynamic languages are more often interpreted than compiled, which is another indirect factor that makes them suited to scripts. But again, this is just a facilitator, not a requirement.
No language wants to lose ground, so more and more languages with static typing allow dynamic typing features, reflection, concision, and simplified execution (the illusion of interpretation). So they can also be considered scripting languages, although that is not their main focus.
Besides suitability, being a script is more tied to the implementation than to the language itself.
A reflection
If a language allows its static types to hold invalid values, or values that do not match what is expected for that type (null, for example), does it still have static typing? It seems so, but it is doubtful whether it should. It breaks type safety, and in a way it can be interpreted as the data having two types. It requires a runtime check to ensure there will be no problems with the type.
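The same kind of hole can be observed even in JavaScript, which has no static types at all: `typeof null` reports `"object"`, yet using `null` as an object fails only at run time. A small sketch, assuming a hypothetical function `nameLength` that expects an object with a `name` property:

```javascript
// typeof reports "object" for null, although null has no properties.
console.log(typeof null); // object

// Hypothetical function: it expects an object with a name property,
// so it must check for null at run time to avoid a TypeError.
function nameLength(person) {
  if (person === null) {
    return 0;
  }
  return person.name.length;
}

console.log(nameLength({ name: "Ada" })); // 3
console.log(nameLength(null)); // 0
```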
Which is better?
The question that won't go away, and probably never will, is: which is better?
Obviously neither is universally better than the other. We can always fall back on the cliché that there is a right tool for each problem.
In practice, I see that the choice comes down, most of the time, to taste and experience. And this is good. I usually say that before choosing the best tool for the problem, you should choose something you know and feel comfortable using: "the best tool is the one you know how to use". It is the old feud between engineers, who want everything to be perfect, and managers, who want results. A chainsaw cuts wood faster and more accurately, but a weekend woodworker can get hurt with one.
Of course there are cases where one is technically better than the other, but this is becoming more subtle.
My personal observation is that programmers of "dynamic languages" tend to think less about the problem and create overly simplified designs that cause problems later. But there are cases where a better design is no advantage.
On the other hand, I see that programmers of "static languages" tend to overthink problems and create overly complicated designs that bring few future advantages and still do not solve all future problems.
Note that this does not define the quality of a programmer, and it is far from true that 100% of programmers behave like this 100% of the time. There are cases where the opposite happens. And it should: when programming in "dynamic languages", up-front planning is usually more important.
I cannot say that the typing itself is responsible for this trend (if it is real). It may be the kind of programmer that each typing discipline attracts. It is the surfer who goes to the sea with waves, not the sea with waves that turns someone into a surfer. Maybe people choose their typing for the wrong reasons.
What I see happen a lot is that programmers who use "static languages" do not use the dynamic facilities available in those languages unless there is no other way. And programmers of "dynamic languages" think that contract checking should never be done within the code (in fact, these languages do not usually provide facilities for it). I don't know if it should be like this.
A PDF with a more formal treatment to read. And a very complete and apparently well-grounded publication.
See also: What is typing style?
A comment on the performance issue: optimizations in the static case are not always better than in the dynamic case. One example is the JVM's JIT, which even undoes various optimizations made in the compilation phase in order to make better optimizations in the [pre-]execution phase. (source; may be biased) But I agree when you say "due to the difficulty, this is usually not done effectively".
– mgibsonbr
I improved it. I really like this article and it has inspired me a lot. The problem is that in practice it is very difficult to achieve everything it says. Interestingly, it is easier to optimize directly on the source code. JIT-compiled implementations of JS and Lua work over the source and achieve very reasonable results. Since you understand the formalisms better than I do, your answer ended up helping to improve mine :)
– Maniero
Another complement: the "[memory] structure that allows multiple types" is not the only overhead of dynamic typing; there is also the issue of method dispatch. If a function is overloaded for multiple types (i.e. polymorphic), static analysis can determine the correct implementation to use and link the code during the compilation phase. A dynamic language [without optional static features] would need to choose, at runtime, the correct method to call according to the type of the variable's value.
– mgibsonbr
In fact, I find this issue more significant than the other: even static languages like Java keep, along with each object, a reference to its class (this is what allows reflection). And at the end of the day, it is the overhead in the representation of data that counts. It makes little difference whether variables use a bit more or less memory, because, except for deep recursive calls, the memory used by the code is small compared to that used by the data structures.
– mgibsonbr
Without going into too much detail, I agree with almost everything. I might have used different terminology, but that's it. I think this subject still has a lot to offer. Just yesterday I ended up explaining some points of this question to someone who was about to ask another question :)
– Maniero
@Maniero, where did you see that C is weakly typed? I ask because some places say it has weak typing and others say it has strong typing. This post, http://stackoverflow.com/questions/430182/is-c-strongly-typed, for example, shows that this question divides opinions. Do you have more reliable material to settle this doubt?
– Fagner Fonseca
@Fagnerfonseca people confuse the terms. It's extremely common. Have you read the whole answer? I talk about it there. It is hard to discuss this in a comment; it could be answered in a separate question. In short, if one type of data can pass as another, the language is weakly typed, and that is what C does most. A char is both a character and a number. Some types get confused with pointers; any junk you put in memory can be interpreted as data of some type. That's weak typing. But variables always keep the same type; that's static typing.
– Maniero