What is the difference between syntactic error and semantic error?

Viewed 56,754 times

45

The concepts of syntactic error and semantic error appear in books and other programming materials. However, they always raise doubts for those who are just starting out.

In practical terms, what is the difference between a syntactic error and a semantic error?

5 answers

43


Just as in natural language, programming languages expect the various symbols to be arranged in a logical way in relation to each other, just as words come together to form expressions, clauses, and sentences. This characteristic is the syntax of the language. A syntactic error is therefore a case where the "sentences" of the program (instructions, expressions) are badly formed, what we would commonly call a "grammatical error".

Examples:

  • Parentheses that open but do not close;
  • Two numbers next to each other with no operator between them;
  • Two instructions without a semicolon between them;
  • A keyword being used in an unexpected position.

There is an even more basic type of error, which occurs when the symbol itself is malformed (e.g., a number with letters in the middle, such as 123y4). That would be a "lexical" error or, as we would normally say, a "misspelling". This type of error can be grouped together with syntactic errors, unless we want to be very purist/pedantic.

Semantics, in turn, refers to the meaning of what is being said. In the same way that a natural-language sentence may be grammatically correct but make no sense, the instructions given to the computer may be well formed and still not do what the programmer wants, or not do anything useful or even possible.

Examples:

  • Divide a number by a string;
  • Create a class that inherits from itself;
  • Use the operator ^ thinking that it is exponentiation, when it is actually exclusive or (XOR);
  • Divide zero by zero.

Syntax errors are always detected at compile/parse time (because if the compiler cannot even assemble a sentence, it cannot do anything else with it). Semantic errors, on the other hand, can also be caught during compilation (in type analysis, code generation, or other phases), but they may not be, and then end up causing an error at runtime, or at least an incorrect result/behavior. That is, the compiler successfully analyzes the source code and generates machine code, the program is executed, and the problem only manifests itself during this execution (whether or not it is noticed immediately).
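As a rough illustration of semantic problems that survive compilation, here is a minimal Java sketch. The class name SemanticPitfalls and its helper methods are hypothetical and exist only for this example; it shows the ^ operator behaving as exclusive or, and an integer division of zero by zero that is only rejected when the program runs.

```java
// Illustrative sketch: everything here compiles cleanly; only the results reveal the problems.
class SemanticPitfalls {
    // In Java, ^ is bitwise exclusive or, not exponentiation.
    static int xor(int a, int b) {
        return a ^ b;
    }

    // Exponentiation goes through Math.pow, which works on doubles.
    static double power(double base, double exponent) {
        return Math.pow(base, exponent);
    }

    // Integer division of zero by zero is accepted by the compiler and fails only at runtime.
    static boolean integerZeroByZeroFails() {
        int zero = 0;
        try {
            int ignored = zero / zero;
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(xor(2, 3));                  // 2 XOR 3, which is 1
        System.out.println(power(2, 3));                // 2 to the power of 3, which is 8.0
        System.out.println(integerZeroByZeroFails());   // true
        System.out.println(0.0 / 0.0);                  // floating point yields NaN instead of throwing
    }
}
```

Note that the floating-point division does not even raise an error: the incorrect result simply flows onward, which is the "incorrect result/behavior" case rather than the "error at runtime" case.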

22

[I know this is an old question, but I am not satisfied with any of the answers.]

Compilers for most programming languages are divided into several steps:

  • Lexical analysis;

  • Syntactic analysis;

  • Semantic analysis and

  • Code generation.

Depending on the compiler, there may be more than one code generation step, or optimization steps alongside code generation. Some languages (e.g., JavaScript, Ruby) have virtually no semantic analysis. In some programming languages, lexical and syntactic analysis are not separable.

Also, as Jefferson Quesado rightly noted in a comment, "usually, the compiler pipeline does not wait for the lexical analysis to finish completely before starting the syntactic analysis, just as the output of the parsing can already be consumed immediately by the semantic part, and so on."

Therefore, a lexical error is an error that occurs in the lexical analysis step. A syntactic error is an error that occurs in the syntactic analysis. A semantic error is one that occurs in the semantic analysis. And a logic error is an error that occurs at runtime (not during compilation). There are also cases where errors are detected in code generation, but these are quite rare.

Let us take the Java language as an example, since it has all these steps well separated and I know it is a language that the author of the question knows.

Lexical error

An example of a program with some lexical errors would be this:

public class Teste {
    public static void main(String[] args) {
        int x = 25 + 0x; #
        "abc
    }
} /*

Lexical analysis divides the program into tokens, which are more or less like words. For example, public is a token, class is a token, { is a token, and ; is a token. Lexical analysis also makes a quick classification of the tokens by type, so the lexical analyzer knows that class is a keyword, that Teste is an identifier, that [ is a special symbol, and that 25 is an integer literal. The lexical analyzer is also responsible for discarding comments.

See that 0x? It is a lexical error because the lexical analyzer cannot turn it into a valid integer literal. The # is also a lexical error because it corresponds to nothing that the lexical analyzer is able to recognize. The "abc is a string that never ends, so the lexical analyzer cannot produce a well-formed token from it: another lexical error. The comment at the end, which starts and never ends, is also a lexical error.
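To make the idea of tokens and lexical errors more concrete, here is a minimal sketch of a toy lexical analyzer. It is not how javac works: the class ToyLexer, its tiny keyword list, and the KEYWORD/IDENT/INT/STRING/SYMBOL/ERROR labels are all assumptions made for this illustration, and it knows nothing about comments, escapes, or hexadecimal literals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical toy lexer covering only a tiny fraction of Java's real lexical grammar.
class ToyLexer {
    private static final Set<String> KEYWORDS = Set.of("public", "class", "static", "void", "int");

    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < source.length() && Character.isLetterOrDigit(source.charAt(i))) i++;
                String text = source.substring(start, i);
                // Digits mixed with letters (0x, 123y4) do not form a valid number in this toy lexer.
                boolean allDigits = text.chars().allMatch(Character::isDigit);
                tokens.add((allDigits ? "INT:" : "ERROR:") + text);
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < source.length() && Character.isLetterOrDigit(source.charAt(i))) i++;
                String text = source.substring(start, i);
                tokens.add((KEYWORDS.contains(text) ? "KEYWORD:" : "IDENT:") + text);
            } else if (c == '"') {
                int end = source.indexOf('"', i + 1);
                if (end < 0) {
                    // A string that never closes cannot become a well-formed token.
                    tokens.add("ERROR:" + source.substring(i));
                    i = source.length();
                } else {
                    tokens.add("STRING:" + source.substring(i, end + 1));
                    i = end + 1;
                }
            } else if ("=+*;(){}[]".indexOf(c) >= 0) {
                tokens.add("SYMBOL:" + c);
                i++;
            } else {
                // Any character the lexer does not recognize is a lexical error.
                tokens.add("ERROR:" + c);
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("int x = 25 + 0x; #"));
        System.out.println(tokenize("\"abc"));
    }
}
```

Under these assumptions, the 0x, the # and the unterminated "abc each come out labeled ERROR, while the remaining pieces are classified normally. A real lexer would also report positions and try to recover so that later errors can still be found.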

Syntactic error

Syntactic analysis is responsible for recognizing the general structure of the program. It is at this stage that the compiler recognizes where the instructions, expressions, classes, methods, functions, parameters, procedures, etc. are. However, this step is not responsible for assessing whether this structure makes sense (that is the job of semantic analysis); it is responsible only for recognizing the structure.

An example of a program with syntactic errors would be this:

public class Teste
    public static void main(String[[] args) {
        int a = 10
        int x = * 25;
    }

    public void void metodoA() {}

    public @void metodoB() {}

    public void metodoC( {

    public int metodoD(int interface) {}

    public int metodoE() {
        double x = new for while;
        do { System.out.println("X"); };
    }
} (

Here we have these mistakes:

  • The { is missing in the class declaration.

  • In the method main, there is one [ too many in the parameter declaration.

  • A semicolon is missing after int a = 10, and without it the compiler cannot know where one instruction ends and the next begins.

  • The * in int x = * 25; prevents the compiler from assembling a well-formed multiplication expression because the left operand is missing.

  • The return type void void of metodoA will confuse the compiler because, when it sees the second void, it is expecting the name of the method.

  • The @ in the declaration of metodoB will also confuse the compiler.

  • In the method metodoC, note that there is a ( that is never closed and a { that is never closed.

  • In the method metodoD, the word interface is a keyword, not an identifier. The compiler expected to see an identifier as the parameter name.

  • The new for while will leave the compiler very confused; after all, the for and the while should not appear there.

  • The do { System.out.println("X"); }; is ill-formed because the while is missing after the }.

  • There is a stray ( left at the end.

With these syntactic errors, the compiler cannot understand what the code structure is. It will not be able to separate the instructions from each other or the methods from each other, or to tell where the class starts or ends, etc. If it does not understand the structure of the code, it makes no sense to try to assess whether that structure is correct. In the case of Java and some other programming languages, semantic analysis will not even start if there are syntactic errors.
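As a sketch of what "recognizing the structure" means, the following toy recursive-descent parser checks only whether an arithmetic expression is well-formed. The class ToyParser and its three-rule grammar are assumptions made for this example and are far simpler than the Java grammar; the parser never evaluates or type checks anything.

```java
// Hypothetical toy parser for the grammar:
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')'
class ToyParser {
    private final String src;
    private int pos = 0;

    private ToyParser(String src) {
        this.src = src;
    }

    // Returns null when the input is well-formed, otherwise a description of the syntax error.
    static String check(String source) {
        ToyParser parser = new ToyParser(source);
        try {
            parser.expr();
            if (parser.peek() != '\0') {
                throw parser.error("unexpected '" + parser.peek() + "'");
            }
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    private void expr() {
        term();
        while (peek() == '+' || peek() == '-') { pos++; term(); }
    }

    private void term() {
        factor();
        while (peek() == '*' || peek() == '/') { pos++; factor(); }
    }

    private void factor() {
        if (Character.isDigit(peek())) {
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        } else if (peek() == '(') {
            pos++;
            expr();
            if (peek() != ')') throw error("expected ')'");
            pos++;
        } else {
            throw error("expected a number or '('");
        }
    }

    // Skips spaces and returns the next character, or '\0' at the end of the input.
    private char peek() {
        while (pos < src.length() && src.charAt(pos) == ' ') pos++;
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    private IllegalStateException error(String message) {
        return new IllegalStateException(message + " at position " + pos);
    }

    public static void main(String[] args) {
        System.out.println(check("10 + 25 * 2"));  // null: the structure is fine
        System.out.println(check("* 25"));         // the left operand is missing
        System.out.println(check("(1 + 2"));       // the parenthesis never closes
        System.out.println(check("10 25"));        // two numbers with no operator between them
    }
}
```

This toy stops at the first error it finds. Production compilers usually try to recover and continue so that they can report several syntax errors in a single run.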

Semantic error

Once the compiler knows the structure of the code, that is, once it knows where the instructions, parameters, classes, and methods are and how they are organized, we reach the step where the compiler analyzes whether this structure makes sense.

An example of a program with semantic errors would be this:

public class Teste extends Teste {
    public static void main(String[] args) {
        int x = "abc" * true;
        int m = 5 * (new Teste(4) + x()) - "z";
        int h = 5.6;
    }

    public static void main(String[] args) {}

    private void x() {
        int[] b = new String[] {"Oi"};
        NaoAchei.naoSei();
        g = 4;
        Runnable p = new Runnable();
        Runnable q = new Runnable() {
            @Override
            protected void run() {}
        };
        q.foo();
        Teste.x();
    }

    class String2 extends String {}

    class Carro implements Runnable {
        @Override
        public void what() {}
    }

    class Carro2 extends Runnable {}

    void y(int a, int a) {
        int x;
        int m = 25 * x;
    }
}

The compiler can understand what the structure/skeleton of this code is. It can see the class and know where it starts and where it ends, it can identify the methods the class has, and it can separate the instructions from each other. It can tell where the additions, multiplications, and subtractions are. That is, there are no syntactic errors.

However, although the program is structurally well-formed, it is still poorly formed semantically. Here are the errors:

  • A class cannot inherit from itself.

  • There are two main methods with the same signature.

  • You cannot multiply a String by a boolean.

  • There is no constructor of Teste that takes an int as a parameter.

  • It is not possible to add something to x() because the return type of this method is void.

  • It is not possible to subtract a String from something else.

  • The number 5.6 cannot be assigned to an int.

  • In the method x, the variable b should receive an array of int, not an array of String.

  • In the instruction NaoAchei.naoSei();, there is no class or variable called NaoAchei, so no method can be called on it.

  • In the instruction g = 4;, the variable g has not been declared anywhere, so it is not known where this value should be stored.

  • In the assignment to the variable p, an attempt is made to instantiate an interface.

  • In the variable q, the method run() of Runnable is public, but the code tries to override it as protected. The access level cannot be made more restrictive.

  • In the instruction q.foo();, q is of type Runnable, but Runnable has no method called foo.

  • In the instruction Teste.x();, the method x() is not static, so it cannot be invoked that way, without an instance.

  • The class String is final, so the class String2 cannot inherit from it.

  • The class Carro implements Runnable but does not override the method run() specified in the interface. This class is also not abstract.

  • The class Carro marks the method what() as an override, but none of its superclasses or interfaces has this method to override.

  • The class Carro2 is trying to inherit from an interface (with extends) instead of implementing it (with implements).

  • The method y has two parameters with the same name.

  • In the instruction int m = 25 * x; of the method y, the variable x has not been initialized.

With these errors, although the compiler can understand what the code structure is, it cannot understand what this structure is trying to do, or it realizes that the structure violates the language rules, or it realizes that the code is ill-formed even though it has a well-formed structure.
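For contrast, here is a sketch of how a few of the semantic errors listed above could be repaired so that the compiler accepts the code. The class name TesteCorrigido and the particular fixes chosen (for example, narrowing 5.6 with an explicit cast) are assumptions about what the author of such code might have intended, not the only possible corrections.

```java
// A sketch of possible repairs for some of the semantic errors; the intent behind each fix is assumed.
class TesteCorrigido {
    // The interface is implemented with "implements", and run() is overridden as public.
    static class Carro implements Runnable {
        boolean ran = false;

        @Override
        public void run() {
            ran = true;
        }
    }

    // The parameters now have distinct names, and the local variable is initialized before use.
    static int y(int a, int b) {
        int x = a + b;
        return 25 * x;
    }

    // An explicit cast makes the narrowing conversion legal; the fractional part is discarded.
    static int truncate(double value) {
        return (int) value;
    }

    public static void main(String[] args) {
        Carro carro = new Carro();
        Runnable q = carro;   // a concrete class is instantiated, not the interface itself
        q.run();
        System.out.println(carro.ran);
        System.out.println(y(1, 2));
        System.out.println(truncate(5.6));
    }
}
```

Passing semantic analysis only means the code obeys the language rules. Whether truncating 5.6 to 5 is what the programmer wanted is exactly the kind of question the compiler cannot answer.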

Logic error

Even if the compiler detects no errors in the program and compiles it, this does not mean that the program is correct. We may still have logic errors that manifest only during the execution of the program. For example:

public class Teste {
    // Computes all the prime numbers from 0 to 100.
    public static void main(String[] args) {
        String x = null;
        System.out.println(x.trim());
    }
}

This will throw a NullPointerException at runtime. Note that the code is syntactically well-formed. In semantic analysis, the compiler can see that the variable x receives a valid value for the type String. It also sees that the method trim() exists in the type String. It sees that the class System exists and has a field called out, which is static and public, and that the type of this field (java.io.PrintStream) offers the println method being invoked. However, this does not mean that the program will run without any problem, nor that what the program does is what the programmer wanted it to do (note the comment).
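For comparison, a program that does what the comment promises might look like the sketch below. The class name PrimesExample and the trial-division approach are assumptions; the original answer does not show the intended implementation.

```java
import java.util.ArrayList;
import java.util.List;

// A sketch of what the commented intent (primes from 0 to 100) could look like.
class PrimesExample {
    // Collects the primes up to and including limit using simple trial division.
    static List<Integer> primesUpTo(int limit) {
        List<Integer> primes = new ArrayList<>();
        for (int n = 2; n <= limit; n++) {
            boolean prime = true;
            for (int d = 2; d * d <= n; d++) {
                if (n % d == 0) {
                    prime = false;
                    break;
                }
            }
            if (prime) {
                primes.add(n);
            }
        }
        return primes;
    }

    public static void main(String[] args) {
        System.out.println(primesUpTo(100));
    }
}
```

The compiler treats this version and the one that throws NullPointerException as equally valid programs. Only running or testing them reveals which one matches the programmer's intent.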

Semantic errors vs logic errors

In general, the idea of semantic analysis is to capture semantic errors in the code, that is, to identify logic errors and things that could not possibly work during execution. However, there are many kinds of problems that the compiler cannot verify (the compiler has no way of knowing whether the code corresponds to what the programmer wanted), and therefore it is impossible for semantic analysis to be complete. Despite this, even a partial or incomplete semantic analysis can identify several types of errors. In fact, the halting problem is a classical problem in computing which shows that a complete semantic analysis is impossible/undecidable/unsolvable.

One could say that logic errors are semantic errors, since the purpose of semantic analysis is to identify things that would be logic errors in the given program, and by this reasoning semantic error would be a synonym for logic error. However, it is often useful to separate the errors that the compiler identifies after parsing has been completed from the errors that only manifest at runtime. Thus, the compilation errors produced by the compiler after parsing are often referred to as semantic errors, whereas logic errors are those that manifest during execution and are not caught by the compiler.

Interpreted languages often have little or no semantic analysis. This is the case with JavaScript, PHP, Perl, Python and Ruby, for example. When you run a code snippet in these languages, the interpreter compiles it internally to perform the syntactic analysis and, if that succeeds, generates some internal data structure representing the program (code generation) and immediately starts executing it. The fact that there is no semantic analysis stage means that these languages are much more flexible about which programs they accept to run and about what the programmer can do, but they offer far fewer guarantees to the programmer that the given program is well-formed. As a result, many errors that could have been detected at compile time will manifest only at runtime. On the other hand, since semantic analysis cannot be complete, it is inevitable that many logic errors are not caught by it anyway.
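Java itself can imitate this trade-off on a small scale. In the sketch below (the class DeferredCheck is hypothetical), declaring a value as Object hides its real type from the compiler, so a check that semantic analysis would normally perform is postponed until the cast executes.

```java
// Hypothetical example: a type check moved from compile time to runtime.
class DeferredCheck {
    // The compiler accepts the cast because an Object might be an Integer;
    // whether it really is one is verified only when this line runs.
    static String describe(Object value) {
        try {
            Integer number = (Integer) value;
            return "integer " + number;
        } catch (ClassCastException e) {
            return "runtime type error";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(42));     // the cast succeeds
        System.out.println(describe("abc"));  // the cast fails during execution
    }
}
```

Had the parameter been declared as String, the cast to Integer would have been rejected during compilation; with Object, the same mistake surfaces only at runtime, much like in the languages mentioned above.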

Syntactic errors vs semantic errors

Lexical analysis occurs as part of syntactic analysis, and therefore lexical errors are frequently classified as a special type of syntactic error. In addition, the occurrence of a lexical error often prevents the compiler or interpreter from understanding the program structure.

In short, syntactic errors are structural errors in the program that prevent the compiler or interpreter from understanding the program structure. Semantic errors are errors that manifest themselves in structurally well-formed programs. Whether you count as semantic errors only those that the compiler catches after parsing, or any error that happens afterwards, even during execution, is at your discretion.

Other types of errors

Finally, there are still some types of errors that do not fall into any of these categories. These include code generation errors (for example, the compiler could not generate the executable because the disk ran out of space or because the code to be compiled is too big) and compiler problems and bugs (when the compiler enters an infinite loop, when an internal error occurs in the compiler, or when there is not enough memory to run the compiler).

There are also cases where the program does what the programmer expects it to do, but the user uses it inappropriately and ends up producing results that are not what the user wants. In this case, we are entering the realm of concepts such as usability and requirements definition.

Another case is if the program does what the programmer and the user want it to do, but it interacts with another program or service and that other program or service behaves unexpectedly, which would then be an integration problem.

  • 1

    Just one detail for the curious reader who has made it this far: usually, the compiler pipeline does not wait for the lexical analysis to finish completely before starting the syntactic analysis, just as the output of the parsing can already be consumed immediately by the semantic part, and so on.

  • 1

    @Jeffersonquesado Yes, indeed.

  • Victor, I believe that "abc" * true is a syntactic error (in Java at least) because the multiplication operation accepts numbers and identifiers. I believe String a = "abc"; boolean marmota = true; int m = a * marmota; is the semantic error you were after. Another semantic error is int x; int m = 25 * x;, an operation on an uninitialized variable.

  • 3

    @Jeffersonquesado "abc" * true is a semantic error: https://ideone.com/DKtXYV

  • 1

    @Jeffersonquesado I edited the reply to incorporate your suggestions. Thank you.

  • 1

    OK, you're right :) It's just that when I wrote compilers for Java (for study purposes, of course, not professionally) I put that check in the syntactic analysis because I could =]

12

A syntactic error is when some element of a statement is out of place, be it a missing line terminator, an operator in an unexpected place, etc.

Semantic errors can happen from the point of view of the machine (less likely) and of the programmer (more likely).

Semantic errors occur on the machine side when it does not have enough information to process an instruction (infer the result) even though the instruction is syntactically correct. A common example is in SQL, when doing a join between tables that have fields with the same name.

Ex:

SELECT * FROM produtos INNER JOIN vendas ON id = id -- is this id from vendas or from produtos?
SELECT * FROM produtos p INNER JOIN vendas v ON p.id = v.id -- correct

Error:

ambiguous column reference

The one who most often commits interpretation errors is the programmer, when working with complicated code (or not). An example is accessing an index that does not exist in an array: perhaps the programmer forgot that the 3 elements of the array are indexed from zero to two, not from one to three.

Ex:

 for (int i = 0; i <= array.length; i++) {
     System.out.println(array[i]); // array[3] does not exist; the index should only go up to two.
 }
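A self-contained sketch of the same mistake, next to the corrected bound, could look like this. The class ArrayBounds and its two helper methods are hypothetical and exist only for the illustration.

```java
// Hypothetical example contrasting the off-by-one loop with the corrected one.
class ArrayBounds {
    // Correct bound: i < array.length visits indices 0 to length - 1.
    static int sum(int[] array) {
        int total = 0;
        for (int i = 0; i < array.length; i++) {
            total += array[i];
        }
        return total;
    }

    // Faulty bound: i <= array.length tries to read one element past the end.
    static boolean offByOneFails(int[] array) {
        try {
            int total = 0;
            for (int i = 0; i <= array.length; i++) {
                total += array[i];
            }
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] array = {10, 20, 30};
        System.out.println(sum(array));            // 60
        System.out.println(offByOneFails(array));  // true
    }
}
```

Both loops compile without complaint. The faulty one is only caught when it runs, which is why other answers on this page would classify it as a logic error.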

For the machine, syntactic and semantic analysis are almost the same thing: however unreadable an instruction is, as long as it contains no syntax error, the machine will execute what was requested.

Martin Fowler has an interesting phrase about that.

Any fool can write code that a computer can understand. Good programmers write code that humans can understand.

6

Syntax errors.

The compiler does not understand the code, for example when you multiply a string by an integer in C. The compiler will detect these errors because it cannot compile the code.

Semantic errors.

The compiler understands the code, but the code does not do what you, the programmer, want done. You may be using the wrong variable, the wrong operation, or operations in the wrong order. There is no way for the compiler to detect this kind of error, since it was the programmer who created the wrong logic, and the code compiles without errors.

  • 1

    This answer is incorrect. Multiplying a string by an integer in C is a semantic error. What you call semantic errors are actually errors of logic.

3

  • 3

    This answer is incorrect. What is called a semantic error here is actually a logic error. What is called a syntactic error here is a semantic error. A syntactic error would be writing a Se without the então, or opening a parenthesis that is never closed.
