How does antivirus scan my program?

29

A class in college left me somewhat puzzled. My teacher was talking about the differences between interpreted and compiled languages and pointed out that code written in interpreted languages can be stolen, whereas with compiled languages this does not happen. That raised a series of doubts, the main one being:

If my code is compiled and you can’t tell how it was written, how does the antivirus know it can be dangerous?

  • 10

    This explanation about the "theft" of compiled versus interpreted code, as much as I understand the "motivation" behind it, is very crude. It would fit well in a bar conversation, or when explaining things to grandma (if she goes to the bar), but coming from a college class it worries me a little. As for antivirus, the source code is irrelevant. Detection is usually based on other characteristics, such as fingerprinting (known byte sequences from the executable matched against a table of definitions) and behavior during execution.

  • I’m not sure, but my guess is that the antivirus does not look at the program’s code to decide whether it is malicious. It must use some other criterion, but I cannot say what that criterion is.

  • 5

    @Leandrolima see Bacco’s comment.

  • 5

    Your question has already been answered very well by Renan. I am sharing a link about the difference between interpreted and compiled languages to complement your study: https://answall.com/questions/77070/difficult-compiledlanguage

2 answers

34


(...)my teacher was talking about the differences between interpreted and compiled languages and pointed out that code written in interpreted languages can be stolen, whereas with compiled languages this does not happen(...)

I’m going to give your teacher the benefit of the doubt and assume that the statement arrived here in this form through a game of telephone.

If you have a program on your computer, you have the source code. No exceptions.

The build process generates executables or libraries (for example, .dll files on Windows), which are said to be "in machine language" rather than human-readable text. In fact, if you try to open these files in a text editor, you will see that they are unreadable and look nothing like the source files. However, take this with you for life: there is no compiled code that cannot be decompiled.

Want an example? Use C# to generate an executable or a .dll file. Then open the file with ILSpy.

There are some people who believe that you can make the code more "protected" if you use a technique called obfuscation, which "scrambles" the decompiled code generated by tools like the one I mentioned above. But even obfuscation doesn’t protect anyone from "theft", since a really motivated and dedicated programmer can reassemble the original code anyway.

The only way to ensure that source code will never be read is not to hand it over to anyone. Keep the code on a server and provide access to your system over the internet. Only those who have access to the server’s hard drive will have access to the source code. It’s not 100% safe, but it’s as close as you can get.

A relevant question: someone commented the following on this answer:

But if any program can be "recompiled", why don’t we have the Windows source code, for example?

Look, kid, we do. It’s actually quite hilarious. My favorite is the Windows 2000 one, which has several gems written in the code comments. Happy reading :) (crossed out because this code was leaked, not obtained via reverse engineering, and comments are not included in compiled code).

For example, and again talking about ILSpy: many things in Windows use .NET, which nowadays ships with the system. You can open ILSpy and use the option File -> Open from GAC to see the source of the platform’s key libraries.

For other system libraries, you can try a C/C++ decompiler such as Snowman. But try to open only small DLLs, otherwise the system hangs (to open large DLLs, you need a plugin). Tip: on Windows 8, you can try to decompile this:

c:\windows\system32\Alttab.dll


As for antivirus programs, they don’t care about your source code. They look at the actions your program performs, regardless of how it was written. Every program interacts with the operating system through requests. For example: Windows, tell me what time it is; Linux, send the byte 00101000 to serial port 2; Solaris, write this to that memory address; and so on.

The antivirus specifically looks for programs that pull shady tricks such as:

  • try to read the state of the browser program;
  • impersonate a user to perform operations that require action by a human being (such as pressing the OK buttons on Windows permission requests);
  • force actions for which there is no permission;
  • send data to known malicious web addresses;

Etc., etc., etc....

This involves pattern recognition and, nowadays, a certain amount of artificial intelligence.
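
To make the idea of judging a program by its actions a bit more concrete, here is a minimal sketch in Python. It is not how any real antivirus is implemented: the action names, the weights, and the threshold are all hypothetical, chosen only to mirror the list above.

```python
# Hypothetical behaviour-based scoring. The action names, weights and
# threshold are invented for illustration; real products use far richer signals.
SUSPICIOUS_ACTIONS = {
    "read_browser_state": 3,
    "simulate_user_click": 4,
    "bypass_permission": 5,
    "contact_blocklisted_host": 5,
}
THRESHOLD = 6


def score_trace(trace):
    """Sum the weights of the suspicious actions found in a trace of OS requests."""
    return sum(SUSPICIOUS_ACTIONS.get(action, 0) for action in trace)


def looks_malicious(trace):
    """Flag the trace when its score reaches the (hypothetical) threshold."""
    return score_trace(trace) >= THRESHOLD


# Two made-up traces of requests that programs sent to the operating system.
clock_app = ["get_system_time", "write_file"]
shady_app = ["read_browser_state", "simulate_user_click", "write_file"]

print(looks_malicious(clock_app))  # False
print(looks_malicious(shady_app))  # True
```

Note that nothing in the sketch asks which language the program was written in. Only the requests it makes are inspected.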


One sad fact is that from time to time I see someone asking here on SOpt how to do something that antivirus software will clearly see as malware behavior. For example: Simulate "ok" via command line. Often people don’t think about the consequences that certain actions would have for people’s security and privacy if they were possible.

See also this about how Windows recognizes an application as safe: Installer recognized as a virus.

  • 16

    Next class: "Hardware: what you kick. Software: what you swear at". :P

  • 6

    Sensational answer. @Bacco It must be the same college that showed you this slide

  • 1

    Great answer, +1

  • 2

    Renan: your answer is much better after the edit, +1... @LINQ bonus points for "it should only be used as a last resort" :D

  • 1

    Can any compiled system really be "decompiled"? I don’t believe this claim, because otherwise the likes of Windows and Office would already have their sources floating around.

  • 1

    @Wictorchaves and how do you think the "computer" understands the code?

  • The computer understands the programs because they have been compiled into machine language.

  • It’s complicated, but he trashed interpreted languages and praised compiled ones so much that I thought something was wrong.

  • 4

    @Wictorchaves precisely, and just as you can convert to machine language, you can convert back. You may lose formatting and the original variable names, but the "essence" is the same. If I see C3 80 80 in a compiled MSX program, I know it’s a jump to &h8080. You just need to know the processor (usually people don’t do this by hand, they use software to "decompile").

  • @Raizant There is certainly something wrong, but not with the languages.

  • 1

    But if any program can be "recompiled", why don’t we have the Windows source code, for example?

  • 8

    @Wictorchaves simply because it isn’t worth the effort, and depending on where you are, reverse engineering is a crime. Also, it almost always pays off more to rewrite code than to try to understand what went on in the original programmers’ minds. This reminds me of the many questions on the site like "how do I protect my code". I would like to say "No need, no one will want your code", but it would sound rude (that goes for mine too, of course).

  • 7

    It’s very nice to learn from certain excerpts, and there are some great things to see in someone else’s code, but in general reusing an entire finished codebase is probably not the best path, quite apart from the theft of intellectual property. The code itself is usually not the "soul" of the software. This "soul" is in the heads of the original programmers, who know what really motivated the way they solved each part of the original problem/mission.

  • I understand, thank you. Renan could complement the answer by making it clear that the process of "recompiling" is not so simple, because from the text it seems that you just feed the binary into a program and it translates it back into the original code.

  • 1

    @Wictorchaves but that’s right. It’s simpler than planting potatoes.

  • 1

    I disagree. I don’t believe it is that simple. Not even compiling is simple, let alone "decompiling".

  • @LINQ I typed on my mobile’s virtual keyboard and it didn’t work, damn it I need it.

  • 1

    @It’s just that maybe he realized it wasn’t being used as a last resort =D

  • I tested the program you put in the link, I did a test with Notepad.exe and it didn’t work.

  • 1

    @Wictorchaves don’t just stay in disbelief, run the experiment ;) Grab any DLL from a program written in C# and drag it into the ILSpy window.

  • 1

    @Wictorchaves Notepad++ is not a .NET program. You need another decompiler for it.

  • But I tested it and it didn’t work; it says it can’t load the assembly. In other words, it’s not that simple.

  • 8

    When someone speaks well or ill of something without giving any justification, run. Interpreted languages have their drawbacks, but they are not bad at all.

  • I agree that compiled code can be reverse engineered, but it is not as simple as you think. It depends on the case, as just happened here: I can’t just pick any exe, throw it into a program, and see the code.

  • 4

    @Wictorchaves the tool I mentioned only decompiles code made for .NET. There are other tools for Java, C++, etc. If you still doubt it, open ILSpy and go to File -> Open from GAC. It will list some libraries that come bundled with Windows, and you can open any of them and see the code. You can also write your own Hello World in C# and compile it into an executable. What you can’t do is say "I did a very specific test and it didn’t work, so it’s impossible" ;)

  • 1

    I will do more tests and research the subject; I am interested in learning more. I don’t have as much experience with this topic as you do, but I still don’t believe in this "simplicity". I believe there are cases where you really can just open the file in some software, but there are other variables that make the topic more complex. I may think this way for lack of experience, but that’s it. Thank you for all the information :)

  • 1

    @Wictorchaves do you really find compiling difficult? Since I started programming I have always compiled by hand, without help from IDEs, using GCC and MinGW. Even Java (which compiles not to machine language but to an intermediate language that works with a JIT) I compile from the terminal. Nothing against your opinion, I was just intrigued by where the difficulty lies.

  • @Guilhermenascimento well, I’ve used gcc a lot. Today I only use an IDE to compile, but I’ve studied the compilation process, and although gcc is simple to use, it does a lot behind the scenes. I haven’t worked much (not to say at all) with "decompiling" code, so much so that I don’t even know the correct term for it, but I have long held that "decompiling" is not a simple process, and that is where my questions and curiosity on the subject come from.

  • 1

    @Wictorchaves yes, but I’m not talking about decompiling. You said that "compiling is difficult" (I think that comment has been deleted), so I was wondering what you meant by it: difficult for the programmer or for the machine? Because if it’s for the programmer, it just sounds like inexperience. ;)

  • @Guilhermenascimento compiling is easy, but the compiler does a lot behind the scenes, which makes the underlying process complicated. That is why I think reverse engineering isn’t so simple, since the compiler does so many things.

  • 1

    @Wictorchaves that is what I wanted to know, so the hard part is what the "machine" does.

  • 1

    @Wictorchaves I added a link to a program that decompiles Windows code, with a suggested DLL for you to test ;)


22

I decided to answer because there still seem to be doubts about Renan’s answer.

Let me make it clear that antivirus programs do not need, and do not care about, the application’s source code.

There are two main strategies for detecting a virus.

  • One is to look for a signature in the executable code itself. The antivirus checks whether the file contains a certain sequence of bytes previously known to belong to a virus.
  • Another is to analyze whether there are calls to certain APIs, or certain code patterns, that can be used to cause harm. This is why some legitimate applications trigger false positives.
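
The first strategy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the signature names and byte sequences below are invented for the example and do not correspond to any real malware, and real scanners use much more elaborate matching than a plain substring search.

```python
# Hypothetical signature table: name -> byte sequence. The bytes are made up
# for this example and do not correspond to any real malware.
SIGNATURES = {
    "Example.Dropper": bytes.fromhex("deadbeef4d5a9000"),
    "Example.Keylogger": bytes.fromhex("cafebabe00112233"),
}


def scan_bytes(data):
    """Return the sorted names of all signatures found anywhere in data."""
    return sorted(name for name, pattern in SIGNATURES.items() if pattern in data)


def scan_file(path):
    """Read a file in binary mode and scan its raw contents."""
    with open(path, "rb") as handle:
        return scan_bytes(handle.read())


clean = b"\x00\x01hello world\x02"
infected = b"\x90\x90" + SIGNATURES["Example.Dropper"] + b"\x00payload"

print(scan_bytes(clean))     # []
print(scan_bytes(infected))  # ['Example.Dropper']
```

The scanner only ever sees raw bytes, which is why it works the same way on a compiled executable, a script, or any other file.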

There are certainly other strategies. An antivirus can also check things while the program is running, or intercept certain API calls.

What Renan said is that everything an app does is available for inspection. Every instruction the processor will execute and everything the application will invoke is encoded in the binary. All those bytes have a meaning that can be understood by whoever knows how to read them (the processor, for example). They are not random or encrypted. They are just a little harder for a human to understand.
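
One of the comments on Renan’s answer mentions that the bytes C3 80 80 in an MSX program are a jump to &h8080. As a rough illustration of how such bytes map back to instructions, here is a toy decoder in Python for a handful of opcodes of the Z80, the processor used in MSX machines. It is a sketch covering five instructions only, not a real disassembler.

```python
# Tiny subset of the Z80 instruction set:
# opcode -> (mnemonic template, number of operand bytes).
OPCODES = {
    0x00: ("NOP", 0),
    0x3E: ("LD A,{:02X}h", 1),   # load an 8-bit immediate into register A
    0xC3: ("JP {:04X}h", 2),     # absolute jump, 16-bit little-endian address
    0xCD: ("CALL {:04X}h", 2),   # subroutine call, 16-bit little-endian address
    0xC9: ("RET", 0),
}


def disassemble(code):
    """Decode a bytes object into a list of mnemonic strings."""
    lines = []
    i = 0
    while i < len(code):
        opcode = code[i]
        if opcode not in OPCODES:
            # Opcode outside our tiny table: emit it as a raw data byte.
            lines.append("DB {:02X}h".format(opcode))
            i += 1
            continue
        template, size = OPCODES[opcode]
        operand = int.from_bytes(code[i + 1:i + 1 + size], "little")
        lines.append(template.format(operand))
        i += 1 + size
    return lines


print(disassemble(bytes.fromhex("3E28C38080")))
# ['LD A,28h', 'JP 8080h']
```

Nothing here recovers variable names or comments, but the behavior of the program is fully readable from the bytes alone.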

If my code is compiled and you can’t tell how it was written, how does the antivirus know it can be dangerous?

You can tell how it is written (in binary form). What you cannot know is the exact source code that produced this binary.

Decompiling

If you have a program on your computer, you don’t have the source code, but you can get something close to the source code that generated that binary. You won’t get an identical copy: comments and local symbol names will be missing, public symbols may have been changed, and the exact flow won’t be the same; it will just produce the same result.

Decompiling is especially feasible in languages that use bytecode and metadata. But when the code is obfuscated, it becomes much more difficult to get usable results.

But it is not exactly an easy process and is far from generating good results in most cases.
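
To illustrate what bytecode plus metadata leaves behind, here is a small sketch using Python itself, which compiles source to bytecode before running it. The function name and the string inside it are made up for the example. In this particular format even local variable names are kept, which is generally not the case for native machine code.

```python
import dis

SOURCE = '''
def check_password(attempt):
    # this comment will not survive compilation
    secret = "hunter2"
    return attempt == secret
'''

# Compile the source text into a module code object, as the interpreter does.
module_code = compile(SOURCE, "<example>", "exec")

# The function body is stored as a nested code object among the module's constants.
func_code = next(c for c in module_code.co_consts if hasattr(c, "co_code"))

print(func_code.co_name)                  # check_password
print(func_code.co_varnames)              # ('attempt', 'secret')
print("hunter2" in func_code.co_consts)   # True

# Human-readable listing of the bytecode instructions.
dis.dis(func_code)
```

The comment is gone, but the function name, the variable names, and the string constant are all still there for anyone (or any tool) that looks.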

The goal of antivirus is not to get the source code, it is only to understand what the binary does.

Source protection

That notion of interpreted and compiled languages is already wrong to begin with.

It is possible to steal code from any application.

This idea of stealing source code is voiced by the naive and by laypeople. Good code is too complex for the naive to understand, and experts have no interest in stealing it. Crude code would only interest very weak programmers. Hint: the vast majority of code ever written is quite crude and serves as a reference to no one. In general, it is the authors of crude code who want to protect it.

As a curiosity, the Windows code was leaked rather than obtained by reverse engineering, which is why it even has the comments.

  • Actually, the part about the Windows code having comments struck me as odd, because I could have sworn they were removed when the program was compiled (I read this somewhere).

  • 1

    +1 :) I used some information here to improve my answer as well.
