How do you know what language a program was written in?

7

I downloaded a program on Windows, but I cannot find any information about which language it was written in. Is there a way to find out?


3 answers

19

In general, there is no easy, guaranteed way of knowing.

You can open the binary in an editor that can display it (probably a hex editor) and search for binary signatures that reveal this information. Compilers commonly embed something that indicates what was used to generate the binary.
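
A minimal sketch of that kind of signature search, assuming the file has already been read into memory. The marker table is an illustrative assumption (strings that some toolchains tend to leave behind), not an exhaustive or authoritative list, and `guess_from_markers` is a hypothetical name:

```c
#include <stddef.h>
#include <string.h>

/* Illustrative markers only (an assumption of this sketch). Any of them
   can be stripped from a binary or planted in one on purpose. */
struct marker { const char *text; const char *hint; };

static const struct marker MARKERS[] = {
    { "GCC: (",       "GCC / MinGW toolchain" },
    { "Go build ID:", "Go" },
    { "rustc",        "Rust" },
};

/* Returns the hint of the first table entry whose text occurs anywhere
   in buf, or NULL when no marker is found. */
const char *guess_from_markers(const unsigned char *buf, size_t len)
{
    size_t count = sizeof MARKERS / sizeof MARKERS[0];
    for (size_t m = 0; m < count; m++) {
        size_t n = strlen(MARKERS[m].text);
        for (size_t i = 0; i + n <= len; i++) {
            if (memcmp(buf + i, MARKERS[m].text, n) == 0)
                return MARKERS[m].hint;
        }
    }
    return NULL;
}
```

A NULL result means only that none of these particular strings were present, not that the program was written in some other language.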

If that fails, you can search for functions commonly used by the language's standard library. This can give false positives, though, especially with C functions, so you would need other criteria to confirm the result.
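
One way to apply this idea is to look at which runtime libraries the program imports. The sketch below assumes the imported DLL names have already been extracted by some other tool; the table and the name `hint_from_import` are assumptions made for illustration:

```c
#include <ctype.h>
#include <stddef.h>

/* ASCII case-insensitive equality; DLL names are not case-sensitive on Windows. */
static int same_name(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Maps an imported DLL name to a hint, or returns NULL for unknown names.
   The table is a small illustrative sample, not a complete catalogue. */
const char *hint_from_import(const char *dll_name)
{
    static const struct { const char *dll; const char *hint; } TABLE[] = {
        { "mscoree.dll",  ".NET (C#, VB.NET, F#, ...)" },
        { "msvbvm60.dll", "Visual Basic 6" },
        { "msvcrt.dll",   "C runtime: weak evidence, many languages link it" },
    };
    for (size_t i = 0; i < sizeof TABLE / sizeof TABLE[0]; i++) {
        if (same_name(dll_name, TABLE[i].dll))
            return TABLE[i].hint;
    }
    return NULL;
}
```

The last table entry illustrates the false-positive problem: a C runtime import shows up in programs written in many languages, so it confirms very little on its own.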

Some programs do not ship as standard native binaries, which can help or hinder identification. Java and C# are the easy cases: their files differ so much from a standard native executable that you often do not even need to look inside the binary.
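
As a rough illustration, the first bytes of a file already separate some of these cases. This is a sketch under the assumption that only three well-known magic numbers matter; `classify_by_magic` is a hypothetical helper:

```c
#include <stddef.h>
#include <string.h>

/* Classifies a file by its leading magic bytes. Note that 0xCAFEBABE is
   also used by Mach-O universal binaries, so even this check is a hint. */
const char *classify_by_magic(const unsigned char *buf, size_t len)
{
    if (len >= 4 && memcmp(buf, "\xCA\xFE\xBA\xBE", 4) == 0)
        return "Java class file";
    if (len >= 4 && memcmp(buf, "PK\x03\x04", 4) == 0)
        return "ZIP container (possibly a JAR)";
    if (len >= 2 && memcmp(buf, "MZ", 2) == 0)
        return "DOS/PE executable";
    return "unknown";
}
```

A C# program still starts with "MZ", because .NET assemblies are stored in PE files; telling them apart requires looking further into the headers.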

If none of this works, you have to look for other clues, such as how the code is structured, to see whether it matches what a particular language produces. This is much more complicated and error-prone, and obviously you have to know how specific compilers generate binaries.

There are utilities that help with this task, but I’ve never used them for this:

Some promise miracles, which is unrealistic. You can only be certain in the most obvious cases, where even a glance at the binary shows what it is. The ones I mentioned are just tools to help.

  • 1

    If the person who downvoted tells me what is wrong with my answer, I can improve it.

8

Corroborating Maniero's answer: in a Windows environment there is no guaranteed way to know which language an executable was written in.

What is an executable?

An executable program or executable file, also known as a binary image, is a file whose contents are meant to be interpreted as a program by an operating system.

Typically, executables contain the binary representation of machine instructions for a specific processor, but they may also contain an intermediate form (IL or Java bytecode) that requires the services of a runtime, such as an interpreter or a JIT compiler, to run.

On most modern architectures, an executable file contains a lot of information that is not part of the program itself, such as information about the environment needed to run the program, symbolic and debugging information, or other information the operating system uses to prepare the program for execution.

Executables have calls to operating system services in addition to common machine instructions. This means that executables are usually specific to an operating system in addition to being specific to a processor.

Windows Portable Executable (PE)

The term "Portable" refers to the format's portability across all Windows operating systems (in both 32-bit and 64-bit versions). The PE format is basically a data structure that encapsulates the information the Windows loader needs to handle executable code (machine language). This includes dynamic library references for linking, API export and import tables, resource management data, and thread-local storage (TLS) data.

In Windows NT operating systems, PE is used for EXE, DLL, OBJ, SYS, and other file types. The Extensible Firmware Interface (EFI) specification states that PE is the standard executable format in EFI environments. PE is a modified version of the Unix COFF format; PE/COFF is an alternative name.
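
To make the structure concrete, here is a minimal sketch that checks whether a buffer looks like a PE image and reads two header fields. The offsets follow the published PE layout (the DOS header stores the PE header offset at 0x3C); the names `pe_info` and `pe_parse` are invented for this example:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Little-endian readers, so the code does not depend on host byte order. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

struct pe_info {
    uint16_t machine;   /* e.g. 0x014c = x86, 0x8664 = x64 */
    uint16_t opt_magic; /* 0x10b = PE32, 0x20b = PE32+ (64-bit) */
};

/* Returns 1 and fills *out if buf looks like a PE image, 0 otherwise. */
int pe_parse(const unsigned char *buf, size_t len, struct pe_info *out)
{
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return 0;
    uint32_t pe_off = rd32(buf + 0x3C);            /* e_lfanew */
    if (pe_off > len || len - pe_off < 26)         /* signature + COFF header + magic */
        return 0;
    if (memcmp(buf + pe_off, "PE\0\0", 4) != 0)
        return 0;
    out->machine   = rd16(buf + pe_off + 4);       /* COFF header: Machine */
    out->opt_magic = rd16(buf + pe_off + 24);      /* optional header: Magic */
    return 1;
}
```

Neither field says anything about the source language; they only describe the target processor and whether the image is 32-bit or 64-bit.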

Why is there no guaranteed way to know which language an executable was written in?

The Microsoft page that defines the Windows Portable Executable format specifies no field within the executable file that is intended to record the language or the compiler that produced the file.

In fact, the only field where this information could be carried is the attribute certificate field. Even then, the information may not be present, since it is not mandatory and varies from certificate to certificate.
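
For reference, the attribute certificate table is reached through the data directory array of the optional header (entry 4). The sketch below reads one data directory entry from a buffer, assuming the layout in the PE specification; `pe_data_directory` is a hypothetical helper, not part of any Windows API:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Reads data directory entry `index` (4 = certificate table,
   14 = CLR runtime header). Returns 1 on success, 0 otherwise. */
int pe_data_directory(const unsigned char *buf, size_t len, unsigned index,
                      uint32_t *addr, uint32_t *size)
{
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return 0;
    size_t pe_off = rd32(buf + 0x3C);
    if (pe_off > len || len - pe_off < 26 ||
        memcmp(buf + pe_off, "PE\0\0", 4) != 0)
        return 0;
    size_t opt = pe_off + 24;                 /* start of optional header */
    uint16_t magic = rd16(buf + opt);
    size_t dirs;
    if (magic == 0x10b)
        dirs = opt + 96;                      /* PE32 */
    else if (magic == 0x20b)
        dirs = opt + 112;                     /* PE32+ */
    else
        return 0;
    if (index >= 16 || dirs + 8u * (index + 1) > len)
        return 0;
    if (index >= rd32(buf + dirs - 4))        /* NumberOfRvaAndSizes */
        return 0;
    *addr = rd32(buf + dirs + 8u * index);
    *size = rd32(buf + dirs + 8u * index + 4);
    return 1;
}
```

For entry 4 the address is a plain file offset rather than an RVA, because certificates are not mapped into memory. A zero size means the file carries no certificate. A non-empty entry 14 is a strong indication of a .NET assembly, although it still does not say which .NET language was used.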

API for reading certificates from a Windows executable

To read the certificates of an executable from a program written in C/C++, include the header imagehlp.h, which provides the WIN_CERTIFICATE structure and the functions ImageGetCertificateHeader and ImageGetCertificateData, where:

WIN_CERTIFICATE is the data structure that holds the information extracted from the certificate. It is defined as follows:

typedef struct _WIN_CERTIFICATE {
    DWORD       dwLength;
    WORD        wRevision;
    WORD        wCertificateType;   // WIN_CERT_TYPE_xxx
    BYTE        bCertificate[ANYSIZE_ARRAY];
} WIN_CERTIFICATE, *LPWIN_CERTIFICATE;
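
The fixed part of this structure is 8 bytes: dwLength, wRevision, and wCertificateType, all little-endian. As a portable sketch that does not call the Windows API, the header can be decoded from raw bytes as follows; `cert_header`, `cert_header_parse`, and `cert_type_name` are names invented for this example, while the numeric type values are the WIN_CERT_TYPE_xxx constants from the Windows headers:

```c
#include <stddef.h>
#include <stdint.h>

struct cert_header {
    uint32_t length;   /* dwLength: total entry size, header included */
    uint16_t revision; /* wRevision: 0x0100 or 0x0200 */
    uint16_t type;     /* wCertificateType */
};

/* Decodes the 8-byte header at p. Returns 1 if the declared length is
   plausible for the available bytes, 0 otherwise. */
int cert_header_parse(const unsigned char *p, size_t len, struct cert_header *out)
{
    if (len < 8)
        return 0;
    out->length   = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    out->revision = (uint16_t)(p[4] | (p[5] << 8));
    out->type     = (uint16_t)(p[6] | (p[7] << 8));
    return out->length >= 8 && out->length <= len;
}

/* Symbolic name for a wCertificateType value. */
const char *cert_type_name(uint16_t type)
{
    switch (type) {
    case 0x0001: return "WIN_CERT_TYPE_X509";
    case 0x0002: return "WIN_CERT_TYPE_PKCS_SIGNED_DATA";
    case 0x0003: return "WIN_CERT_TYPE_RESERVED_1";
    case 0x0004: return "WIN_CERT_TYPE_TS_STACK_SIGNED";
    default:     return "unknown";
    }
}
```

The bCertificate payload that follows the header is typically a PKCS#7 signature blob for Authenticode-signed files; decoding it is outside the scope of this sketch.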

The function ImageGetCertificateHeader gets the header of a specific certificate. It has the following signature:

BOOL IMAGEAPI ImageGetCertificateHeader(
  HANDLE            FileHandle,
  DWORD             CertificateIndex,
  LPWIN_CERTIFICATE Certificateheader
);

Where:

FileHandle

A handle to the image file. The handle must be opened with FILE_READ_DATA access.

CertificateIndex

The index of the certificate whose header is to be returned.

Certificateheader

A pointer to the WIN_CERTIFICATE structure that receives the certificate header.

The function ImageGetCertificateData, in turn, retrieves a complete certificate from the file. It has the following signature:

BOOL IMAGEAPI ImageGetCertificateData(
  HANDLE            FileHandle,
  DWORD             CertificateIndex,
  LPWIN_CERTIFICATE Certificate,
  PDWORD            RequiredLength
);

Where:

FileHandle

A handle to the image file. The handle must be opened with FILE_READ_DATA access.

CertificateIndex

The index of the certificate to be returned.

Certificate

A pointer to a WIN_CERTIFICATE structure that receives the certificate data. If the buffer is not large enough to contain the structure, the function fails and the last error code is set to ERROR_INSUFFICIENT_BUFFER.

RequiredLength

On input, this parameter specifies the length of the Certificate buffer in bytes. On success, it receives the length of the certificate.

Concluding remarks.

In addition to the software Maniero already cited in his answer, I would add ExifTool. It is easy to use, is available on Windows and Linux, is part of the VirusTotal back end, and can extract information from many file types. Example of running it on itself:

Z:\Downloads>"exiftool(-k).exe" "exiftool(-k).exe"

ExifTool Version Number         : 10.13
File Name                       : exiftool(-k).exe
Directory                       : .
File Size                       : 6.4 MB
File Modification Date/Time     : 2016:03:12 20:31:08+01:00
File Access Date/Time           : 2016:04:02 16:37:16+02:00
File Creation Date/Time         : 2016:04:02 16:37:16+02:00
File Permissions                : rw-rw-rw-
File Type                       : Win32 EXE
File Type Extension             : exe
MIME Type                       : application/octet-stream
Machine Type                    : Intel 386 or later, and compatibles
Time Stamp                      : 2006:06:02 12:45:17+02:00
PE Type                         : PE32
Linker Version                  : 6.0
Code Size                       : 12288
Initialized Data Size           : 917504
Uninitialized Data Size         : 0
Entry Point                     : 0x354c
OS Version                      : 4.0
Image Version                   : 0.0
Subsystem Version               : 4.0
Subsystem                       : Windows command line
File Version Number             : 10.1.3.0
Product Version Number          : 10.1.3.0
File Flags Mask                 : 0x003f
File Flags                      : Debug
File OS                         : Windows NT 32-bit
Object File Type                : Executable application
File Subtype                    : 0
Language Code                   : Process default
Character Set                   : Unicode
Comments                        : ExifTool EXE for Windows
Company Name                    : Phil Harvey
File Description                : Read and Write meta information
File Version                    : 10.1.3.0
Internal Name                   : ExifTool
Legal Copyright                 : Copyright (c) 2003-2016, Phil Harvey
Legal Trademarks                :
Original File Name              : exiftool(-k).exe
Private Build                   :
Product Name                    : ExifTool
Product Version                 : 10.1.3.0
Special Build                   :
Build Date                      : 2016:03:12 14:27:51
Bundled Perl Version            : ActivePerl 5.8.7
Home Page                       : http://owl.phy.queensu.ca/~phil/exiftool/

Note the information on the penultimate line...

Bundled Perl Version            : ActivePerl 5.8.7

... which indicates the language the executable was generated from. However, as already explained, not all executables carry this information, especially since here it is added specifically by ActivePerl. Each executable therefore has to be analyzed case by case; there is no systematic way to determine the language in which an executable was written.

3

I researched the subject a little and found that, for a long time and still today, reverse engineering has been used to find out which language a program or application was created in. Depending on the program's protection and encryption, however, this can be very difficult.

Nowadays it is a little more "practical" to find this out with certain specific programs. I found an article covering one of the programs used; see below:

[1]https://www.arquivoti.net/como-saber-em-que-linguagem-um-programa-foi-escrito/
