Problems decoding x86 assembly from a binary

I have written a program that reads a binary executable compiled for the x86 (Intel) architecture and interprets the machine code it contains, executing it instruction by instruction. Reading the executable, extracting the sections, and building a virtual memory that holds the executable code all work smoothly, and I was able to run some very simple programs (for example: int main() {return 0;}).

To decode the instructions I am relying on the Intel manual (in English). I am also using objdump -d to display the disassembly of the executable and compare it with my results.

My problem is in decoding the following sequence of bytes (hexadecimal):

67 89 04 18

objdump correctly states that this means:

mov    %eax, (%eax, %ebx, 1)

The trouble starts when I decode it by hand, following the manual:

  1. 67: address-size override prefix;
  2. 89: opcode for mov from a register to a memory/register operand;
  3. 04: ModR/M byte indicating that the first argument is %eax, that a SIB byte follows, and that the displacement is zero;
  4. 18: SIB byte indicating that the last argument is %eax+%ebx.
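
The field split used in steps 3 and 4 can be sketched as follows. This is only an illustration that assumes the 32-bit addressing tables apply; the helper names split_modrm and split_sib are made up for this example and do not come from the question.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def split_modrm(byte):
    """Split a ModR/M byte into its (mod, reg, rm) fields."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111

def split_sib(byte):
    """Split a SIB byte into (scale, index, base); scale is the multiplier 1, 2, 4 or 8."""
    return 1 << ((byte >> 6) & 0b11), (byte >> 3) & 0b111, byte & 0b111

mod, reg, rm = split_modrm(0x04)
scale, index, base = split_sib(0x18)

print(mod, REG32[reg], rm)                # 0 eax 4  (rm=4 with mod=0: a SIB byte follows)
print(scale, REG32[index], REG32[base])   # 1 ebx eax
```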

The detail is that both ModR/M and SIB are being interpreted here with the 32-bit tables. That means that, at this point, both the operand size and the address size are 32 bits. However, the address-size override prefix had to be used, which means that the original instruction (without the prefix) has a 32-bit operand and a 16-bit address. Is that correct?

How is it possible to have an instruction with a 32-bit operand and a 16-bit address? I tried to assemble code containing an instruction like this with gas (the GNU Assembler), and it returned an error stating that this combination is impossible. Why, then, is it the default?

  • A simple question: what kind of C program generates the sequence 67 89 04 18 when compiled? And when you run that program, what does this instruction do if the addresses and registers hold values that do not fit in 16 bits?

  • One can generate a quite similar instruction: int main() { volatile int a[] = {1, 2, 3, 4, 5}; volatile int i = 2; a[i] = i; }. Compiling this with gcc -O3 produces mov %edx, 8(%esp, %eax, 4), which is quite similar to the example I used.

  • Interestingly, this is encoded as 89 54 84 08, with no prefix. Now I'm confused... Why does one case require the prefix and the other not?
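
For reference, here is a rough sketch of how the unprefixed bytes 89 54 84 08 can be decoded by hand. It assumes 32-bit operand and address sizes and handles only opcode 89; the function name decode_mov_89_addr32 is hypothetical, and the output is AT&T-style text rather than an exact copy of objdump's formatting.

```python
import struct

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_mov_89_addr32(code):
    """Decode 89 /r (mov r32 -> r/m32), assuming 32-bit sizes and no prefixes.

    Returns (AT&T-style text, instruction length in bytes).
    """
    if code[0] != 0x89:
        raise ValueError("only opcode 0x89 is handled in this sketch")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    pos = 2
    src = "%" + REG32[reg]
    if mod == 3:                          # register-direct destination
        return f"mov {src},%{REG32[rm]}", pos

    base = index = None
    scale = 1
    disp_size = (0, 1, 4)[mod]            # mod=0: none, mod=1: disp8, mod=2: disp32
    if rm == 4:                           # rm=100: a SIB byte follows
        sib = code[pos]
        pos += 1
        scale, idx, b = 1 << (sib >> 6), (sib >> 3) & 7, sib & 7
        index = None if idx == 4 else REG32[idx]   # index=100 means "no index"
        if b == 5 and mod == 0:           # no base register, disp32 instead
            disp_size = 4
        else:
            base = REG32[b]
    elif rm == 5 and mod == 0:            # absolute disp32, no registers
        disp_size = 4
    else:
        base = REG32[rm]

    disp_text = ""
    if disp_size == 1:
        disp_text = str(struct.unpack_from("<b", code, pos)[0])
    elif disp_size == 4:
        disp_text = str(struct.unpack_from("<i", code, pos)[0])
    pos += disp_size

    inner = ""
    if base or index:
        parts = "%" + base if base else ""
        if index:
            parts += f",%{index},{scale}"
        inner = f"({parts})"
    return f"mov {src},{disp_text}{inner}", pos

print(decode_mov_89_addr32(bytes.fromhex("89548408")))
# ('mov %edx,8(%esp,%eax,4)', 4)
```

Feeding it 89 04 18 (the bytes from the question without the 67 prefix) gives mov %eax,(%eax,%ebx,1) with a length of 3 under the same assumptions.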

1 answer


The Intel manual, in section 2.1.1, states that the 67H prefix lets a program switch between 16- and 32-bit addressing. Either size can be the default, and the prefix selects the non-default one:

The address-size override prefix (67H) allows programs to switch between 16- and 32-bit addressing. Either size can be the default; the prefix selects the non-default size.

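The prefix changes how the addressing bytes are interpreted; see Tables 2-1 (16-bit addressing forms) and 2-2 (32-bit addressing forms) in section 2.1.5.
For example, in 32-bit code an instruction with no prefix could be MOV [EBX], ESP (bytes 89 23); adding the prefix makes the same ModR/M byte be read from the 16-bit table, so it becomes MOV [BP+DI], ESP.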
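
A small sketch of that table switch, limited to the mod=00 rows of the two tables. The function names are invented for this illustration, and the default address size is assumed to be known from the segment or mode the code runs in.

```python
# mod=00 rows of the two ModR/M addressing tables (memory operands only).
RM16_MOD0 = ["[BX+SI]", "[BX+DI]", "[BP+SI]", "[BP+DI]", "[SI]", "[DI]", "disp16", "[BX]"]
RM32_MOD0 = ["[EAX]", "[ECX]", "[EDX]", "[EBX]", "SIB", "disp32", "[ESI]", "[EDI]"]

def effective_address_size(default_bits, has_67_prefix):
    """The 67H prefix selects whichever address size is not the default."""
    if default_bits not in (16, 32):
        raise ValueError("default address size must be 16 or 32")
    if has_67_prefix:
        return 16 if default_bits == 32 else 32
    return default_bits

def memory_operand_mod0(rm, default_bits, has_67_prefix):
    """Look up the rm field in the table chosen by the effective address size."""
    bits = effective_address_size(default_bits, has_67_prefix)
    table = RM16_MOD0 if bits == 16 else RM32_MOD0
    return table[rm]

# ModR/M byte 23 has rm=011:
print(memory_operand_mod0(3, 32, False))   # [EBX]
print(memory_operand_mod0(3, 32, True))    # [BP+DI]
```

Note that rm=100 only means "SIB follows" in the 32-bit table; in the 16-bit table the same value selects [SI].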

  • Yes, that is what I wrote in the question. The question is: why does mov %eax, (%eax, %ebx, 1) need the prefix while mov %edx, 8(%esp, %eax, 4) does not? What is the difference?

  • After your edit: in fact, this is the behavior I observed in most cases and that I understand to be right. But why does MOV [EAX+EBX], EAX have the prefix?

  • As far as I know, SIB cannot be used with 16-bit addressing. That would mean the code is running with 16-bit addressing as the default, and the prefix switches that one instruction to 32-bit addressing.

  • I did some more tests and could not reproduce my problem. I probably made a mistake when interpreting the data. The 67 prefix no longer appears for the example in the question. Anyway, I will accept your answer, thank you.
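
The point about SIB raised in the comments also affects instruction length, which matters for a decoder that walks the byte stream. The sketch below counts the bytes that follow ModR/M for each address size; bytes_after_modrm and length_of_67_89 are hypothetical helpers, and only the form 67 89 /r is covered.

```python
def bytes_after_modrm(modrm, following, addr_bits):
    """Count the SIB and displacement bytes that follow a ModR/M byte.

    `following` is the byte right after ModR/M; it is only inspected
    when 32-bit addressing says a SIB byte is present.
    """
    mod, rm = modrm >> 6, modrm & 7
    if mod == 3:                       # register operand, nothing follows
        return 0
    if addr_bits == 16:                # 16-bit addressing has no SIB byte
        if mod == 0:
            return 2 if rm == 6 else 0
        return 1 if mod == 1 else 2
    sib = 1 if rm == 4 else 0          # 32-bit addressing
    if mod == 0:
        if rm == 5 or (sib and (following & 7) == 5):
            disp = 4
        else:
            disp = 0
    else:
        disp = 1 if mod == 1 else 4
    return sib + disp

def length_of_67_89(code, default_bits):
    """Length of an instruction of the form 67 89 /r for a given default address size."""
    if code[0] != 0x67 or code[1] != 0x89:
        raise ValueError("expected the bytes 67 89 at the start")
    addr_bits = 16 if default_bits == 32 else 32   # the prefix flips the default
    following = code[3] if len(code) > 3 else 0
    return 3 + bytes_after_modrm(code[2], following, addr_bits)

code = bytes.fromhex("67890418")
print(length_of_67_89(code, 16))   # 4: 32-bit addressing, 18 is consumed as the SIB byte
print(length_of_67_89(code, 32))   # 3: 16-bit addressing, 18 is left for the next instruction
```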
