Problems decoding x86 assembly from binary

Question

Asked 11 years, 7 months ago

I have written a program that reads a binary executable compiled for the x86 (Intel) architecture and interprets the assembly code it contains by executing it instruction by instruction. Reading the executable, extracting the sections and building a virtual memory that holds the executable code all work smoothly, and I was able to run some very simple programs (for example: int main() {return 0;}).

To decode the instructions I am relying on the Intel manual (in English). I am also using objdump -d to display the disassembly of the executable and compare it with my results.

My problem is in decoding the following sequence of bytes: (hexadecimal)

67 89 04 18

objdump correctly reports that this means:

mov    %eax, (%eax, %ebx, 1)

My problem appears when I decode it by hand, following the manual:

67: Address-size override prefix;
89: Opcode for mov from a register to a memory/register operand;
04: ModR/M byte indicating that the first argument is %eax, that a SIB byte follows, and that there is no displacement;
18: SIB byte indicating that the last argument is %eax+%ebx.
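
The manual decoding above can be checked with a small sketch. It assumes 32-bit addressing (the interpretation used in the list) and covers only the forms needed here. The names decode_mov_89, split_2_3_3 and REG32 are illustrative and do not come from any library.

```python
# Register numbering shared by the ModR/M and SIB fields (32-bit registers).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def split_2_3_3(byte):
    """Split a byte into its top 2 bits, middle 3 bits and low 3 bits."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def decode_mov_89(code):
    """Decode opcode 89 /r (mov r32 -> r/m32), ASSUMING 32-bit addressing.

    Sketch only: supports mod=00 (no displacement) and mod=01 (disp8),
    and rejects the special encodings instead of handling them.
    """
    if code[0] != 0x89:
        raise ValueError("expected opcode 0x89")
    mod, reg, rm = split_2_3_3(code[1])
    if mod not in (0b00, 0b01):
        raise NotImplementedError("only mod=00 and mod=01 are sketched")
    pos = 2
    if rm == 0b100:
        # r/m = 100 means a SIB byte follows: scale, index, base.
        scale, index, base = split_2_3_3(code[pos])
        pos += 1
        if index == 0b100 or (mod == 0b00 and base == 0b101):
            raise NotImplementedError("special SIB encoding")
        mem = "(%{},%{},{})".format(REG32[base], REG32[index], 1 << scale)
    else:
        if mod == 0b00 and rm == 0b101:
            raise NotImplementedError("disp32-only form")
        mem = "(%{})".format(REG32[rm])
    disp = ""
    if mod == 0b01:
        # Signed 8-bit displacement follows ModR/M (and SIB, if present).
        raw = code[pos]
        disp = str(raw - 256 if raw >= 128 else raw)
    return "mov %{},{}{}".format(REG32[reg], disp, mem)


# The bytes from the question, without the 67 prefix.
print(decode_mov_89(bytes([0x89, 0x04, 0x18])))  # mov %eax,(%eax,%ebx,1)
```

Read this way, 04 splits into mod=00, reg=000, r/m=100 and 18 splits into scale=00, index=011, base=000, which matches the objdump output as long as the addressing really is 32-bit.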

The catch is that both the ModR/M and SIB bytes are interpreted here as 32-bit forms, which means that at this point both the operand size and the address size are 32 bits. Yet the address-size prefix had to be used, which would mean that the original instruction (without the prefix) has a 32-bit operand and a 16-bit address. Is that correct?

How can an instruction have a 32-bit operand and a 16-bit address? I tried to assemble an instruction like this with gas (GNU Assembler), and it reports an error saying that the combination is impossible. Why, then, is it the default?

A simple question: what kind of C program, when compiled, generates the sequence 67 89 04 18? When you run that program, what does this instruction do when the addresses and registers hold values that do not fit in 16 bits?

– Victor Stafusa

2013/12/16 at 02:16
One can generate a quite similar instruction: int main() { volatile int a[] = {1, 2, 3, 4, 5}; volatile int i = 2; a[i] = i; }. Compiling it with gcc -O3 produces mov %edx, 8(%esp, %eax, 4), which is quite similar to the example I used.

– Guilherme Bernal

2013/12/16 at 02:23
Interestingly, this is encoded as 89 54 84 08. No prefix is used. Now I'm confused... Why does one case require the prefix and the other not?

– Guilherme Bernal

2013/12/16 at 02:26

1 answer

by Marcos Zolnowski • **2,687** points · Answer 1 · 2013-12-16T03:33:46+00:00

Section 2.1.1 of the Intel manual states that the 67H prefix lets programs switch between 16- and 32-bit addressing. Either size can be the default, and the prefix selects the non-default one:

The address-size override prefix (67H) allows programs to switch between 16- and 32-bit addressing. Either size can be the default; the prefix selects the non-default size.

The prefix changes how the address bytes are interpreted; compare Tables 2-1 and 2-2 in section 2.1.5.
For example, an instruction that without the prefix is MOV [EBX], ESP becomes MOV [BP+DI], ESP once the prefix is added.
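
To make the effect of the prefix concrete, here is a minimal sketch that decodes the same ModR/M byte against both tables. It assumes the mod=00 rows of Tables 2-1 and 2-2 and a code segment whose default address size is 32 bits. The names RM16, RM32 and decode_mov_89_mod00 are made up for this illustration.

```python
# r/m field meanings when mod == 00, as in Tables 2-1 and 2-2.
RM16 = ["[BX+SI]", "[BX+DI]", "[BP+SI]", "[BP+DI]", "[SI]", "[DI]", None, "[BX]"]
RM32 = ["[EAX]", "[ECX]", "[EDX]", "[EBX]", None, None, "[ESI]", "[EDI]"]
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def decode_mov_89_mod00(code, default_addr_bits=32):
    """Decode [67] 89 /r with mod == 00 (sketch, Intel syntax).

    default_addr_bits is an assumption about the code segment (16 or 32).
    Entries marked None (SIB or displacement-only forms) are not handled.
    """
    addr_bits = default_addr_bits
    pos = 0
    if code[pos] == 0x67:
        # The address-size prefix selects the non-default size.
        addr_bits = 16 if default_addr_bits == 32 else 32
        pos += 1
    if code[pos] != 0x89:
        raise ValueError("expected opcode 0x89")
    modrm = code[pos + 1]
    mod, reg, rm = (modrm >> 6) & 0b11, (modrm >> 3) & 0b111, modrm & 0b111
    if mod != 0b00:
        raise NotImplementedError("only mod=00 is sketched")
    table = RM16 if addr_bits == 16 else RM32
    if table[rm] is None:
        raise NotImplementedError("SIB or displacement-only form")
    # The operand size is unaffected by 67H, so the register stays 32-bit.
    return "MOV {}, {}".format(table[rm], REG32[reg])


print(decode_mov_89_mod00(bytes([0x89, 0x23])))        # MOV [EBX], ESP
print(decode_mov_89_mod00(bytes([0x67, 0x89, 0x23])))  # MOV [BP+DI], ESP
```

Under the same assumptions, the first three bytes of the sequence in the question, 67 89 04, would come out as MOV [SI], EAX, with 18 left over as the start of the next instruction.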