Logic of these bitwise operations

I've wanted to get into emulator development for a long time. I decided to stop trying to emulate a complex system and start with a very basic one, a CHIP-8 emulator, which is what many people on emulation forums recommend. Let's take it piece by piece:

The first operation whose point I don't see:

std::uint16_t opcode = m_Memory[reg.PC] << 8 | m_Memory[reg.PC + 1];

Basically, one CHIP-8 opcode is 2 bytes long, but the ROM is stored as 8-bit values. First I access the std::array of std::uint8_t that I call m_Memory, which I use to store the ROM and the font set, at the program counter position. The PC starts at 0x200, which is where most CHIP-8 programs/games begin. Then 8 zeros are added, which is easy to understand: 1 byte = 8 bits, so 2 bytes are 16 bits. But then the confusion starts: if you already have the opcode, why mask a 16-bit value with an 8-bit one? And why read the ROM itself again, but at the next PC position?

Here we go to the second part of my problem:

switch (opcode & 0xF000) {
   ...
}

In a discussion I started on a Reddit forum about emulators, people told me that they mask the opcode with 0xF000 to get the real opcode. What I didn't understand is how they concluded that they should mask it, and why with this value.

The final part:

Using the documentation that I and many others follow, first we go to the opcode 0x6000, or 6xkk, or LD Vx, byte:

//LD Vx, byte
case 0x6000:
    reg.Vx[(opcode & 0x0F00) >> 8] = (opcode & 0x00FF);
    reg.PC += 2;
    std::cout << "OPCODE LD Vx, byte executado." << std::endl;
    break; 

The CHIP-8 has 16 8-bit registers I called Vx, let’s go:

reg.Vx[(opcode & 0x0F00) >> 8]

First I converted the opcode 0x6000 into binary and performed the operation and:

0110 0000 0000 0000    //0x6000
0000 1111 0000 0000    //0x0F00
-------------------
0000 0000 0000 0000    //0x0

Then >> 8 shifts 8 bits to the right, which gives 0000 0000, that is, Vx index 0. Next comes = (opcode & 0x00FF), which gives:

0110 0000 0000 0000    //0x6000
0000 0000 1111 1111    //0x00FF
-------------------
0000 0000 0000 0000    //0x0

So why not just do reg.Vx[0] = 0; ?

Keep in mind that I've never had to do bitwise operations in any project before; I only know what the books told me about AND, OR, XOR, NOT, etc.

I'd like to understand the logic people use here so that I can apply it in future projects.

  • Some related things: https://answall.com/q/175345/101, https://answall.com/q/201392/101, https://answall.com/q/178733/101, https://answall.com/q/205163/101, https://answall.com/q/213615/101, https://answall.com/q/268467/101 and https://answall.com/q/9497/101

1 answer


Some of what you're missing seems to come from not realizing that these values are "families" of opcodes, that is, an opcode plus its parameters, all encoded in the 16-bit value, rather than a single fixed value. In your last example, the 0x6000 opcode, you ran the whole simulation as if the value were always exactly 0x6000. However, see the documentation:

6xkk - LD Vx, byte Set Vx = kk.

The interpreter puts the value kk into Register Vx.

That is, the first nibble (the first 4 bits) of the opcode holds the hex digit "6". The remaining 3 hexadecimal digits are the opcode's arguments. So yes, "0x6000" always means "set V0 = 0x00", but the opcode 0x62FF means "set V2 = 0xFF". The job of your interpreter/emulator is precisely to detect that opcode 6 means loading a value into a register, extract those values, and execute the operation.

See how this already answers your second question: in the switch on the opcode masked with 0xF000, the value "0x6000" only serves as the case label for the comparison. Inside the case body you need the full opcode, because its other digits hold the parameters.

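opcode = 0x62ff;
switch (opcode & 0xf000) {   // keep only the first hex digit: the opcode family
   ...
   case 0x6000:
       register_number = (opcode & 0x0f00) >> 8;   // second hex digit: 2
       value = opcode & 0xff;                      // last two hex digits: 0xff
       registers[register_number] = value;         // V2 = 0xff
       break;
   ...
}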
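The same field extraction can be written as small helpers and checked at compile time. This is only a sketch: the names opcode_family, opcode_x and opcode_kk are made up for illustration and appear neither in the question's code nor in the documentation.

```cpp
#include <cstdint>

// Hypothetical helper names, chosen only for this illustration.

// Top nibble: selects the opcode family (0x6000 for "LD Vx, byte").
constexpr std::uint16_t opcode_family(std::uint16_t opcode) {
    return static_cast<std::uint16_t>(opcode & 0xF000);
}

// Second nibble: the register index x in 6xkk.
constexpr std::uint8_t opcode_x(std::uint16_t opcode) {
    return static_cast<std::uint8_t>((opcode & 0x0F00) >> 8);
}

// Low byte: the immediate value kk in 6xkk.
constexpr std::uint8_t opcode_kk(std::uint16_t opcode) {
    return static_cast<std::uint8_t>(opcode & 0x00FF);
}

// The 0x62FF example from the text: family 6, register V2, value 0xFF.
static_assert(opcode_family(0x62FF) == 0x6000, "family nibble");
static_assert(opcode_x(0x62FF) == 2, "register index");
static_assert(opcode_kk(0x62FF) == 0xFF, "immediate byte");

// The question's 0x6000 case is just the special case x = 0, kk = 0.
static_assert(opcode_x(0x6000) == 0, "register index of 0x6000");
static_assert(opcode_kk(0x6000) == 0, "immediate byte of 0x6000");
```

Each mask keeps only the bits of one field, and the shift moves that field down so it can be used as a plain number.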

Note in the documentation that not all opcodes are fully determined by the first hexadecimal digit. For some of them, such as those starting with "0x0", there is a whole sub-family of opcodes. In such cases you write another switch/case inside the first one (or call a function for it) to test the other digits.
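A nested switch for one such sub-family might look like the sketch below. It assumes C++14 or later (for switch inside a constexpr function). The Op enum and the classify function are hypothetical names, and only three instructions are covered: 00E0 (CLS), 00EE (RET) and 6xkk (LD Vx, byte).

```cpp
#include <cstdint>

// Hypothetical enum and function, for illustration only.
enum class Op { Cls, Ret, LdVxByte, Unknown };

constexpr Op classify(std::uint16_t opcode) {
    switch (opcode & 0xF000) {          // outer switch: first hex digit
    case 0x0000:
        switch (opcode & 0x00FF) {      // inner switch: low byte picks the instruction
        case 0x00E0: return Op::Cls;    // 00E0 - CLS
        case 0x00EE: return Op::Ret;    // 00EE - RET
        default:     return Op::Unknown;
        }
    case 0x6000:
        return Op::LdVxByte;            // 6xkk - LD Vx, byte
    default:
        return Op::Unknown;             // not handled in this sketch
    }
}

static_assert(classify(0x00E0) == Op::Cls, "00E0 is CLS");
static_assert(classify(0x00EE) == Op::Ret, "00EE is RET");
static_assert(classify(0x62FF) == Op::LdVxByte, "6xkk is LD Vx, byte");
```

The outer mask decides which family the opcode belongs to; the inner mask is only needed for families where the remaining digits select the instruction instead of carrying parameters.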

And finally, as for:

opcode = m_Memory[reg.PC] << 8 | m_Memory[reg.PC + 1]; 

It reads as plainly as English: the m_Memory array (*) holds 8-bit values. You need to read two bytes and compose a single 16-bit value (and, see the documentation: the most significant byte comes first, i.e., "big endian").

All Instructions are 2 bytes long and are stored Most-significant-byte first.

So you take the first byte and multiply it by 2^8 with the shift << 8, that is, you insert 8 zeros to the right of that byte. Then you fill those 8 lower bits with the value of the next byte in memory using the bitwise OR (since all the corresponding bits are 0, the second byte lands intact in the lower bits of the opcode). In other words: you read a byte and put it in bits 15 to 8 of your opcode, then read the byte at the next memory position and put it in bits 7 to 0.
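As a sketch of that composition, with hypothetical names combine_bytes and fetch_opcode, and assuming the memory is a 4096-byte std::array (the question does not state its size):

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: builds a big-endian 16-bit value from two bytes.
// `high` is promoted to int before the shift, so no bits are lost.
constexpr std::uint16_t combine_bytes(std::uint8_t high, std::uint8_t low) {
    return static_cast<std::uint16_t>((high << 8) | low);
}

// Hypothetical fetch. Assumes pc + 1 is a valid index into memory.
std::uint16_t fetch_opcode(const std::array<std::uint8_t, 4096>& memory,
                           std::uint16_t pc) {
    // memory[pc] goes to bits 15..8, memory[pc + 1] to bits 7..0.
    return combine_bytes(memory[pc], memory[pc + 1]);
}

// Step by step for the bytes 0x62 and 0xFF:
static_assert((0x62 << 8) == 0x6200, "shift inserts 8 zero bits on the right");
static_assert((0x6200 | 0xFF) == 0x62FF, "OR fills the low 8 bits");
static_assert(combine_bytes(0x62, 0xFF) == 0x62FF, "combined opcode");
```

Reading memory[pc + 1] does not change the program counter. The counter itself moves by 2 after the instruction runs, because each instruction occupies two bytes.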


(*) Separate note: you gain very little by complicating variable names, even if that is the style in other examples you're reading. "m_Memory" instead of "memory" just means 4 more keystrokes and three pieces of "visual junk" that your brain has to filter out when reading the variable. There isn't much risk of you having another "memory" variable in that code, is there?

  • As I said, I don't have much experience with bitwise operations, and I had never heard of endianness until a few weeks ago. But thank you. Why is the program counter always incremented by 1 in the OR operation? Studying some repositories, I always notice that each instruction ends by adding 2, or 4 when an instruction has to be skipped.

  • In this case the offset is always "1", just to fetch the next byte, which belongs to the same 2-byte instruction. The next instruction is already at "PC + 2".

  • Thank you very much, now I finally understand this logic.
