I’m learning about processor architecture and I want to write an assembler. What are the steps for writing a program that translates assembly source code into machine code?
The assembler can target ARM or 8086, and it can be written in C.
What you are asking for is a simple compiler: a program that receives source code and produces machine code.
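The compilation process is normally divided into the following steps: lexical analysis, syntactic analysis (parsing), semantic analysis and code generation.
A few more things often occur in typical compilers as well: intermediate code generation, code optimization, error handling and recovery, and dependency management and linking.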
However, since your project is not a major commercial product, and the compilation process only has to translate each assembly instruction 1-to-1 into a machine code instruction, the structure of your compiler will be much leaner.
First, choose a small set of instructions that your compiler will accept. Start small and then grow. In your code, create a struct or something similar to represent an instruction. Basically, this structure will have a field for the type of the instruction and other fields for its parameters/arguments/operands. With a little extra effort, you can fit labels and directives into this struct as well.
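A minimal sketch of such a struct follows. Every name in it (LineKind, Opcode, Operand, Instruction, make_label) is hypothetical and chosen only for illustration; the instruction subset is an assumption too.

```c
#include <assert.h>
#include <string.h>

/* All names below are illustrative assumptions, not a fixed design. */
#define MAX_OPERANDS 3
#define MAX_NAME 32

typedef enum { LINE_INSTRUCTION, LINE_LABEL, LINE_DIRECTIVE } LineKind;
typedef enum { OP_NONE, OP_MOV, OP_ADD, OP_PUSH, OP_POP } Opcode;
typedef enum { ARG_REGISTER, ARG_IMMEDIATE, ARG_LABEL } OperandKind;

typedef struct {
    OperandKind kind;
    int reg;              /* register number, when kind == ARG_REGISTER */
    long value;           /* constant, when kind == ARG_IMMEDIATE */
    char name[MAX_NAME];  /* referenced label, when kind == ARG_LABEL */
} Operand;

typedef struct {
    LineKind kind;        /* instruction, label definition or directive */
    Opcode opcode;        /* OP_NONE for labels and directives */
    char name[MAX_NAME];  /* label or directive name, if any */
    Operand operands[MAX_OPERANDS];
    int operand_count;
    int source_line;      /* kept so error messages can point at the line */
} Instruction;

/* Build the record for a label definition such as "loop:". */
Instruction make_label(const char *name, int source_line) {
    assert(name != NULL);
    Instruction ins;
    memset(&ins, 0, sizeof ins);
    ins.kind = LINE_LABEL;
    ins.opcode = OP_NONE;
    strncpy(ins.name, name, MAX_NAME - 1);  /* memset keeps it terminated */
    ins.source_line = source_line;
    return ins;
}
```

Keeping the source line number in the record costs little and makes the error messages of the later checks much more useful.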
Ideally, you would write a complete lexer and a complete parser. Normally you would use a regular grammar for the lexical part and a context-free grammar for the syntactic part. But unless you already have a tool at hand and have mastered this kind of knowledge, doing so would be a very costly job. So I propose a simpler approach:
Read the source file one line at a time, splitting on \n, and for each line fill in a struct to store what was read on the line.
For the semantic analysis, once you have the list of program instructions, each one in its own struct, check that all labels and referenced routines exist. Check that the arguments, registers and operands used in each instruction are valid, compatible with each other and with the instruction, and that they appear in the right number and in the right order, as expected by the corresponding instruction. Check everything pertinent. If you find something wrong, issue an error and stop.
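The line-by-line reading step can be sketched as below. This is only one possible approach, under several assumptions: comments start with ';', labels end with ':', operands are separated by commas or whitespace, and no operand contains spaces. ParsedLine and parse_line are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_TOKENS 4
#define MAX_TOKEN_LEN 32

/* Hypothetical record for the raw text found on one source line. */
typedef struct {
    char label[MAX_TOKEN_LEN];     /* empty if the line defines no label */
    char mnemonic[MAX_TOKEN_LEN];  /* empty for blank or label-only lines */
    char operands[MAX_TOKENS][MAX_TOKEN_LEN];
    int operand_count;
} ParsedLine;

/* Parse a line such as "loop: mov AX, 5 ; comment".
   Returns 0 on success and -1 if the line is malformed or too long. */
int parse_line(const char *text, ParsedLine *out) {
    assert(text != NULL && out != NULL);
    char buf[256];
    memset(out, 0, sizeof *out);
    if (strlen(text) >= sizeof buf) return -1;
    strcpy(buf, text);

    char *comment = strchr(buf, ';');   /* drop the comment, if any */
    if (comment) *comment = '\0';

    char *rest = buf;
    char *colon = strchr(buf, ':');     /* optional "label:" prefix */
    if (colon) {
        *colon = '\0';
        char *name = strtok(buf, " \t");
        if (!name || strlen(name) >= MAX_TOKEN_LEN) return -1;
        if (strtok(NULL, " \t") != NULL) return -1;  /* "a b:" is invalid */
        strcpy(out->label, name);
        rest = colon + 1;
    }

    const char *seps = " \t,\r\n";
    for (char *tok = strtok(rest, seps); tok; tok = strtok(NULL, seps)) {
        if (strlen(tok) >= MAX_TOKEN_LEN) return -1;
        if (out->mnemonic[0] == '\0') {
            /* First token is the mnemonic; store it in upper case. */
            for (size_t i = 0; tok[i]; i++)
                out->mnemonic[i] = (char)toupper((unsigned char)tok[i]);
        } else {
            if (out->operand_count == MAX_TOKENS) return -1;
            strcpy(out->operands[out->operand_count++], tok);
        }
    }
    return 0;
}
```

Calling parse_line on each line returned by fgets gives you the raw strings; converting them into register numbers, constants and label references is then a matter of simple string comparisons and strtol.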
To do this, you will probably need to write a specialized function in your C code to analyze each distinct type of instruction. Something like verificar_MOV(), verificar_POP(), verificar_ADD(), etc.
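As an illustration, one such checking function might look like the sketch below. The signature, the struct layout and the accepted operand forms are all assumptions: it only accepts a 16-bit register destination with a register or immediate source, which is a small subset of what the real 8086 MOV allows.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, reduced instruction record used only for this sketch. */
typedef enum { ARG_REGISTER, ARG_IMMEDIATE, ARG_LABEL } OperandKind;
typedef struct { OperandKind kind; int reg; long value; } Operand;
typedef struct {
    Operand operands[2];
    int operand_count;
    int source_line;
} Instruction;

static int valid_reg(int reg) { return reg >= 0 && reg <= 7; }

/* Returns 1 if the MOV is acceptable, 0 otherwise (after reporting why). */
int verificar_MOV(const Instruction *ins) {
    assert(ins != NULL);
    if (ins->operand_count != 2) {
        fprintf(stderr, "line %d: MOV expects 2 operands, got %d\n",
                ins->source_line, ins->operand_count);
        return 0;
    }
    const Operand *dst = &ins->operands[0];
    const Operand *src = &ins->operands[1];
    if (dst->kind != ARG_REGISTER || !valid_reg(dst->reg)) {
        fprintf(stderr, "line %d: MOV destination must be a register\n",
                ins->source_line);
        return 0;
    }
    if (src->kind == ARG_REGISTER && valid_reg(src->reg)) return 1;
    if (src->kind == ARG_IMMEDIATE &&
        src->value >= -32768 && src->value <= 65535) return 1;
    fprintf(stderr, "line %d: invalid MOV source operand\n",
            ins->source_line);
    return 0;
}
```

A table of function pointers indexed by instruction type is one way to dispatch to the right checker without a long switch.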
For code generation, first calculate the size of each instruction. Since semantic analysis is already done, all the instructions should be valid and well-formed at this point, unless you have made a mistake. With the sizes known, you can work out the offsets needed to calculate every address. You should have a table or manual that explains how to convert each instruction into its corresponding machine code; do that conversion at this stage, storing the resulting machine codes in a list that maps 1-to-1 to the instructions stored in your structs. If you want, you can store the machine code inside the struct itself.
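The size and offset calculation can be sketched as follows. Item, assign_addresses, find_label and relative_offset are hypothetical names, and the sketch assumes the size of each instruction is already known. The relative offset is computed from the address of the following instruction, which is how relative jumps work on the 8086; other architectures use a different base.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical entry: either a label definition or an instruction. */
typedef struct {
    const char *label;  /* non-NULL if this entry defines a label */
    int size;           /* encoded size in bytes (0 for a label) */
    int address;        /* offset from program start, filled in below */
} Item;

/* Give every entry its offset and return the total program size. */
int assign_addresses(Item *items, int count) {
    assert(items != NULL);
    int offset = 0;
    for (int i = 0; i < count; i++) {
        items[i].address = offset;
        offset += items[i].size;
    }
    return offset;
}

/* Return the address of a label, or -1 if it was never defined. */
int find_label(const Item *items, int count, const char *name) {
    for (int i = 0; i < count; i++)
        if (items[i].label && strcmp(items[i].label, name) == 0)
            return items[i].address;
    return -1;
}

/* Displacement for a relative jump, measured from the end of the jump. */
int relative_offset(const Item *jump, int target_address) {
    return target_address - (jump->address + jump->size);
}
```

Because labels may be used before they are defined, the addresses have to be assigned for the whole program before any label reference is encoded.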
Again, you will probably need a specialized code generation function for each instruction type, such as gerar_codigo_MOV(), gerar_codigo_PUSH(), gerar_codigo_ADD(), etc.
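Here is a sketch of one such function for the 8086, covering only two forms of MOV between 16-bit registers and immediates. The signature and the Operand struct are assumptions; the encodings used are the standard ones (B8+reg followed by a little-endian 16-bit immediate, and 89 followed by a ModRM byte), with registers numbered AX=0, CX=1, DX=2, BX=3, SP=4, BP=5, SI=6, DI=7.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operand record for this sketch. */
typedef enum { ARG_REGISTER, ARG_IMMEDIATE } OperandKind;
typedef struct { OperandKind kind; int reg; long value; } Operand;

/* Encode "MOV dst, src" into out and return the number of bytes written.
   Returns 0 for forms this sketch does not support. Operands are assumed
   to have passed the semantic checks already. */
size_t gerar_codigo_MOV(const Operand *dst, const Operand *src,
                        uint8_t out[3]) {
    assert(dst != NULL && src != NULL && out != NULL);
    if (dst->kind != ARG_REGISTER) return 0;

    if (src->kind == ARG_IMMEDIATE) {
        /* MOV r16, imm16: opcode B8+reg, then the value, low byte first. */
        unsigned long v = (unsigned long)src->value;
        out[0] = (uint8_t)(0xB8 + dst->reg);
        out[1] = (uint8_t)(v & 0xFF);
        out[2] = (uint8_t)((v >> 8) & 0xFF);
        return 3;
    }

    /* MOV r/m16, r16: opcode 89, ModRM = 11 | source reg | destination. */
    out[0] = 0x89;
    out[1] = (uint8_t)(0xC0 | (src->reg << 3) | dst->reg);
    return 2;
}
```

With this, MOV AX, 0x1234 becomes the bytes B8 34 12 and MOV BX, AX becomes 89 C3. The byte count returned here is also the instruction size needed for the offset calculation.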
Once this is done, all you have to do is write the machine codes sequentially within the executable file. You may have to add things like headers to these files as well.
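Writing the output can be as simple as the sketch below, which assumes a flat binary with no header at all (the DOS .COM format is one example of such a file). The function name write_flat_binary is hypothetical; formats with headers would need extra bytes written before the code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Write the generated machine code, byte for byte, to a file.
   Returns 0 on success and -1 on any I/O error. */
int write_flat_binary(const char *path, const uint8_t *code, size_t length) {
    assert(path != NULL && code != NULL);
    FILE *f = fopen(path, "wb");  /* "b" matters on Windows */
    if (!f) return -1;
    size_t written = fwrite(code, 1, length, f);
    if (fclose(f) != 0 || written != length) return -1;
    return 0;
}
```

Opening the file in binary mode avoids newline translation corrupting any 0x0A bytes in the machine code.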
At this point you should have your simple compiler working. There is no code optimization. There is no intermediate code generation (which is useful for generating code for different architectures). Error handling and recovery is minimal. Since your program is monolithic so far, there is no dependency management and/or linkage. All these features can be added later incrementally one at a time, if you want.
In addition, you should start with a minimal set of instructions, and for semantic analysis and code generation your compiler will probably have a fair amount of code specific to each instruction. Once you have a minimal compiler working, and provided you built it in a modular way, adding support for new instruction types one at a time should not be too difficult.
Finally, since the language is very low level, it probably makes no sense to talk about type checking and type inference, and register allocation becomes the user's problem rather than the compiler's. So you probably won’t need to worry about these aspects.
Welcome to Stack Overflow, Anderson. Take the tour to better understand how our community works. What exactly is your question on this subject? See how to ask a good question.
– Renan Gomes
Is it writing the assembly grammar? Is it writing the parser? Is it emitting the binary code? Is it linking more than one file? Is it saving the result in an executable format? What is your question?
– Guilherme Bernal
My main question is how to write the parser.
– Anderson Miranda