Implementation of a Lexical Analyzer

Asked

Viewed 718 times

1

Hello, guys! I’m implementing a lexical parser for a grammar I created (using Gals). Grammar validation was performed using the concept of finite automata, according to the image:

inserir a descrição da imagem aqui

At the moment I have the following codes:

Py lexical file.:

with open("transicao.txt", "r") as programa:
    arquivo = programa.readlines()
    lista = []
    for linha in arquivo:
        line = linha.strip('\n').replace(' ', '')
        for i in line:
            lista.append(i)

    with open("tokens/transicao.txt", "r") as arquivo:
        linhas = arquivo.readlines()

    def concatenar(list):
        return ''.join(list)

    def ftransicao(s, l):
        for v in linhas:  # Navega pelas linhas do arquivo de transição
            linha = v.split(',')  # Divide a string em 3 partes (estado atual, letra, próximo estado)
            if linha[0] == s:  # Se o primeiro valor da string dividida for igual o estado atual
                for z in linha[1]:  # É navegado apenas pelos valores das letras da função transição
                    if z == l:  # Se a letra lida no arquivo for igual a letra lida da palavra então é retornado o próximo estado
                        # print('{} contém {} então vai para {}'.format(s, l, linha[2]))
                        return linha[2].strip()
        return None


    atual = 'q0'
    efinal = ['q3', 'q7', 'q28', 'q9', 'q15', 'q18', 'q25', 'q33', 'q34', 'q35',
              'q36', 'q37', 'q39', 'q41', 'q42', 'q43', 'q44', 'q45', 'q46', 'q47', 'q51', 'q53']
    lexema = []

    for i in range(len(lista)):

        atual = ftransicao(atual, lista[i].strip())
        lexema += lista[i]

        if atual in efinal:
            print('Caractere identificado [ {} ]'.format(concatenar(lexema)))
            atual = 'q0'
            lexema.clear()
            continue
        elif atual is None:
            atual = 'q0'
            print('Caractere não identificado [ {} ]'.format(lista[i]))
            lexema.clear()
            continue

Transition.txt file that contains an example of grammar:

run
integer f = 22;
integer g = 3;
integer n;
display();
input(n);
while(g <= numero){
f = f * i;
g = g + 1;
}
display(f);
exit

And the transition function described according to the image:

q0,r,q1
q1,u,q2
q2,n,q3

q0,e,q4
q4,x,q5
q5,i,q6
q6,t,q7

q4,l,q26
q26,s,q27
q27,e,q28

q0,i,q8
q8,f,q9
q8,n,q10
q10,t,q11
q11,e,q12
q12,g,q13
q13,e,q14
q14,r,q15

q10,p,q16
q16,u,q17
q17,t,q18

q0,d,q19
q19,i,q20
q20,s,q21
q21,p,q22
q22,l,q23
q23,a,q24
q24,y,q25

q0,w,q29
q29,h,q30
q30,i,q31
q31,l,q32
q32,e,q33

q0,-,q34
q0,+,q35
q0,*,q36
q0,/,q37

q0,<,q38
q38,=,q39

q0,>,q40
q40,=,q41

q0,=,q42
q42,=,q43

q0,(,q44
q0,),q45
q0,{,q46
q0,},q47
q0,;,q53

The problem is as follows. How can I validate the string, digits, identifiers as shown in the image? The others are working perfectly and are being recognized in the listing.

Thank you to anyone who can help. Good evening!

1 answer

1


Your language is too complex to define in a regular grammar. You should use a context-free grammar, this way you can even define the origin of the mathematical operators and check balanced parentheses.

After you create your GLC, create a parser LL to validate the syntax and generate the AST.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.