NLP parsing with external lists


Parsing: an input text goes through a grammar, and the output is every stretch of the text that the grammar matches. The problem is that my non-terminals expand to external list files, and I can’t see a way to do this.

Example of a pseudo-code:

1) Open a text

2) Pass it through a grammar (just one example):

Grammar("""

S -> NP VP

NP -> DET N

VP -> V N

DET -> list_det.txt

N -> lista_n.txt

V -> txt list.""")

3) Print the parts of the text that obey the grammar

For example:

with open ("corpus_risque.txt", "r") as f:
    texte = f.read()

    grammar = nltk.parse_cfg("""
    S-> NP VP
    NP -> DET N
    VP -> V N 
    DET -> lista_det.txt
    N -> lista_n.txt
    V -> lista.txt""")

    parser = nltk.ChartParser(grammar)
    parsed = parser.parse(texte)
    print(texte)

Usually, grammars are presented ready-made, like this:

grammar = nltk.parse_cfg("""

S -> NP VP
VP -> VBZ NP PP
PP -> IN NP
NP -> NNP | DT JJ NN NN | NN
NNP -> 'Python'
VBZ -> 'is'
DT -> 'a'
JJ -> 'good'
NN -> 'programming' | 'language' | 'research'
IN -> 'for'
""")

Would this be possible?

  • Can you explain better what "grammar" means here, what "opening a text" is, and what "my non-terminals" are?

  • Thanks for your patience, I have edited my question. I don’t express myself very well... Open a text: the input, a text that will be parsed. Pass the grammar: define rules for the program to find in the text (e.g. NP -> DET N must find all DET N sequences in the text). Non-terminals: DET -> lista_det.txt, N -> lista_n.txt, V -> lista.txt.

  • Sorry, I still don’t understand. What is the main problem? What can’t you do? Is the problem just reading the files lista_det.txt, lista_n.txt and lista.txt?

  • A grammar is defined by non-terminals (N, DET, V...) and terminals ('o', 'homem', 'is'...). The terminals are my lists, but it is not possible to put a list inside a grammar (I need these lists because there are many entries and they will certainly grow, so it is hard to keep them in a script). I need to read these files so that I can apply the grammar rules to a text (e.g. NP -> DET N will find every DET N in the text; roughly the kind of thing sketched right after these comments). Sorry for the trouble. The truth is I’m a beginner and what I need to do is more complex, I think. Thank you so much for trying to help!! :)
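For reference, a minimal sketch of the kind of thing being asked for, assuming each list file (lista_det.txt, lista_n.txt and lista.txt, names taken from the question) contains one word per line and that the grammar string is simply assembled from those lists before being handed to NLTK:

import nltk

# Hypothetical helper: turn a word-list file (one word per line) into the
# right-hand side of a lexical rule, e.g. "'o' | 'a' | 'um'".
def alternatives(path):
    with open(path, encoding="utf-8") as f:
        words = [line.strip() for line in f if line.strip()]
    return " | ".join("'{}'".format(w) for w in words)

rules = """
S -> NP VP
NP -> DET N
VP -> V N
DET -> {det}
N -> {n}
V -> {v}
""".format(det=alternatives("lista_det.txt"),
           n=alternatives("lista_n.txt"),
           v=alternatives("lista.txt"))

grammar = nltk.CFG.fromstring(rules)   # nltk.parse_cfg(rules) in older NLTK
parser = nltk.ChartParser(grammar)

with open("corpus_risque.txt", encoding="utf-8") as f:
    for line in f:                     # assumes one sentence per line
        tokens = line.split()
        if not tokens:
            continue
        # The parser raises an error for words the grammar does not cover,
        # so every word in the corpus must appear in one of the lists.
        for tree in parser.parse(tokens):
            print(tree)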

1 answer



In fact, what you want is not possible as written. What happens is that you are creating a terminal rule "DET -> lista_det.txt", so the parser will look for the literal terminal lista_det.txt wherever the non-terminal DET is expected, instead of the words inside the file. Try creating a .cfg or .fcfg file with the expanded elements and then loading it from a script; it will be easier.

For example: I create a file called tester.fcfg with some grammar rules and lexical items with some features, and a script x.py.

My script will have:

import nltk
from nltk import grammar, parse, FeatStruct

sent = input('Type a sentence or a word: ')

# Load the feature grammar; depending on how NLTK resolves paths,
# 'file:tester.fcfg' may be needed for a file in the current directory.
cp = parse.load_parser('tester.fcfg', trace=2)

tokens = sent.split()
trees = cp.parse(tokens)

for tree in trees:
    print(tree)
    tree.draw()

And the file tester.fcfg contains:

## Grammar rules ##

Sentence -> SD[AGR=?a] SV[AGR=?a]
Sentence -> SD[AGR=?a]
Sentence -> SV[AGR=?a]
Sentence -> Nome
Sentence -> Verbo
Sentence -> PP[AGR=?a]
Sentence -> Pro[AGR=?a] 
Sentence -> Pro[AGR=?a] SV[AGR=?a]
Sentence -> P[AGR=?a]
Sentence -> P[AGR=?a] N[AGR=?a] | P N
Sentence -> VBar
Sentence -> SD SV

SN[AGR=?a] -> SD[AGR=?a] | N[AGR=?a] | SD[AGR=?a] PP[AGR=?a] | N[AGR=?a]

SD[AGR=?a] -> Det[AGR=?a] N[AGR=?a] | Det[AGR=?a] | PP[AGR=?a] N[AGR=?a] | Det N

PP[AGR=?a] -> P[AGR=?a] SN[AGR=?a]

SV[AGR=?a] -> V[AGR=?a] SN[AGR=?a] | V[AGR=?a] PP[AGR=?a] SN[AGR=?a] | VBar

VBar -> Pro[AGR=?a] SV[AGR=?a] | Pro[AGR=?a] V[AGR=?a]

Nome -> N

Verbo -> V

## Lexical features ##

Det[AGR=[NUM='sg', GND='f'],CAT =[Cat='Artigo']] -> 'a' | 'da' | 'na'

Det[AGR=[NUM='pl', GND='f'], CAT =[Cat='Artigo']] -> 'as' | 'nas'

Det[AGR=[NUM='sg', GND='m'], CAT =[Cat='Artigo']]-> 'o' | 'de' | 'no' | 'um'

Det[AGR=[NUM='pl', GND='m'], CAT =[Cat='Artigo']]-> 'os' | 'nos'

Pro[AGR=[NUM='sg', GND='m', PERS='3']]-> 'ele'

Pro[AGR=[NUM='sg', GND='m', PERS='1']]-> 'eu'

P[AGR=[NUM='sg', GND='m', PERS='3'], CAT=[Cat='Pronome', SubCat='Demonstrativo']] -> 'este' | 'aquele' | 'esse'

P[AGR=[NUM='pl', GND='m', PERS='3']] -> 'estes' | 'aqueles' | 'esses'

P[AGR=[NUM='sg', GND='f', PERS='3']] -> 'esta' | 'aquela' | 'essa'

P[AGR=[NUM='pl', GND='f', PERS='3']] -> 'estas' | 'aquelas' | 'essas'

N[AGR=[NUM='sg', GND='f'], CAT =[Cat='Substantivo', SubCAT='Comum']] -> 'biblioteca' | 'doutora' | 'leoa' | 'livraria' | 'professora' | 'lavadeira' | 'aluna' | 'madre' | 'menina' | 'mae' | 'mulher' | 'dentista' | 'juiza'

N[AGR=[NUM='pl', GND='f'], CAT =[Cat='Substantivo', SubCAT='Comum']]-> 'doutoras' |  'meninas' | 'mulheres' | 'juizas' | 'bola' | 'pata'

N[AGR=[NUM='sg', GND='m'],CAT =[Cat='Substantivo', SubCAT='Comum']] -> 'menino' | 'homem' | 'juiz' | 'doutor' | 'professor' | 'livro' | 'carro' | 'jogador'

N[AGR=[NUM='sg', GND='m'], SEMANTICA=[ ANI='animal']]-> 'pato' | 'cachorro' | 'gato'

N[AGR=[NUM='sg', GND='m'], CAT=[Cat='Substantivo Proprio'], SEMANTICA=[ANI='humano']] -> 'Pedro' | 'Carlos' | 'Henrique'

N[AGR=[NUM='sg', GND='f'], CAT=[Cat='Substantivo Proprio'], SEMANTICA=[ANI='humano']] -> 'Maria' | 'Veronica' | 'Lara' | 'Carla'

N[AGR=[NUM='pl', GND='m']] ->  'meninos' | 'homens' | 'livros' | 'carros'

N[AGR=[NUM='sg', GND='n']] ->  'estudante' | 'piloto' | 'presidente' | 'jornalista' | 'jogadora' | 'jornal'

N[AGR=[NUM='pl', GND='n']] -> 'estudantes' | 'pilotos' | 'presidentes' | 'jornalistas'

V[AGR=[NUM='sg'], CAT=[Cat='Verbo'], CP='presente do indicativo'] -> 'comprar' | 'compra' | 'comprou' | 'pegar' | 'pegou' | 'ler' | 'leu' | 'ama' | 'amo' | 'amar' | 'jogar' | 'entrou' | 'amor'

V[AGR=[NUM='sg'], CAT=[Cat='Verbo', SubCat='Ligacao e adicao'], CP='presente do indicativo'] -> 'e'


"""

Note that what the script loads are exactly the lexical items and grammar rules specified in that same file. The question is which linguistic model you are following (here, features organized as AVMs, i.e. attribute-value matrices) and what kind of computational implementation you want...
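As a small illustration of what such an AVM looks like outside the grammar file (this is just the standard NLTK FeatStruct class, not code from the question):

from nltk import FeatStruct

# Two attribute-value matrices; unify() merges them when the features agree.
fs_noun = FeatStruct(NUM='sg', GND='f')
fs_det = FeatStruct(NUM='sg')
print(fs_noun.unify(fs_det))                  # [ GND = 'f', NUM = 'sg' ]
print(fs_noun.unify(FeatStruct(NUM='pl')))    # None: number clash, unification fails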

I don’t know if that is exactly what you need, but from what I can see you are trying to build, beyond a corpus, a way of tagging and parsing. Look at the NLTK documentation, plus a few books, for further help.
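If the word lists really are too large to type into the grammar file by hand, one possible sketch is to generate the lexical rules of such a file from the lists before loading it. This assumes the file names from the question (lista_det.txt, lista_n.txt, lista.txt) hold one word per line; minha_gramatica.cfg and the example sentence are made up for illustration:

import nltk

# Hypothetical helper: build a lexical rule such as "DET -> 'o' | 'a' | 'um'"
# from a word-list file with one word per line.
def lexical_rule(lhs, path):
    with open(path, encoding="utf-8") as f:
        words = [line.strip() for line in f if line.strip()]
    return "{} -> {}".format(lhs, " | ".join("'{}'".format(w) for w in words))

rules = [
    "S -> NP VP",
    "NP -> DET N",
    "VP -> V N",
    lexical_rule("DET", "lista_det.txt"),
    lexical_rule("N", "lista_n.txt"),
    lexical_rule("V", "lista.txt"),
]

# Write the generated grammar to a plain CFG file, then load it the same way
# the script above loads tester.fcfg (the 'file:' prefix points at a local file).
with open("minha_gramatica.cfg", "w", encoding="utf-8") as f:
    f.write("\n".join(rules))

cp = nltk.parse.load_parser("file:minha_gramatica.cfg")
for tree in cp.parse("o menino leu o livro".split()):   # these words must be in the lists
    print(tree)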
