Python NLTK method that returns a syntactic tree

Question

Asked 9 years, 8 months ago

I’m using NLTK's Floresta corpus, and I saw that it contains some sentences whose parses (syntactic trees) have already been created. However, I would like a method that creates the parse for a new sentence in Portuguese.

For example, today I use

floresta.parsed_sents()

which returns a parse tree for every sentence in the existing corpus. I would like to pass in new sentences in Portuguese and have some Python function(s) return each sentence parsed in the same format that the function above returns.
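For context: the items returned by parsed_sents() are nltk.Tree objects, so "the same format" amounts to producing a Tree. A minimal sketch of building and inspecting one by hand with Tree.fromstring; the bracketed string and its labels are invented for illustration and do not follow Floresta's actual tagset:

```python
from nltk.tree import Tree

# Invented bracketed parse; the labels are illustrative, not Floresta's tagset.
BRACKETED = "(S (NP (Det o) (N gato)) (VP (V dorme)))"

tree = Tree.fromstring(BRACKETED)

print(tree.label())   # root label of the tree
print(tree.leaves())  # the words, left to right
print(tree.pos())     # (word, pre-terminal label) pairs
```

Any parser whose output you want to compare with parsed_sents() should therefore hand back Tree objects like this one.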

I recently wrote a post with an example of how to use SyntaxNet (from Google), trained on Portuguese, to extract a syntactic tree from a sentence and use this information with NLTK's data structures: http://davidsbatista.net/blog/2017/03/25/syntaxnet/

– David Batista

2017/06/18 at 11:01

3 answers


by mgibsonbr • **80,631** points · Answer 1 · 2015-11-29T21:04:19+00:00

I can't say anything about Portuguese in particular (or even about any other natural language, such as English), but from what I understand, parsed_sents returns a list of sentences that have already been parsed, without saying how that analysis was done (automatically, or by hand to serve as examples). To parse a new sentence, you need a grammar and a parser built from that grammar; you then call the parser's parse method. Example:

import nltk

grammar1 = nltk.CFG.fromstring("""
  S -> NP VP
  VP -> V NP | V NP PP
  PP -> P NP
  V -> "saw" | "ate" | "walked"
  NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
  Det -> "a" | "an" | "the" | "my"
  N -> "man" | "dog" | "cat" | "telescope" | "park"
  P -> "in" | "on" | "by" | "with"
  """)

This is a simple grammar with few rules and a restricted vocabulary. It can be used like this:

>>> sent = "Mary saw Bob".split()
>>> rd_parser = nltk.RecursiveDescentParser(grammar1)
>>> for tree in rd_parser.parse(sent):
...      print(tree)
(S (NP Mary) (VP (V saw) (NP Bob)))

^Source
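A parser yields every tree the grammar allows, not just one. As a small sketch, grammar1 gives two trees for a sentence of my own choosing, "Mary saw the dog in the park", because "in the park" can attach either to the verb phrase or to "the dog". This swaps in nltk.ChartParser for RecursiveDescentParser (both accept this grammar; the chart parser avoids redoing work on shared substrings), and the helper name all_parses is hypothetical:

```python
import nltk

# Same toy grammar as above, repeated so this snippet runs on its own.
grammar1 = nltk.CFG.fromstring("""
  S -> NP VP
  VP -> V NP | V NP PP
  PP -> P NP
  V -> "saw" | "ate" | "walked"
  NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
  Det -> "a" | "an" | "the" | "my"
  N -> "man" | "dog" | "cat" | "telescope" | "park"
  P -> "in" | "on" | "by" | "with"
  """)


def all_parses(sentence: str) -> list:
    """Hypothetical helper: every tree grammar1 licenses for a sentence."""
    parser = nltk.ChartParser(grammar1)
    return list(parser.parse(sentence.split()))


if __name__ == "__main__":
    # One tree attaches the PP to the VP, the other to the NP "the dog".
    for tree in all_parses("Mary saw the dog in the park"):
        print(tree)
```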

The for loop is there because an ambiguous sentence can have two or more interpretations, and the parser yields one tree for each. Another example (usage only; for the corresponding grammar, see the link above):

>>> pdp = nltk.ProjectiveDependencyParser(groucho_dep_grammar)
>>> sent = 'I shot an elephant in my pajamas'.split()
>>> trees = pdp.parse(sent)
>>> for tree in trees:
...     print(tree)
(shot I (elephant an (in (pajamas my))))
(shot I (elephant an) (in (pajamas my)))
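The answer leaves groucho_dep_grammar to the linked source. Assuming it is the dependency grammar from chapter 8 of the NLTK book (which this example appears to come from), a self-contained version could look like this; the wrapper function dependency_trees is a hypothetical helper of my own:

```python
import nltk

# Dependency grammar as given in the NLTK book (chapter 8): each rule lists
# the words that may depend on a given head word.
groucho_dep_grammar = nltk.DependencyGrammar.fromstring("""
'shot' -> 'I' | 'elephant' | 'in'
'elephant' -> 'an' | 'in'
'in' -> 'pajamas'
'pajamas' -> 'my'
""")


def dependency_trees(sentence: str) -> list:
    """Hypothetical helper: every projective dependency tree the grammar allows."""
    pdp = nltk.ProjectiveDependencyParser(groucho_dep_grammar)
    return list(pdp.parse(sentence.split()))


if __name__ == "__main__":
    for tree in dependency_trees("I shot an elephant in my pajamas"):
        print(tree)
```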

So that is how this code is used. Whether there are good grammars for Portuguese that work with it (i.e., in a format this library accepts), I can't say, not least because building a broad-coverage grammar is a very difficult problem.
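To show what a Portuguese grammar in this format could look like, here is a deliberately tiny, hypothetical CFG (grammar_pt and parse_pt are invented names). It ignores gender and number agreement and knows only a handful of words, so it is nowhere near broad coverage:

```python
import nltk

# Hypothetical toy grammar for a few Portuguese words; no agreement rules,
# so it would also accept ungrammatical strings such as "a gato".
grammar_pt = nltk.CFG.fromstring("""
  S -> NP VP
  NP -> Det N
  VP -> V NP
  Det -> 'o' | 'a'
  N -> 'gato' | 'cão' | 'menina'
  V -> 'viu' | 'comeu'
  """)


def parse_pt(sentence: str) -> list:
    """Return all trees for a whitespace-tokenized Portuguese sentence."""
    parser = nltk.ChartParser(grammar_pt)
    return list(parser.parse(sentence.split()))


if __name__ == "__main__":
    for tree in parse_pt("o gato viu o cão"):
        print(tree)
```

Any word outside the lexicon, such as "dormiu", makes NLTK's chart parser raise a ValueError saying the grammar does not cover some of the input words, which is exactly where the scale problem shows up.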

by z4r4tu5tr4 • 11 points · Answer 2 · 2015-12-29T21:11:06+00:00

The problem with syntactic trees for Portuguese is that NLTK doesn't have a tagger for it.

You can try comparing your text against Floresta, but there is still no guarantee that it will cover all of your words.
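A rough way to make that comparison is to list the tokens of your text that never appear in a reference vocabulary. A minimal sketch, where uncovered_words is a hypothetical helper and the case-insensitive match is a simplifying assumption; in practice the vocabulary could be Floresta's word list, which requires downloading the corpus first:

```python
def uncovered_words(tokens, vocabulary):
    """Hypothetical helper: distinct tokens missing from the vocabulary.

    Matching is case-insensitive (a simplifying assumption).
    """
    known = {word.lower() for word in vocabulary}
    return sorted({tok for tok in tokens if tok.lower() not in known})


if __name__ == "__main__":
    # Assumption: a real vocabulary could come from nltk.corpus.floresta.words()
    # after nltk.download("floresta"); a tiny list stands in for it here.
    vocab = ["o", "gato", "viu", "cão"]
    print(uncovered_words("O gato viu o papagaio".split(), vocab))
```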

You can also use nltk.CFG.fromstring and write the grammar by hand, but if it gets too complex you end up running into the tagger problem anyway.
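One common workaround for the vocabulary problem is to write the grammar over part-of-speech tags instead of words, so that only the tagger needs to know the words. A sketch under that assumption: the grammar, the Universal-style tag names (DET, NOUN, VERB, PROPN) and parse_tagged are hypothetical, and the tags are written by hand because no Portuguese tagger is assumed to be available:

```python
import nltk

# Hypothetical grammar whose terminals are POS tags, not words, so the
# vocabulary problem moves to whatever assigns the tags (i.e. a tagger).
TAG_GRAMMAR = nltk.CFG.fromstring("""
  S -> NP VP
  NP -> 'DET' 'NOUN' | 'PROPN'
  VP -> 'VERB' NP
  """)


def parse_tagged(tagged):
    """Parse (word, tag) pairs; each tree's leaves become 'word/TAG'."""
    parser = nltk.ChartParser(TAG_GRAMMAR)
    tags = [tag for _, tag in tagged]
    trees = []
    for tree in parser.parse(tags):
        # Deep copy so shared chart subtrees are never mutated in place.
        tree = tree.copy(deep=True)
        # Put the original words back into the leaves, left to right.
        for (word, tag), pos in zip(tagged, tree.treepositions("leaves")):
            tree[pos] = f"{word}/{tag}"
        trees.append(tree)
    return trees


if __name__ == "__main__":
    # Tags written by hand here; a real Portuguese tagger would supply them.
    sentence = [("O", "DET"), ("gato", "NOUN"), ("viu", "VERB"), ("Maria", "PROPN")]
    for tree in parse_tagged(sentence):
        print(tree)
```

The grammar stays small, but the quality of the trees now depends entirely on the tags, which is why the missing tagger is the real bottleneck.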

I don't know how much you need this, but if you want, you could contribute to the development of a tagger for Portuguese.

by David Batista • **119** points · Answer 3 · 2017-06-18T11:01:42+00:00

I recently wrote a post with an example of how to use SyntaxNet (from Google), trained for Portuguese, to extract a syntactic tree from a sentence and to use this information with NLTK data structures:

http://davidsbatista.net/blog/2017/03/25/syntaxnet/
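The post is not reproduced here, and this is not claimed to be its code. As a rough sketch of the NLTK side only, assuming the dependency parser's output is in 10-column CoNLL format, NLTK's DependencyGraph can read those rows and turn them into an nltk.Tree; the sample rows and conll_to_tree are hypothetical:

```python
from nltk.parse import DependencyGraph
from nltk.tree import Tree

# Hypothetical CoNLL-style output (10 tab-separated columns) for the sentence
# "O gato viu o cão"; a real dependency parser would produce this text.
SAMPLE_CONLL = "\n".join([
    "1\tO\to\tDET\tDET\t_\t2\tdet\t_\t_",
    "2\tgato\tgato\tNOUN\tNOUN\t_\t3\tnsubj\t_\t_",
    "3\tviu\tver\tVERB\tVERB\t_\t0\troot\t_\t_",
    "4\to\to\tDET\tDET\t_\t5\tdet\t_\t_",
    "5\tcão\tcão\tNOUN\tNOUN\t_\t3\tobj\t_\t_",
])


def conll_to_tree(conll: str, root_label: str = "root") -> Tree:
    """Hypothetical helper: NLTK Tree (head words as labels) from CoNLL rows."""
    # root_label must match the relation name the parser uses for the root
    # (e.g. "root" in Universal Dependencies, "ROOT" in other schemes).
    graph = DependencyGraph(conll, top_relation_label=root_label)
    return graph.tree()


if __name__ == "__main__":
    print(conll_to_tree(SAMPLE_CONLL))
```

The root_label argument matters because DependencyGraph looks for the sentence root under a specific relation name ("ROOT" by default) and will not find it otherwise.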