I can't speak to Portuguese specifically, or to any other natural language such as English, but as far as I understand, parsed_sents returns a list of sentences that have already been parsed, without specifying how that analysis was performed (automatically, or manually to serve as examples). To parse a new sentence, you need a grammar and a parser built from that grammar, and then you call the parser's parse method. Example:
import nltk

grammar1 = nltk.CFG.fromstring("""
S -> NP VP
VP -> V NP | V NP PP
PP -> P NP
V -> "saw" | "ate" | "walked"
NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
Det -> "a" | "an" | "the" | "my"
N -> "man" | "dog" | "cat" | "telescope" | "park"
P -> "in" | "on" | "by" | "with"
""")
This is a simple grammar with few rules and a restricted vocabulary. It can be used like this:
>>> sent = "Mary saw Bob".split()
>>> rd_parser = nltk.RecursiveDescentParser(grammar1)
>>> for tree in rd_parser.parse(sent):
...     print(tree)
(S (NP Mary) (VP (V saw) (NP Bob)))
Source
The for loop is needed because an ambiguous sentence can have two or more interpretations, and the parser yields one tree for each. Another example (usage only; for the corresponding grammar, see the link above):
>>> pdp = nltk.ProjectiveDependencyParser(groucho_dep_grammar)
>>> sent = 'I shot an elephant in my pajamas'.split()
>>> trees = pdp.parse(sent)
>>> for tree in trees:
...     print(tree)
(shot I (elephant an (in (pajamas my))))
(shot I (elephant an) (in (pajamas my)))
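The snippet above does not define groucho_dep_grammar. As a minimal sketch, here is a self-contained version in which the grammar is written out the way I recall it from the NLTK book; treat the rules as an assumption and check them against the original source before relying on them.

```python
import nltk

# Dependency grammar: each rule maps a head word to its possible dependents.
# Assumption: these rules are reproduced from memory of the NLTK book example.
groucho_dep_grammar = nltk.DependencyGrammar.fromstring("""
'shot' -> 'I' | 'elephant' | 'in'
'elephant' -> 'an' | 'in'
'in' -> 'pajamas'
'pajamas' -> 'my'
""")

pdp = nltk.ProjectiveDependencyParser(groucho_dep_grammar)
sent = "I shot an elephant in my pajamas".split()

# parse() returns an iterator; materialize it so the trees can be reused.
trees = list(pdp.parse(sent))
for tree in trees:
    print(tree)
```

Each printed tree is rooted at the head word of the sentence, and the trees differ in whether "in" attaches to "shot" or to "elephant".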
That, therefore, is how to use the code. Whether there are good grammars for Portuguese that can be used with it (i.e. in a format accepted by this library), I can't say, not least because building a broad-coverage grammar is a very difficult problem.
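The grammar format itself does not depend on the language. The sketch below uses a toy grammar with a handful of Portuguese words; the rules and vocabulary are hypothetical, invented only for illustration, and come nowhere near a real grammar of Portuguese. ChartParser is just one of the parsers NLTK offers and is used here as an arbitrary choice.

```python
import nltk

# Toy grammar: hypothetical rules and vocabulary, for illustration only.
toy_pt_grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> V NP | V NP PP
PP -> P NP
NP -> "Maria" | "João" | Det N | Det N PP
Det -> "o" | "um"
N -> "homem" | "cachorro" | "telescópio" | "parque"
V -> "viu"
P -> "com" | "em"
""")

sent = "Maria viu o homem com um telescópio".split()
parser = nltk.ChartParser(toy_pt_grammar)

# One tree per interpretation of the sentence.
trees = list(parser.parse(sent))
for tree in trees:
    print(tree)
```

With this grammar the sentence gets two trees: one where "com um telescópio" modifies "o homem" and one where it modifies the verb phrase. A word that is not in the grammar's vocabulary makes parse raise a ValueError, which is one reason a toy grammar does not scale to real text.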
I recently wrote a post with an example of how to use SyntaxNet (from Google), trained on Portuguese, to extract a syntactic tree from a sentence and use that information with NLTK's structures: http://davidsbatista.net/blog/2017/03/25/syntaxnet/
– David Batista