The imported libraries:
import spacy
from spacy.matcher import Matcher
The following code is adapted from the accepted answer to this question: https://stackoverflow.com/questions/62785916/spacy-replace-token
nlp = spacy.load("pt_core_news_md")
doc = nlp("O João gosta da Maria.")
matcher = Matcher(nlp.vocab)
matcher.add("Maria", None, [{"LOWER": "Maria"}])
def replace_word(orig_text, replacement):
    tok = nlp(orig_text)
    text = ''
    buffer_start = 0
    for _, match_start, _ in matcher(tok):
        if match_start > buffer_start:
            text += tok[buffer_start: match_start].text + tok[match_start - 1].whitespace_
        text += replacement + tok[match_start].whitespace_
        buffer_start = match_start + 1
    text += tok[buffer_start:].text
    return text
print(replace_word("O João gosta da Maria.", "Ana"))
When I run this last line, the text is printed unchanged (it should show "O João gosta da Ana."). Is this because the Matcher functions only work for English and not for "pt_core_news_md"?
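One possible cause worth ruling out, independent of the language model: the `LOWER` attribute is compared against the lowercased token text, so a capitalised pattern value such as "Maria" can never equal it. The check below is only a sketch under assumptions. It uses `spacy.blank("pt")` (tokenizer only, no trained model) instead of `pt_core_news_md`, the list-of-patterns form of `Matcher.add` (accepted by spaCy 2.2.2+ and 3.x), and a helper named `count_matches` that is hypothetical and not part of the original code.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank Portuguese pipeline is enough here, because LOWER is a
# lexical attribute that needs only the tokenizer, not a trained model.
nlp = spacy.blank("pt")
doc = nlp("O João gosta da Maria.")

def count_matches(value):
    """Hypothetical helper: count matches for the pattern {"LOWER": value}."""
    matcher = Matcher(nlp.vocab)
    # List-of-patterns form of add(), accepted by spaCy 2.2.2+ and 3.x.
    matcher.add("NAME", [[{"LOWER": value}]])
    return len(matcher(doc))

print(count_matches("Maria"))  # capitalised value: prints 0
print(count_matches("maria"))  # lowercase value: prints 1
```

If this is indeed the cause, writing the pattern value in lowercase should be enough for the replacement to happen.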
P.S.: What I actually want is to modify a token according to its index in the text, rather than by a condition (being equal to a certain string).
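To make the index-based idea concrete, here is a minimal sketch of what I have in mind. It does not use the Matcher at all: it rebuilds the string token by token and swaps in the replacement at the requested position. The function name `replace_token_at` is hypothetical, and `spacy.blank("pt")` is an assumption made so that the snippet runs without downloading `pt_core_news_md`.

```python
import spacy

# Assumption: a blank Portuguese pipeline (tokenizer only) stands in for
# pt_core_news_md, since only tokenization is needed for this sketch.
nlp = spacy.blank("pt")

def replace_token_at(text, index, replacement):
    """Hypothetical helper: replace the token at position `index` in `text`."""
    doc = nlp(text)
    parts = []
    for tok in doc:
        if tok.i == index:
            # Keep the whitespace that followed the original token.
            parts.append(replacement + tok.whitespace_)
        else:
            # text_with_ws is the token text plus its trailing whitespace.
            parts.append(tok.text_with_ws)
    return "".join(parts)

text = "O João gosta da Maria."
# The index depends on how the tokenizer splits the text, so it is looked up
# here instead of being hard-coded.
index = [tok.text for tok in nlp(text)].index("Maria")
print(replace_token_at(text, index, "Ana"))  # prints: O João gosta da Ana.
```

Because every token carries its own trailing whitespace, the original spacing and punctuation are preserved around the replaced token.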
Thank you very much for any help! Best regards, Fernando
– fernando
Hi! Nice of you to help. If you have time, read this post
– Paulo Marques