Using Perl (sorry...), but it is easy to translate to awk.
$ cat stoplist.txt
de
a
o ....
$ cat ex.csv
meu caro amigo;jjoao;classe a
eu ando a aprender weka;Thyago;classe b
mas a sua sintaxe dá-me algumas dores de cabeça;Thyago;classe a
Let rmstopwords be the following Perl file:
BEGIN {
    $patt = "que";                ## build a regexp from the stop words
    open(G, "<", "stoplist.txt") or die "stoplist.txt: $!";
    while (<G>) {
        chomp;
        $patt .= "|$_" if $_;     ## patt="que|de|a|o|..."
    }
    close(G);
}
$F[0] =~ s/\b($patt)\b//g;        ## in the first field, replace patt with ""
print join(";", @F)
Applied to our file ex.csv, this gives:
$ perl -naC -F';' rmstopwords ex.csv
caro amigo;jjoao;classe a
ando aprender weka;Thyago;classe b
sintaxe dá- algumas dores cabeça;Thyago;classe a
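
For anyone who prefers a standalone script over the -n/-a switches, the same idea can be sketched roughly as below. This is only an illustration, not a replacement for the one-liner: the helper names build_pattern and strip_first_field are made up for this example, and the hard-coded word list stands in for stoplist.txt. It also adds two things the snippet above does not do: it escapes the words with quotemeta, and it collapses the spaces left behind where a word was removed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use open qw(:std :encoding(UTF-8));

# Hypothetical helper: build one alternation regex from a list of stop words.
# quotemeta guards against words that contain regex metacharacters.
sub build_pattern {
    my @words = grep { length } @_;
    my $alt   = join '|', map { quotemeta } @words;
    return qr/\b(?:$alt)\b/;
}

# Hypothetical helper: remove stop words from the first ';'-separated
# field only, then tidy up the spaces left where words were removed.
sub strip_first_field {
    my ($line, $re) = @_;
    return $line unless length $line;
    my @f = split /;/, $line, -1;   # -1 keeps trailing empty fields
    $f[0] =~ s/$re//g;
    $f[0] =~ s/\s+/ /g;             # collapse runs of whitespace
    $f[0] =~ s/^ | $//g;            # trim leading/trailing space
    return join ';', @f;
}

# Assumed word list for the demo; in practice read it from stoplist.txt.
my $re = build_pattern(qw(que de a o));
print strip_first_field('eu ando a aprender weka;Thyago;classe b', $re), "\n";
# prints: eu ando aprender weka;Thyago;classe b
```

Note that "classe b" and "classe a" in the third field are left alone, since only the first field is touched.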
Do you want to delete only the word, or remove the whole line?
– JuniorNunes
Just the word itself.
– Thyago Oliveira Pereira
Could you post a few records from your CSV so we can see what the pattern looks like?
– JuniorNunes
Man, sorry it took me so long to reply, I’m extremely busy these days. I’ll try to post a snippet of the file today. Thanks!
– Thyago Oliveira Pereira