I have a MongoDB collection of tweets. Each record has a field called text. I need to delete records that share the same value in this field, and also remove single quotes, double quotes, commas, and line breaks. To remove the records with a duplicated text field, I am trying the following:
var registro;
db.getCollection('TweetsBR_1_copy').find().forEach(function(myDoc) {
    db.getCollection('TweetsBR_1_copy').find({"text": myDoc.text}).forEach(function(myDoc_2) {
        registro = db.getCollection('TweetsBR_1_copy').findOne({text: myDoc_2.text});
        db.getCollection('TweetsBR_1_copy').remove(registro);
        print("registro excluido:");
        print(registro.text);
    });
    db.getCollection('TweetsBR_1_copy').insert(registro);
    print("registro inserido:");
    print(registro.text);
});
But I am noticing that every time I run the command it deletes more and more records, so I’m not sure it’s working properly.
The collection has around 500K records.
Can someone give me a hand in this matter?
Thank you.
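One possible way to approach this, as a minimal sketch and not a tested solution: read the collection once, remember which text values have already been seen, collect the _id of every later document with the same text, and only then delete those documents. The helper names cleanText and findDuplicateIds and the batch size of 1000 are hypothetical, chosen for illustration. The sketch assumes that two texts count as duplicates when they match after single quotes, double quotes, commas, and line breaks are stripped, that the shell supports Set, and that roughly 500K text values fit in memory.

```javascript
// Strip single quotes, double quotes, commas, and line breaks from a string.
function cleanText(text) {
  return text.replace(/['",\r\n]/g, '');
}

// Given documents shaped like { _id, text }, return the _id of every document
// whose cleaned text was already seen earlier in the iteration.
// The first document with a given text is kept; documents without a string
// text field are skipped (an assumption, adjust as needed).
function findDuplicateIds(docs) {
  var seen = new Set();
  var duplicateIds = [];
  docs.forEach(function (doc) {
    if (typeof doc.text !== 'string') {
      return;
    }
    var key = cleanText(doc.text);
    if (seen.has(key)) {
      duplicateIds.push(doc._id);
    } else {
      seen.add(key);
    }
  });
  return duplicateIds;
}

// This part only runs inside the mongo shell, where the global `db` exists.
if (typeof db !== 'undefined') {
  var coll = db.getCollection('TweetsBR_1_copy');
  // Read everything first, delete afterwards, so that no documents are
  // removed while the collection is still being iterated.
  var ids = findDuplicateIds(coll.find({}, { text: 1 }));
  // Delete in batches (1000 is an arbitrary size) to keep each filter small.
  for (var i = 0; i < ids.length; i += 1000) {
    coll.deleteMany({ _id: { $in: ids.slice(i, i + 1000) } });
  }
  print('duplicates removed: ' + ids.length);
}
```

Separating the read from the delete avoids modifying the collection while a cursor is still walking it, which may be part of why the nested loop above gives different results on each run. This sketch only removes duplicates; it does not rewrite the text field of the documents that are kept.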
Thank you so much for the answer! As soon as I have some free time I will test it. Thanks!
– Thyago Oliveira Pereira