Explanation
The bag-of-words model is a simplified representation used in natural language processing and information retrieval (IR). In this model, a text (such as a sentence or a document) is represented as the bag (multiset) of its words, disregarding grammar and even word order, but keeping multiplicity.
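As a rough illustration of "order is ignored, multiplicity is kept", here is a minimal sketch that uses Python's `collections.Counter` as the multiset. The helper name `bag_of_words` and the whitespace tokenization are assumptions made for this example, not part of the definition above.

```python
from collections import Counter

def bag_of_words(text):
    # Hypothetical helper: lowercase and split on whitespace (a naive tokenizer).
    return Counter(text.lower().split())

a = bag_of_words("the cat saw the dog")
b = bag_of_words("the dog saw the cat")

print(a == b)      # True: word order does not matter
print(a["the"])    # 2: multiplicity is preserved
```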
Example of Implementation
The following models a text document using bag-of-words.
Here are two simple text documents:
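(1) John gosta de assistir filmes. Mary também gosta de filmes. ("John likes to watch movies. Mary also likes movies.")
(2) John também gosta de assistir jogos de futebol. ("John also likes to watch football games.")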
Based on these two text documents, a list of the distinct words is constructed as follows:
[
"John" ,
"gosta" ,
"de" ,
"assistir" ,
"filmes" ,
"Mary" ,
"também" ,
"futebol" ,
"jogos"
]
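The list above can be built programmatically, and each document can then be turned into a vector of word counts over that list. The following is only a sketch: the function names `tokenize`, `build_vocabulary` and `to_vector` are made up for this example, tokenization is a simple regular expression, and the vocabulary is kept in order of first appearance (so "jogos" comes before "futebol", unlike the hand-written list).

```python
import re
from collections import Counter

docs = [
    "John gosta de assistir filmes. Mary também gosta de filmes.",
    "John também gosta de assistir jogos de futebol.",
]

def tokenize(text):
    # \w+ matches runs of word characters, including accented letters like "é".
    return re.findall(r"\w+", text)

def build_vocabulary(documents):
    # Collect each distinct word once, in order of first appearance.
    vocabulary = []
    for doc in documents:
        for word in tokenize(doc):
            if word not in vocabulary:
                vocabulary.append(word)
    return vocabulary

def to_vector(doc, vocabulary):
    # Count how many times each vocabulary word occurs in the document.
    counts = Counter(tokenize(doc))
    return [counts[word] for word in vocabulary]

vocabulary = build_vocabulary(docs)
vectors = [to_vector(doc, vocabulary) for doc in docs]

print(vocabulary)
print(vectors[0])  # [1, 2, 2, 1, 2, 1, 1, 0, 0]
print(vectors[1])  # [1, 1, 2, 1, 0, 0, 1, 1, 1]
```

Each position in a vector corresponds to one word of the vocabulary, so document (1) has "gosta", "de" and "filmes" twice each, while document (2) has "de" twice and no "filmes" or "Mary".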
It is also common to calculate the frequency of appearance of words:
linear(t_j) = 1 − d(t_j)/N
where t_j is the word whose frequency you want to find, d(t_j) is the number of documents (or sentences) in which the word appears, and N is the total number of documents or sentences.
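A small sketch of this calculation on the two example documents, under the assumption that d(t_j) is read as the number of documents containing the word (which keeps the result between 0 and 1). The function name `linear` mirrors the formula; the tokenization is an assumption for illustration.

```python
import re

docs = [
    "John gosta de assistir filmes. Mary também gosta de filmes.",
    "John também gosta de assistir jogos de futebol.",
]

def linear(term, documents):
    # N: total number of documents.
    n = len(documents)
    # d(t_j): number of documents that contain the term at least once (assumed reading).
    d = sum(1 for doc in documents if term in set(re.findall(r"\w+", doc)))
    return 1 - d / n

print(linear("John", docs))  # 0.0: appears in both documents
print(linear("Mary", docs))  # 0.5: appears in one of the two documents
```

Under this reading, words that occur in every document get a weight of 0, and rarer words get a weight closer to 1.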
Conclusion
Put simply, bag-of-words is a way of representing text. It is commonly used in machine learning, sentiment analysis, chatbots, and topic modeling.
Source: Wikipedia
bag is a mathematical concept. Something like a set, except that it can contain repeated elements and ignores ordering
– Jefferson Quesado
Do you have a handy link to any of these articles, to help put the use of this bag in context?
– Jefferson Quesado
Unfortunately not :/
– Francisco
I didn’t quite understand your explanation, could you give examples of cases that apply? @Jeffersonquesado
– Francisco
Sets: {0, 1, 2} U {2, 4} = {0, 1, 2, 4}; bags: {0, 1, 2} U {2, 4} = {0, 1, 2, 2, 4}; also bags: {0, 1, 3, 1, 2} - {0, 1} = {3, 1, 2}
– Jefferson Quesado
For real-world applications, see https://en.wikipedia.org/wiki/Bag_(Mathematics)?wprov=sfsi1
– Jefferson Quesado
@Jeffersonquesado Cool, thanks for pointing me in a direction. It does seem to be related to the term "bag of words", but it’s still not what I was looking for.
– Francisco
Does Google Scholar return anything for "bag of words"?
– Jefferson Quesado
I accidentally found this: https://en.m.wikipedia.org/wiki/Bag-of-words_model; seems relevant
– Jefferson Quesado
It seems that the third section of this report describes something about "bag of words": http://conteudo.icmc.usp.br/CMS/Arquivos/arquivos_enviados/BIBLIOTECA_113_RT_209.pdf
– Jefferson Quesado
@Jeffersonquesado I didn’t know Google Scholar existed, lol. That Wikipedia article has some interesting things, but I still haven’t found what I wanted...
– Francisco
The technical report I linked after the Wiki link, was it more directly on topic? It discusses document classification and NLP using "bag of words", and even includes a table of example documents with the words "cas", "filh" and others
– Jefferson Quesado
@Jeffersonquesado I think I got where I wanted to go. I posted an answer, and I think it illustrates the subject well.
– Francisco
good answer =)
– Jefferson Quesado