WEKA reads files in the ARFF format.
To create an ARFF file, you must define the following parts:
Relation declaration
The relation name is defined on the first line of the file. It is declared as:
@relation <relation name>
If the relation name contains spaces, it must be enclosed in quotation marks.
Attribute declaration
Attributes are declared through an ordered sequence of @attribute statements. Each attribute in the dataset must have its own @attribute statement, which uniquely identifies the attribute's name and its data type. The order in which the attributes are declared is the order in which they appear in the data section.
It is declared as:
@attribute <attribute name> <data type>
The attribute name must start with a letter and, if it contains spaces, must be enclosed in quotes.
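To make the quoting rule concrete, here is a minimal sketch in Python that builds an @attribute line. The helper names `quote_if_needed` and `attribute_line` are illustrative only and are not part of WEKA; the sketch handles only the "contains spaces" case described above.

```python
def quote_if_needed(name: str) -> str:
    """Wrap a relation or attribute name in double quotes when it contains spaces."""
    if " " in name:
        return '"' + name + '"'
    return name


def attribute_line(name: str, data_type: str) -> str:
    """Build one @attribute declaration line (hypothetical helper)."""
    return f"@attribute {quote_if_needed(name)} {data_type}"


print(attribute_line("idade", "numeric"))
print(attribute_line("texto do tweet", "string"))
```

The first call leaves the name untouched, while the second wraps it in quotes because of the spaces.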
The data types supported by WEKA are:
- Numbers (real or integer): numeric
- Free text: string
- Nominal attributes (a predefined set of text values)
- Dates: date [<date-format>]
- Relational attributes
Numeric attributes
Suitable for both integers and real numbers. Declared as:
@attribute idade numeric
Nominal attributes
A nominal attribute is defined by listing its possible values. For example:
@attribute classe {comprador, possivel-comprador, nao-comprador}
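If the labels come from existing data, the value list can be derived rather than typed by hand. A rough sketch in Python, assuming the labels are plain tokens without spaces or commas; `nominal_attribute` is a hypothetical helper name, not a WEKA function.

```python
def nominal_attribute(name, labels):
    """Build a nominal @attribute line from an iterable of labels (hypothetical helper)."""
    # dict.fromkeys drops duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(labels))
    return "@attribute " + name + " {" + ", ".join(unique) + "}"


labels = ["comprador", "possivel-comprador", "nao-comprador", "comprador"]
print(nominal_attribute("classe", labels))
```

Keeping the first-seen order makes the declaration stable across runs on the same data.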
String attributes
Used for arbitrary text. Declared as:
@attribute tweet string
Note: a string value must be enclosed in quotes if it contains spaces.
Data declaration
The start of the data section is declared on a single line:
@data
It marks where the instance data actually begins.
Instance data
Instances are written one per line, with the attribute values separated by commas.
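As an illustration of how one instance line is put together, here is a small Python sketch. The names `format_value` and `instance_line` are made up for this example, and backslash-escaping embedded quotes is an assumption about what the ARFF reader accepts, so check it against your WEKA version.

```python
def format_value(value) -> str:
    """Render one attribute value for an ARFF data row (hypothetical helper)."""
    if isinstance(value, (int, float)):
        return str(value)
    text = str(value)
    # Quote text containing spaces or commas so the row still splits correctly.
    if " " in text or "," in text:
        # Assumption: embedded backslashes and double quotes are backslash-escaped.
        text = text.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + text + '"'
    return text


def instance_line(values) -> str:
    """Join the values of one instance, in attribute declaration order."""
    return ", ".join(format_value(v) for v in values)


print(instance_line(["Preciso de um galaxy s5", "compraria"]))
print(instance_line([25, "comprador"]))
```

The values must be passed in the same order as the @attribute declarations.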
To answer your question directly, a possible ARFF file for your problem would look like this:
% Everything after the % is ignored. It can be used to insert comments
@relation compradores
@attribute tweet string
@attribute classe {compraria, nao-compraria}
@data
"To e morto Galaxy S5 por R$ 2,600", nao-compraria
"Preciso de um galaxy s5", compraria
"Configurando meu Galaxy s5", compraria
"Prefiro um iphone do que um galaxy s5", nao-compraria
Just rewrite your dataset in the ARFF format, possibly with a small script.
– Beterraba
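A minimal sketch of such a script in Python, assuming the tweets are already available as (text, label) pairs. The function name `write_arff`, the sample list, and the backslash-escaping of embedded quotes are assumptions made for illustration, not something WEKA provides.

```python
import io


def write_arff(out, relation, tweets):
    """Write (tweet, label) pairs to the text stream `out` as an ARFF document."""
    # Collect the distinct labels in first-seen order for the nominal attribute.
    labels = list(dict.fromkeys(label for _, label in tweets))
    out.write(f"@relation {relation}\n")
    out.write("@attribute tweet string\n")
    out.write("@attribute classe {" + ", ".join(labels) + "}\n")
    out.write("@data\n")
    for text, label in tweets:
        # Assumption: embedded backslashes and double quotes are backslash-escaped.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        out.write(f'"{escaped}", {label}\n')


tweets = [
    ("Preciso de um galaxy s5", "compraria"),
    ("Prefiro um iphone do que um galaxy s5", "nao-compraria"),
]
buffer = io.StringIO()
write_arff(buffer, "compradores", tweets)
print(buffer.getvalue())
```

To produce a file instead of printing, pass a handle such as `open("tweets.arff", "w", encoding="utf-8")` in place of the in-memory buffer.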
I understand, but I'm trying to picture what this ARFF file would look like. Do you have an idea of how a sketch of it would look?
– Maicon Funke
Post a small sample of your tweet dataset (even a single tweet is enough) and I'll reply.
– Beterraba
"To and dead Galaxy S5 for R $ 2,600" -- "I need a Galaxy S5"
– Maicon Funke
In the examples above I have 2 tweets: one showing that the user would not buy, and another showing potential interest.
– Maicon Funke