Good afternoon,
I’m using the rb-libsvm gem to build a model that classifies tweets as positive or negative.
From a dataset of 19000 positive tweets and 19000 negative tweets, I tried to train the model with this code (an adaptation of the usage example on GitHub):
require 'libsvm'
path = 'pos.txt'
documents = IO.readlines(path).map do |line|
  [1,line.tr("\n","")]
end
path2 = 'neg.txt'
documents += IO.readlines(path).map do |line|
  [0,line.tr("\n","")]
end
# Let's create a dictionary of unique words and then we can
# create our vectors. This is a very simple example. If you
# were doing this in a production system you'd do things like
# stemming and removing all punctuation (in a less casual way).
#
dictionary = documents.map(&:last).map(&:split).flatten.uniq
dictionary = dictionary.map { |x| x.gsub(/\?|,|\.|\-/,'') }
training_set = []
documents.each do |doc|
  features_array = dictionary.map { |x| doc.last.include?(x) ? 1 : 0 }
  training_set << [doc.first, Libsvm::Node.features(features_array)]
end
# Let's set up libsvm so that we can test our prediction
# using the test set
#
problem = Libsvm::Problem.new
parameter = Libsvm::SvmParameter.new
parameter.cache_size = 1 # in megabytes
parameter.eps = 0.001
parameter.c = 10
# Train classifier using training set
#
problem.set_examples(training_set.map(&:first),training_set.map(&:last))
model = Libsvm::Model.train(problem, parameter)
model.save("ic.model")
When I train on a smaller number of tweets (2000, for example) I have no problem, but when I try with all 38000 tweets the following error occurs:
FATAL failed to allocate memory
The error occurs in the following code snippet:
documents.each do |doc|
  features_array = dictionary.map { |x| doc.last.include?(x) ? 1 : 0 }
  training_set << [doc.first, Libsvm::Node.features(features_array)]
end
I’m new to Ruby and I don’t understand why this problem occurs. Could someone help?
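A likely cause is that the loop builds one dense array per tweet with an entry for every dictionary word, so memory grows roughly as (number of tweets) x (dictionary size), and nearly all of those entries are 0. A minimal sketch of a sparse alternative follows, using only plain Ruby. The sample `documents`, the `tokenize` helper, and the `word_index` and `sparse_set` names are hypothetical. Note that it matches whole tokens instead of the substring check done by `include?`, which is an assumption about what is actually wanted.

```ruby
# Hypothetical sample data standing in for the [label, text] pairs in `documents`.
documents = [
  [1, "great day, love it"],
  [0, "bad day. hate it"]
]

# Hypothetical helper: split on whitespace and strip the same punctuation
# characters as the question's gsub, dropping tokens that end up empty.
def tokenize(text)
  text.split.map { |w| w.gsub(/[?,.\-]/, '') }.reject(&:empty?)
end

# Map each unique word to an integer index. A hash lookup replaces the
# scan over the whole dictionary that the original loop does per tweet.
word_index = {}
documents.each do |_label, text|
  tokenize(text).each { |w| word_index[w] ||= word_index.size }
end

# For each document, store only the indices of the words it contains.
# Words that are absent are simply not stored (implicitly 0).
sparse_set = documents.map do |label, text|
  features = {}
  tokenize(text).each { |w| features[word_index[w]] = 1 }
  [label, features]
end

sparse_set.each { |label, features| puts "#{label}: #{features.inspect}" }
```

Each tweet then costs memory proportional to its own word count, not to the dictionary size. The rb-libsvm README appears to show `Libsvm::Node.features` accepting a hash of index => value pairs, so each `features` hash could presumably be passed to it directly, but that should be checked against the installed gem version.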
Unrelated to the problem: you are reading the file in "path" twice, when you probably want to read "path" once and "path2" once.
– fsanches
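The fix described in this comment could be sketched as below, assuming one tweet per line. The `labeled_lines` helper is hypothetical, and temporary files stand in for pos.txt and neg.txt so the snippet runs on its own. `File.readlines` with `chomp: true` needs Ruby 2.4 or later.

```ruby
require 'tempfile'

# Hypothetical helper: read one file and pair every line with a label.
# Taking the path as an argument avoids reusing the wrong variable.
def labeled_lines(path, label)
  File.readlines(path, chomp: true).map { |line| [label, line] }
end

# Temporary files standing in for pos.txt and neg.txt.
pos = Tempfile.new('pos')
pos.write("good\nnice\n")
pos.flush

neg = Tempfile.new('neg')
neg.write("bad\n")
neg.flush

# Positive tweets get label 1, negative tweets get label 0.
documents = labeled_lines(pos.path, 1) + labeled_lines(neg.path, 0)

puts documents.inspect
```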