FATAL failed to allocate memory

Good afternoon,

I’m using the rb-libsvm gem:

https://rubygems.org/gems/rb-libsvm/versions/1.3.1

My goal is to train a model that classifies tweets as positive or negative.

Using a dataset of 19,000 positive and 19,000 negative tweets, I tried to create the model with the following code (an adaptation of the usage example on GitHub):

require 'libsvm'

path = 'pos.txt'
documents = IO.readlines(path).map do |line|
  [1,line.tr("\n","")]
end

path2 = 'neg.txt'
documents += IO.readlines(path).map do |line|
  [0,line.tr("\n","")]
end


# Let's create a dictionary of unique words and then we can
# create our vectors.  This is a very simple example.  If you
# were doing this in a production system you'd do things like
# stemming and removing all punctuation (in a less casual way).
#
dictionary = documents.map(&:last).map(&:split).flatten.uniq
dictionary = dictionary.map { |x| x.gsub(/\?|,|\.|\-/,'') }
training_set = []
documents.each do |doc|
  features_array = dictionary.map { |x| doc.last.include?(x) ? 1 : 0 }
  training_set << [doc.first, Libsvm::Node.features(features_array)]
end

# Let's set up libsvm so that we can test our prediction
# using the test set
#
problem = Libsvm::Problem.new
parameter = Libsvm::SvmParameter.new

parameter.cache_size = 1 # in megabytes
parameter.eps = 0.001
parameter.c   = 10

# Train classifier using training set
#
problem.set_examples(training_set.map(&:first),training_set.map(&:last))
model = Libsvm::Model.train(problem, parameter)
model.save("ic.model")

When I train on a smaller number of tweets (2,000, for example) everything works, but when I use all 38,000 tweets I get the following error:

FATAL failed to allocate memory

The error occurs in the following code snippet:

documents.each do |doc|
  features_array = dictionary.map { |x| doc.last.include?(x) ? 1 : 0 }
  training_set << [doc.first, Libsvm::Node.features(features_array)]
end

I’m new to Ruby and I don’t understand why this happens. Could someone help?

  • Unrelated to the problem: you are reading the file in "path" twice, when you probably want to read "path" once and "path2" once.
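For reference, here is a minimal sketch of loading both files with their labels, assuming one tweet per line as in the question. The helper name load_labeled is hypothetical, and the chomp: option of File.readlines requires Ruby 2.4 or later.

```ruby
# Hypothetical helper: read one tweet per line and attach a class label.
# chomp: true strips the trailing newline, like line.tr("\n", "") in the question.
def load_labeled(path, label)
  File.readlines(path, chomp: true).map { |line| [label, line] }
end

# Usage: read each file exactly once, with its own label.
# documents = load_labeled('pos.txt', 1) + load_labeled('neg.txt', 0)
```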

2 answers

Apparently you are running out of memory. You are keeping all of these in memory at the same time:

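  • The documents Array (every tweet with its label);
  • The dictionary Array (every distinct word);
  • The features_array Array (one 0/1 entry per dictionary word, for each tweet);
  • The training_set Array (a feature vector for every tweet).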

Judging by the line where the error occurs, documents and dictionary were allocated successfully, and there was not enough space left for features_array and training_set.
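A quick count makes the cost of that loop concrete. features_array has one entry per dictionary word for every tweet, so the dense training set holds documents.size * dictionary.size values, almost all of them 0. With 38,000 tweets and a dictionary of, say, 50,000 words (an assumed figure, not measured from the question's data), that would be 1.9 billion entries, and in rb-libsvm Node.features appears to create one Node object per array element. The toy corpus below is made up; the sparse count keeps only the words each tweet actually contains.

```ruby
# Toy corpus (made-up data) in the same [label, text] shape as the question.
docs = [[1, 'good day'], [1, 'great day'], [0, 'bad day'], [0, 'awful weather']]

dictionary = docs.flat_map { |_label, text| text.split }.uniq

# Dense: one value per dictionary word, per document (what features_array stores).
dense_entries = docs.size * dictionary.size

# Sparse: only the distinct words present in each document.
sparse_entries = docs.sum { |_label, text| text.split.uniq.size }

puts "dictionary=#{dictionary.size} dense=#{dense_entries} sparse=#{sparse_entries}"
# prints "dictionary=6 dense=24 sparse=8"
```

The gap grows with the vocabulary: the dense size scales with the number of distinct words in the whole corpus, while the sparse size only scales with the length of each tweet.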

If possible (it is hard to judge without knowing exactly what the methods you are calling do), try to restructure the code as a pipeline.

Instead of building all of these up in memory first, pass the data on to set_examples as you loop over documents.
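A minimal sketch of that idea, under some assumptions: labels and feature vectors are built in a single pass, with no intermediate training_set of pairs that later has to be mapped over twice, and each vector is stored sparsely as an {index => 1} Hash. The sparse Hash is a swapped-in representation, not something the answer itself proposes. The helper name sparse_features is hypothetical, and passing its result to Libsvm::Node.features assumes your rb-libsvm version accepts an index => value Hash, which is worth checking in the gem's source.

```ruby
# Toy data in the question's [label, text] shape (made up for illustration).
documents = [[1, 'good day'], [0, 'bad day']]

# Give each distinct word a column index, starting at 1 as in libsvm's data files.
word_index = {}
documents.each do |_label, text|
  text.split.each { |w| word_index[w] ||= word_index.size + 1 }
end

# Hypothetical helper: sparse vector holding only the words present in text,
# with keys in ascending order because libsvm expects increasing indices.
def sparse_features(text, word_index)
  text.split.map { |w| word_index[w] }.compact.uniq.sort.map { |i| [i, 1] }.to_h
end

# One pass: labels and feature vectors go straight into the two arrays
# that set_examples takes.
labels = []
examples = []
documents.each do |label, text|
  labels << label
  examples << sparse_features(text, word_index)
  # With the gem loaded (assuming Hash support in Node.features):
  # examples << Libsvm::Node.features(sparse_features(text, word_index))
end
# problem.set_examples(labels, examples)
```

Note that this matches whole words, whereas doc.last.include?(x) in the question matches substrings, so the features can differ slightly.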

Another option is to rewrite this loop using documents.pop, so that each element is removed from the array as it is processed and its memory can be freed.
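A sketch of the pop-based loop, with made-up data and a word count standing in for the real feature extraction. Array#pop removes and returns the last element, so the array stops referencing each tweet once it has been handled and the garbage collector may reclaim the string. Two caveats: pop processes documents from the end (Array#shift keeps the original order), and the converted vectors still accumulate, so this only releases the original tweet strings.

```ruby
# Toy data (made up); in the question this would be the full documents array.
documents = [[1, 'good day'], [0, 'bad day'], [1, 'great']]

processed = []
# pop removes the element from documents, so only processed keeps a result for it.
while (doc = documents.pop)
  label, text = doc
  processed << [label, text.split.size] # stand-in for real feature extraction
end
```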

Ideally you would read and process one tweet at a time, but that takes more refactoring, and I don’t know whether it is possible in your case.
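Reading one tweet at a time could look like the sketch below, which uses File.foreach to walk each file line by line instead of loading it into an array first. The helper name each_labeled is hypothetical. Keep in mind that libsvm still needs every training example in memory when the model is trained, so streaming only avoids the intermediate copies; the feature vectors you collect still have to fit.

```ruby
# Hypothetical helper: yields one [label, text] pair per line, reading the
# file incrementally with File.foreach rather than loading it all at once.
def each_labeled(path, label)
  File.foreach(path) do |line|
    yield [label, line.chomp]
  end
end

# Usage sketch (feature extraction omitted):
# each_labeled('pos.txt', 1) { |label, text| ... }
# each_labeled('neg.txt', 0) { |label, text| ... }
```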

The problem is that the machine you are running this code on does not have enough memory to run Libsvm on 38,000 tweets.

Try reducing the number of tweets.
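If you reduce the data, it helps to keep the two classes balanced, as they are in the original 19,000/19,000 split. A minimal sketch, assuming documents is the question's array of [label, text] pairs; balanced_sample is a hypothetical helper, and the fixed seed of 42 is arbitrary, chosen only so the same subset comes back on every run.

```ruby
# Hypothetical helper: keep at most per_class documents of each label,
# picked at random but reproducibly thanks to the fixed seed.
def balanced_sample(documents, per_class, seed: 42)
  rng = Random.new(seed)
  documents.group_by(&:first).flat_map do |_label, docs|
    docs.sample([per_class, docs.size].min, random: rng)
  end
end

# Usage: 1,000 tweets per class, the 2,000-tweet size reported to work.
# subset = balanced_sample(documents, 1_000)
```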
