Memory allocation error when reading many files: "terminate called after throwing an instance of 'std::bad_alloc' what(): std::bad_alloc" [C++]

Viewed 505 times

1

I’m using a classification algorithm in a digital voice signal processing project. The algorithm was written to receive all the audio signals in a single vector for processing, but I’m running into problems: the number of files I’m working with is very large, and the program fails with the error "terminate called after throwing an instance of 'std::bad_alloc' what(): std::bad_alloc". Is it possible to change the code that reads the files and stores them in the vector so that it works more efficiently and does not exceed the available memory?

Code for reading files:

    string filename;
    filename="C:\\Users\\marcu\\Desktop\\TCC\\Arquivos_10780\\Arquivos_DFT_TXT_512\\PA_D_00";

    std::vector<double> c;

    for(int j=1; j<=5400; j++)
    {
        stringstream ss;
        ss << filename << setw(5) << setfill('0') << j << "_bonafide_DFT.txt";

        std::ifstream f;
        f.open(ss.str().c_str());

        if (f.is_open())
        {
            double num;

            while (f >> num)
                c.push_back(num);

            f.close();
        }
        else
        {
            f.close();
            continue;
        }
    }

    for(int j=5401; j<=29700; j++)
    {
        stringstream ss;
        ss << filename << setw(5) << setfill('0') << j << "_spoof_DFT.txt";

        std::ifstream f;
        f.open(ss.str().c_str());

        if (f.is_open())
        {
            double num;

            while (f >> num)
                c.push_back(num);

            f.close();
        }
        else
        {
            f.close();
            continue;
        }
    }

Complete code:

#include<stdio.h>
#include<math.h>
#include<string.h>
#include<iostream>
#include<fstream>
#include<string>
#include<vector>
#include<stdlib.h>
#include<iomanip>
#include<sstream>

using namespace std;

double mean_similarities(double**,int,int);//vectors, number of vectors, their dimension

int main()
{
    const int number_of_classes=2;
    int number_of_feature_vectors_in_class[number_of_classes];
    number_of_feature_vectors_in_class[0]=2700;
    number_of_feature_vectors_in_class[1]=8080;
    const int dimension_of_each_feature_vector=512;

////////////////////////////////////////////////////////////////////////////////////////////
/*
Example: 3 classes and 4 vectors of dimension 2 in each class
{{0.90,0.12},{0.88,0.14},{0.88,0.13},{0.89,0.11}}   //0.88---0.90 ; 0.11---0.14
{{0.55,0.53},{0.53,0.55},{0.54,0.54},{0.56,0.54}}   //0.53---0.56 ; 0.53---0.55
{{0.10,0.88},{0.11,0.86},{0.12,0.87},{0.11,0.88}}   //0.10---0.12 ; 0.86---0.88  

double c[]={ 
0.90,0.12,0.88,0.14,0.88,0.13,0.89,0.11,
0.55,0.53,0.53,0.55,0.54,0.54,0.56,0.54,
0.10,0.88,0.11,0.86,0.12,0.87,0.11,0.88
//all vectors in class C_1, followed by all vectors in C_2, ...., followed by all in C_n
            };
*/
////////////////////////////////////////////////////////////////////////////////////////////

    string filename;
    filename="C:\\Users\\marcu\\Desktop\\TCC\\Arquivos_10780\\Arquivos_DFT_TXT_512\\PA_D_00";

    std::vector<double> c;

    for(int j=1; j<=5400; j++)
    {
        stringstream ss;
        ss << filename << setw(5) << setfill('0') << j << "_bonafide_DFT.txt";

        std::ifstream f;
        f.open(ss.str().c_str());

        if (f.is_open())
        {
            double num;

            while (f >> num)
                c.push_back(num);

            f.close();
        }
        else
        {
            f.close();
            continue;
        }
    }

    for(int j=5401; j<=29700; j++)
    {
        stringstream ss;
        ss << filename << setw(5) << setfill('0') << j << "_spoof_DFT.txt";

        std::ifstream f;
        f.open(ss.str().c_str());

        if (f.is_open())
        {
            double num;

            while (f >> num)
                c.push_back(num);

            f.close();
        }
        else
        {
            f.close();
            continue;
        }
    }

////////////////////////////////////////////////////////////////////////////////////////////
//edit whatever you need, according to the feature vectors of your problem, ABOVE this line.
//Do NOT change anything BELOW this line !!!!!
////////////////////////////////////////////////////////////////////////////////////////////
    double*** C=new double**[number_of_classes];
    for(int i=0; i<number_of_classes; i++)
        C[i]=new double*[number_of_feature_vectors_in_class[i]];
    for(int i=0; i<number_of_classes; i++)
        for(int j=0; j<number_of_feature_vectors_in_class[i]; j++)
            C[i][j]=new double[dimension_of_each_feature_vector];
    int l=0;
    for(int i=0; i<number_of_classes; i++)
        for(int j=0; j<number_of_feature_vectors_in_class[i]; j++)
            for(int k=0; k<dimension_of_each_feature_vector; k++)
            {
                C[i][j][k]=c[l];
                l++;
            }

//Debug info only
//for(int i=0;i<number_of_classes;i++)
//  for(int j=0;j<number_of_feature_vectors_in_class[i];j++)
//      for(int k=0;k<dimension_of_each_feature_vector;k++)
//          printf("\nclass %d vector %d element %d is %.3f",i,j,k,C[i][j][k]);
//getchar();
    double Y[number_of_classes];
    for(int i=0; i<number_of_classes; i++)
        Y[i]=mean_similarities(C[i],number_of_feature_vectors_in_class[i],dimension_of_each_feature_vector);
    double alpha=Y[0];
    for(int i=1; i<number_of_classes; i++)
        if(Y[i]<alpha)
            alpha=Y[i];
    printf("\nALPHA: %.3f",alpha);
    double** smallest_range_vector_for_class=new double*[number_of_classes];
    for(int i=0; i<number_of_classes; i++)
        smallest_range_vector_for_class[i]=new double[dimension_of_each_feature_vector];
    for(int i=0; i<number_of_classes; i++)
        for(int k=0; k<dimension_of_each_feature_vector; k++)
            smallest_range_vector_for_class[i][k]=C[i][0][k];
    for(int i=0; i<number_of_classes; i++)
        for(int j=1; j<number_of_feature_vectors_in_class[i]; j++)
            for(int k=0; k<dimension_of_each_feature_vector; k++)
                if(C[i][j][k]<smallest_range_vector_for_class[i][k])
                    smallest_range_vector_for_class[i][k]=C[i][j][k];

//Debug info only
//for(int i=0;i<number_of_classes;i++)
//  for(int k=0;k<dimension_of_each_feature_vector;k++)
//          printf("\nclass %d smallest component %d is %.3f",i,k,smallest_range_vector_for_class[i][k]);
    double** largest_range_vector_for_class=new double*[number_of_classes];
    for(int i=0; i<number_of_classes; i++)
        largest_range_vector_for_class[i]=new double[dimension_of_each_feature_vector];
    for(int i=0; i<number_of_classes; i++)
        for(int k=0; k<dimension_of_each_feature_vector; k++)
            largest_range_vector_for_class[i][k]=C[i][0][k];
    for(int i=0; i<number_of_classes; i++)
        for(int j=1; j<number_of_feature_vectors_in_class[i]; j++)
            for(int k=0; k<dimension_of_each_feature_vector; k++)
                if(C[i][j][k]>largest_range_vector_for_class[i][k])
                    largest_range_vector_for_class[i][k]=C[i][j][k];

//Debug info only
//for(int i=0;i<number_of_classes;i++)
//  for(int k=0;k<dimension_of_each_feature_vector;k++)
//          printf("\nclass %d largest component %d is %.3f",i,k,largest_range_vector_for_class[i][k]);
    int R=0;
    int F=0;
    for(int ia=0; ia<number_of_classes; ia++)
        for(int ib=0; ib<number_of_classes; ib++)
            for(int j=0; j<number_of_feature_vectors_in_class[ib]; j++)
                for(int k=0; k<dimension_of_each_feature_vector; k++)
                {
                    if(ib!=ia)
                    {
                        if((C[ib][j][k]>smallest_range_vector_for_class[ia][k])&&(C[ib][j][k]<largest_range_vector_for_class[ia][k]))
                            R++;
                        F++;
                    }
                }
    double beta=((double)(R))/((double)(F));
    printf("\nBETA: %.3f",beta);
    printf("\nP=(G1,G2)=(%.3f,%.3f)",alpha-beta,alpha+beta-1);
    printf("\nDistance from P to (1,0): %.3f",sqrt(pow((alpha-beta)-1,2)+pow(alpha+beta-1,2)));
    printf("\n\n");
}
/////////////////////////////////////////////
////////////////////////////////////////////
double mean_similarities(double** v,int n, int t)
{
    double largest;
    double smallest;
    double* s=new double[t];
    for(int i=0; i<t; i++)
    {
        smallest=1;
        largest=0;
        for(int j=0; j<n; j++)
        {
            if(v[j][i]>largest)
                largest=v[j][i];
            if(v[j][i]<smallest)
                smallest=v[j][i];
        }
        s[i]=1-(largest-smallest);
    }
    double m=0;
    for(int i=0; i<t; i++)
        m+=s[i];
    m/=((double)(t));
    return(m);
}

PS: To find the classifier’s best result, I need to increase the size of the files (the amount of information in each file) step by step. I start with 512 points and double this value on each run. That works up to 8192, but when I try 16384 the code hangs. I am working with 10780 files, all with the same dimension, and I increase the dimension as I check the results.

  • How much memory do you expect the program to use? If it is a lot, implementing a custom allocator might be a good idea.

  • Can you recommend any documentation or library I could look at to implement one?

1 answer

0

Something easy you can try is to use std::vector<float> instead of std::vector<double> to store the values. This halves the memory the vector uses, and the loss of precision may not be a big problem in your case.
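To put rough numbers on this, here is a minimal sketch that estimates the buffer size for both element types. The file count (10780) and the largest dimension (16384) come from the question; the helper name `buffer_bytes` and the constant names are made up for illustration, and the figures in the comments assume the usual 8-byte double and 4-byte float.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Bytes needed to hold `count` elements of type T in one contiguous buffer.
// (Hypothetical helper, not part of the original program.)
template <typename T>
std::size_t buffer_bytes(std::size_t count) {
    // Guard against overflow of size_t, which matters on 32-bit builds.
    assert(count <= std::numeric_limits<std::size_t>::max() / sizeof(T));
    return count * sizeof(T);
}

// Figures taken from the question: 10780 files, 16384 values per file.
const std::size_t kFiles = 10780;
const std::size_t kValuesPerFile = 16384;
const std::size_t kTotalValues = kFiles * kValuesPerFile;  // 176,619,520 elements

// With the usual 8-byte double: 1,412,956,160 bytes (about 1.3 GiB).
// With the usual 4-byte float:    706,478,080 bytes (about 0.66 GiB).
```

Calling `buffer_bytes<double>(kTotalValues)` and `buffer_bytes<float>(kTotalValues)` gives the two estimates. Keep in mind that the posted program also copies every value from c into the C arrays, so while both exist the total is roughly twice these figures.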

Another important point is to reserve memory for the vector before calling push_back, using c.reserve(numero_de_elementos). Do you know how many elements the vector will have at the end? If so, reserve that space before the push_back calls. Whenever a push_back finds that the memory currently allocated to the vector cannot fit the new element, std::vector allocates a new, larger memory region (typically 1.5 to 2 times the current capacity, depending on the implementation), copies the elements over from the old region, and then releases the old region. Both regions exist during the copy, so with a doubling policy your program briefly uses about three times the current capacity for this vector alone. If you don’t reserve memory, this process will probably happen many times as you use push_back.
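The effect of reserve can be observed with a small experiment. This is only a sketch: the function name `count_reallocations` is invented for illustration, and the exact number of reallocations without reserve depends on the standard library's growth policy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fills a vector with `total` values and counts how many times push_back
// had to move the data to a larger buffer (detected as a capacity change).
// (Hypothetical helper for illustration only.)
std::size_t count_reallocations(std::size_t total, bool reserve_first) {
    std::vector<double> v;
    if (reserve_first)
        v.reserve(total);  // one allocation up front, sized for the final count

    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < total; ++i) {
        v.push_back(static_cast<double>(i));
        if (v.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    assert(v.size() == total);
    return reallocations;
}
```

With `reserve_first` set to true the function returns 0, because the capacity never has to change. With false it returns a positive number that grows with `total`, and each of those steps needs the old and the new buffer at the same time.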

Also, even if you have enough free memory to use three times what you need, your free memory may be fragmented. In that case it may not be possible to allocate one contiguous region during std::vector’s reallocation if the vector has many elements.

  • I used reserve, but I’m still running out of memory in the middle of the process. In some tests I saw that the processing got further than before, but it is still not enough. Another comment mentioned a custom allocator. Do you know anything about that?

  • If I’m not mistaken, the usual motivation for a custom allocator is performance when you make many allocations, but I have never used one. I don’t know if it would really help in your case. When you called reserve, are you sure you reserved the number of elements the vector will have at the end and not some smaller value?

  • Correct me if I’m wrong. I’m reading 10780 files with 16384 elements each. In that case, do I need to reserve 10780 * 16384 = 176619520 elements, or do I need more space?

  • In the middle of the code there is a comment saying "Do NOT change anything BELOW this line !!!!!". It comes from the developer of the algorithm, but if you have any tips for handling the later steps of the code better without changing the result, that would help too.

  • Do you need to process all the files together? Couldn’t you process one file at a time, or one group of files at a time? Also, the "code that should not be changed" contains several "new"s and no delete (the same goes for the function mean_similarities). Who is supposed to free all the memory allocated by so many "new"s? It would be a good idea to replace this manual memory management with unique_ptr, for example.
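As a rough illustration of the "one file at a time" idea from the last comment: judging by the posted code, alpha depends only on the per-component minimum and maximum of each class, so those can be updated as each file is read instead of keeping every value in memory. The sketch below is an assumption about how that could look, not the algorithm author's method. `ClassRange` and `read_feature_vector` are hypothetical names, and std::vector replaces the raw new/delete.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <limits>
#include <vector>

// Running per-component minimum and maximum for one class.
// (Hypothetical helper, not part of the original program.)
struct ClassRange {
    std::vector<double> smallest;
    std::vector<double> largest;

    explicit ClassRange(std::size_t dimension)
        : smallest(dimension, std::numeric_limits<double>::infinity()),
          largest(dimension, -std::numeric_limits<double>::infinity()) {}

    // Folds one feature vector into the running ranges.
    void update(const std::vector<double>& v) {
        assert(v.size() == smallest.size());
        for (std::size_t k = 0; k < v.size(); ++k) {
            smallest[k] = std::min(smallest[k], v[k]);
            largest[k] = std::max(largest[k], v[k]);
        }
    }

    // Same quantity as mean_similarities in the question, which starts its
    // search from smallest=1 and largest=0 (values assumed to lie in [0,1]).
    double mean_similarity() const {
        double sum = 0.0;
        for (std::size_t k = 0; k < smallest.size(); ++k) {
            const double lo = std::min(1.0, smallest[k]);
            const double hi = std::max(0.0, largest[k]);
            sum += 1.0 - (hi - lo);
        }
        return sum / static_cast<double>(smallest.size());
    }
};

// Reads one feature vector of `dimension` numbers from an open stream
// (for example a std::ifstream). Returns false if the stream holds fewer
// numbers than expected. The buffer `out` is reused between files.
bool read_feature_vector(std::istream& in, std::size_t dimension,
                         std::vector<double>& out) {
    out.resize(dimension);
    for (std::size_t k = 0; k < dimension; ++k)
        if (!(in >> out[k]))
            return false;
    return true;
}
```

A loop over the file names would open each file, call `read_feature_vector`, and pass the result to `update` on the range of the matching class, so only one vector of `dimension` values is held at a time. Computing beta this way would need a second pass over the files, comparing each vector with the stored ranges of the other class. Whether that is acceptable depends on how strictly the "do not change" part has to be preserved.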
