Strategy to run regressions with many iterations without much RAM

I have a small dataset (872 obs. of 27 variables).

But the analysis I need to run on it ends up being very heavy, because it involves the interaction of many variables with one another.

I’m trying to run a Confirmatory Factor Analysis (CFA) using the lavaan package. However, the function stops at iteration 4,521 after a day of running. When I use Stata, the computer restarts at a certain point (around 10,000 iterations, if I’m not mistaken).

When it finishes (in R), I have a 200 MB data frame and receive the following message on the console (the same one I get when I interrupt the operation manually):

Warning messages:
1: In lav_data_full(data = data, group = group, cluster = cluster,  :
  lavaan WARNING: some ordered categorical variable(s) have more than 12 levels: idade_coop n_pac membros cs_sobre_cooperados soma_pl_deposito ativocomp pl_sobre_ativos roa
2: In lav_samplestats_step2(UNI = FIT, wt = wt, ov.names = ov.names,  :
  lavaan WARNING: correlation between variables sul and sudeste is (nearly) 1.0
3: In lav_samplestats_step2(UNI = FIT, wt = wt, ov.names = ov.names,  :
  lavaan WARNING: correlation between variables ativocomp and soma_pl_deposito is (nearly) 1.0
4: In lav_model_estimate(lavmodel = lavmodel, lavpartable = lavpartable,  :
  lavaan WARNING: the optimizer warns that a solution has NOT been found!
5: In lav_model_estimate(lavmodel = lavmodel, lavpartable = lavpartable,  :
  lavaan WARNING: the optimizer warns that a solution has NOT been found!
6: In lav_model_estimate(lavmodel = lavmodel, lavpartable = lavpartable,  :
  lavaan WARNING: the optimizer warns that a solution has NOT been found!
7: In lav_model_estimate(lavmodel = lavmodel, lavpartable = lavpartable,  :
  lavaan WARNING: the optimizer warns that a solution has NOT been found!

When I try to run the summary, I get: lavaan 0.6-7 did NOT end normally after 4521 iterations

I believe it stops running due to lack of memory, since in Stata the computer simply restarts.

Example of the code I’m using:

# Libraries ----
library(tidyverse)
library(haven)
library(semPlot)
library(lavaan)

# Importing the data ----
base <- read_dta("base.dta")

# Running the CFA ----

# Assigning groups
mod_cfa <- 'AIL =~ idade_coop + n_pac + sudeste + sul + centro + norte + nordeste
            CONS_SUP =~ reunioes_ano + estrutura_governanca + membros + comite
            ESTR_CAP =~ cs_sobre_cooperados + soma_pl_deposito + ativocomp + pl_sobre_ativos + roa'

# Running the CFA
cfa_coop <- cfa(mod_cfa,
                data = base,
                missing = "default",
                estimator = "WLSMV",
                orthogonal = FALSE, 
                ordered = names(base)
)

# Results
summary(cfa_coop, standardized = TRUE, fit.measures = TRUE, modindices = FALSE)

fitMeasures(cfa_coop, c("chisq","df","pvalue","cfi","tli","rmsea"))

Example of the data (dput output):

structure(list(cnpj = c("554656546", "767867868687", "132131232", 
"876768", "786765", "786575", "78678686", 
"65767568", "45678", "8675867"), niveis_superv = c("2", 
"2", "2", "0", "0", "0", "2", "2", "2", "0"), classe_bc = c("02", 
"02", "02", "01", "01", "02", "02", "02", "02", "01"), idade_coop = c(22, 
22, 22, 22, 22, 22, 22, 21, 21, 21), n_pac = c(1, 10, 11, 1, 
1, 3, 13, 4, 1, 1), sudeste = c(0, 1, 0, 1, 1, 0, 1, 1, 1, 1), 
    sul = c(0, 0, 1, 0, 0, 1, 0, 0, 0, 0), centro = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0), nordeste = c(1, 0, 0, 0, 0, 0, 0, 0, 
    0, 0), norte = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), atuacao_regional = c(1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1), atuacao_estadual = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0), atuacao_nacional = c(0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0), qtd_cooperados = c(1288, 3461, 11310, 1203, 
    4530, 3274, 7954, 3090, 983, 633), auditor = c("0", "1", 
    "1", "1", "1", "1", "1", "1", "1", "1"), contratar_auditoria_ind = c("2", 
    "2", "2", "2", "1", "2", "2", "2", "2", "2"), reunioes_ano = c(12, 
    12, 12, 12, 12, 12, 12, 12, 12, 12), estrutura_governanca = c("3", 
    "3", "3", "1", "2", "1", "3", "3", "2", "3"), membros = c(9, 
    7, 16, 6, 7, 9, 15, 7, 3, 3), comite = c("0", "0", "0", "1", 
    "0", "0", "0", "0", "0", "0"), cs_sobre_cooperados = c(6324.9228515625, 
    5602.01416015625, 6778.712890625, 790.086608886719, 1236.85620117188, 
    2393.3583984375, 6248.63232421875, 6032.5859375, 9310.8828125, 
    1582.30786132812), soma_pl_deposito = c(27017868, 75570352, 
    523851488, 1025653.1875, 6256179, 46703636, 409542080, 60845500, 
    10978892, 1100099.625), ativocomp = c(27371496, 143889792, 
    535524864, 1117028.25, 7135122.5, 63281840, 429233920, 93440432, 
    11219289, 1256903.25), pl_sobre_ativos = c(0.195353165268898, 
    0.0269169881939888, 0.0544663555920124, 0.611539125442505, 
    0.440605372190475, 0.0862450525164604, 0.0495623573660851, 
    0.0553100071847439, 0.432251751422882, 0.297396898269653), 
    roa = c(0.0260528121143579, 0.0159006342291832, 0.0089608347043395, 
    0.027274627238512, 0.0233467519283295, 0.00636459980159998, 
    0.0053424290381372, -0.0262128747999668, 0.0410496257245541, 
    0.0629174262285233), deposito_sobre_ativo = c(0.636883497238159, 
    0.341013759374619, 0.805505573749542, 0, 0, 0.594665348529816, 
    0.818691551685333, 0.405729651451111, 0.0734363198280334, 
    0), capital_social = c(8146501, 19388572, 76667240, 950474.1875, 
    5602958.5, 7835855, 49701624, 18640690, 9152598, 1001600.875
    )), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))

PS: I know there are similar questions, but they deal with cases where the dataset is large, not where the estimation requires a lot of RAM.

1 answer



If you are on Linux, you can run htop in a terminal and track RAM consumption.

But the likely source of your problem is what the lavaan warnings describe:

lavaan WARNING: some ordered categorical variable(s) have more than 12 levels: idade_coop n_pac membros cs_sobre_cooperados soma_pl_deposito ativocomp pl_sobre_ativos roa
lavaan WARNING: correlation between variables sul and sudeste is (nearly) 1.0
lavaan WARNING: correlation between variables ativocomp and soma_pl_deposito is (nearly) 1.0

Try refitting after removing either sul or sudeste, and likewise either ativocomp or soma_pl_deposito, because their correlations are close to 1.
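A minimal sketch of that check, assuming `base` has already been loaded as in the question (`read_dta("base.dta")`); the variable choices dropped in `mod_cfa2` are just one possible selection:

```r
# Sketch: inspect the (nearly) collinear pairs flagged by lavaan
# before refitting. Assumes `base` is the data frame from the question.
flagged <- c("sul", "sudeste", "ativocomp", "soma_pl_deposito")
round(cor(base[flagged], use = "pairwise.complete.obs"), 2)

# Then refit keeping only one variable from each redundant pair,
# e.g. dropping `sul` and `ativocomp`:
mod_cfa2 <- 'AIL      =~ idade_coop + n_pac + sudeste + centro + norte + nordeste
             CONS_SUP =~ reunioes_ano + estrutura_governanca + membros + comite
             ESTR_CAP =~ cs_sobre_cooperados + soma_pl_deposito + pl_sobre_ativos + roa'
```

Dropping one dummy from each pair removes the redundant information without losing the construct, since the remaining indicator carries (nearly) the same signal.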

If the problem persists (too many iterations), you can consider two alternatives:

  • Change the way the variables enter the CFA model or its structure.
  • Change the optimizer in lavOptions to BFGS and see if the result improves (in my studies, nlminb always works better than BFGS). E.g.: cfa(your_model, data = your_data, optim.method = "BFGS")
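The optimizer change can be sketched as below, assuming `mod_cfa` and `base` are defined as in the question (lavaan's default optimizer is "nlminb"):

```r
# Sketch: same cfa() call as the question, switching the optimizer
# via lavaan's optim.method option.
cfa_bfgs <- cfa(mod_cfa,
                data         = base,
                estimator    = "WLSMV",
                ordered      = names(base),
                optim.method = "BFGS")
```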
  • How do I change it to "BFGS"? When I put it in the control option, it gives an error.

  • About monitoring RAM: unfortunately I run this on a server, which is Windows.

  • There must be some option in Windows to monitor RAM on the server, but I don't know it. To use BFGS: cfa(model, data = HolzingerSwineford1939, optim.method = "BFGS"). Did deleting the variables solve it?

  • Sorry it took so long! It ran here! Thank you!
