MemoryError in pandas merge

Hello, I am using the pandas merge command in Python 3:

ibama_doadores_orig = pd.merge(eleitos_d_s_doadores, ibama, left_on='CPF_CNPJ_doador_originario_limpo', right_on='CPF_CNPJ_limpo')

But a MemoryError is raised:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-20-e7b779815ee8> in <module>()
----> 1 ibama_doadores_orig = pd.merge(eleitos_d_s_doadores, ibama, left_on='CPF_CNPJ_doador_originario_limpo', right_on='CPF_CNPJ_limpo')

c:\users\george\appdata\local\programs\python\python36-32\code\doacoes\lib\site-packages\pandas\core\reshape\merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator)
     52                          right_index=right_index, sort=sort, suffixes=suffixes,
     53                          copy=copy, indicator=indicator)
---> 54     return op.get_result()
     55 
     56 

c:\users\george\appdata\local\programs\python\python36-32\code\doacoes\lib\site-packages\pandas\core\reshape\merge.py in get_result(self)
    581             [(ldata, lindexers), (rdata, rindexers)],
    582             axes=[llabels.append(rlabels), join_index],
--> 583             concat_axis=0, copy=self.copy)
    584 
    585         typ = self.left._constructor

c:\users\george\appdata\local\programs\python\python36-32\code\doacoes\lib\site-packages\pandas\core\internals.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
   4830     blocks = [make_block(
   4831         concatenate_join_units(join_units, concat_axis, copy=copy),
-> 4832         placement=placement) for placement, join_units in concat_plan]
   4833 
   4834     return BlockManager(blocks, axes)

c:\users\george\appdata\local\programs\python\python36-32\code\doacoes\lib\site-packages\pandas\core\internals.py in <listcomp>(.0)
   4830     blocks = [make_block(
   4831         concatenate_join_units(join_units, concat_axis, copy=copy),
-> 4832         placement=placement) for placement, join_units in concat_plan]
   4833 
   4834     return BlockManager(blocks, axes)

c:\users\george\appdata\local\programs\python\python36-32\code\doacoes\lib\site-packages\pandas\core\internals.py in concatenate_join_units(join_units, concat_axis, copy)
   4937     to_concat = [ju.get_reindexed_values(empty_dtype=empty_dtype,
   4938                                          upcasted_na=upcasted_na)
-> 4939                  for ju in join_units]
   4940 
   4941     if len(to_concat) == 1:

c:\users\george\appdata\local\programs\python\python36-32\code\doacoes\lib\site-packages\pandas\core\internals.py in <listcomp>(.0)
   4937     to_concat = [ju.get_reindexed_values(empty_dtype=empty_dtype,
   4938                                          upcasted_na=upcasted_na)
-> 4939                  for ju in join_units]
   4940 
   4941     if len(to_concat) == 1:

c:\users\george\appdata\local\programs\python\python36-32\code\doacoes\lib\site-packages\pandas\core\internals.py in get_reindexed_values(self, empty_dtype, upcasted_na)
   5239             for ax, indexer in self.indexers.items():
   5240                 values = algos.take_nd(values, indexer, axis=ax,
-> 5241                                        fill_value=fill_value)
   5242 
   5243         return values

c:\users\george\appdata\local\programs\python\python36-32\code\doacoes\lib\site-packages\pandas\core\algorithms.py in take_nd(arr, indexer, axis, out, fill_value, mask_info, allow_fill)
   1465             out = np.empty(out_shape, dtype=dtype, order='F')
   1466         else:
-> 1467             out = np.empty(out_shape, dtype=dtype)
   1468 
   1469     func = _get_take_nd_function(arr.ndim, arr.dtype, out.dtype, axis=axis,

MemoryError: 

The file I intend to generate will have 26 columns. Is there any way to avoid the memory error? Or do I need to merge with only a few columns?

3 answers

A MemoryError in pandas happens when you try to hold a very large DataFrame in memory. Try breaking your processing into chunks of DataFrames and concatenating the results later!
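A minimal sketch of that approach, assuming the large left-hand table is read from a CSV file (the file name eleitos_d_s_doadores.csv is an assumption for illustration) while the smaller ibama table stays in memory:

import pandas as pd

# read the large table in pieces instead of all at once;
# the file name below is hypothetical
partial_results = []
for chunk in pd.read_csv('eleitos_d_s_doadores.csv', chunksize=100000):
    merged = pd.merge(chunk, ibama,
                      left_on='CPF_CNPJ_doador_originario_limpo',
                      right_on='CPF_CNPJ_limpo')
    partial_results.append(merged)

ibama_doadores_orig = pd.concat(partial_results, ignore_index=True)

Note that the final concat still has to fit the complete result in memory; if even that fails, write each merged chunk to disk (for example with to_csv in append mode) instead of accumulating the pieces.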

Yansym is correct... A few days ago I had the same problem and broke the processing into chunks of DataFrames. You can do this using the chunksize argument of pandas.read_csv.

A practical example:

import pandas

chunks = []
for chunk in pandas.read_csv("voters.csv", chunksize=1000):
    chunks.append(chunk)  # each chunk is a DataFrame of 1000 rows
result = pandas.concat(chunks, ignore_index=True)

In the case above, each chunk is a DataFrame holding 1000 rows of the CSV file.

Hugs!

Is your Python build 32-bit? The paths in your traceback contain python36-32, which suggests it is, and this may be the problem: a 32-bit process cannot allocate more than about 2 GB of memory on Windows by default, no matter how much RAM the machine has. I would recommend the 64-bit version, which is better suited for data processing.
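A quick way to check which build you are running:

import struct
print(struct.calcsize("P") * 8)  # prints 32 on a 32-bit Python, 64 on a 64-bit one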

Link to the download: https://www.python.org/downloads/release/python-391/

To install the 64-bit version, scroll down the page and select "Windows installer (64-bit)".
