MemoryError in pandas merge


Hello, I am using the pandas merge function in Python 3:

ibama_doadores_orig = pd.merge(eleitos_d_s_doadores, ibama, left_on='CPF_CNPJ_doador_originario_limpo', right_on='CPF_CNPJ_limpo')

But a MemoryError is raised:

MemoryError                               Traceback (most recent call last)
<ipython-input-20-e7b779815ee8> in <module>()
----> 1 ibama_doadores_orig = pd.merge(eleitos_d_s_doadores, ibama, left_on='CPF_CNPJ_doador_originario_limpo', right_on='CPF_CNPJ_limpo')

c:\users\george\appdata\local\programs\python\python36-32\code\doacoes\lib\site-packages\pandas\core\reshape\ in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator)
     52                          right_index=right_index, sort=sort, suffixes=suffixes,
     53                          copy=copy, indicator=indicator)
---> 54     return op.get_result()

c:\users\george\appdata\local\programs\python\python36-32\code\doacoes\lib\site-packages\pandas\core\reshape\ in get_result(self)
    581             [(ldata, lindexers), (rdata, rindexers)],
    582             axes=[llabels.append(rlabels), join_index],
--> 583             concat_axis=0, copy=self.copy)
    585         typ = self.left._constructor

c:\users\george\appdata\local\programs\python\python36-32\code\doacoes\lib\site-packages\pandas\core\ in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
   4830     blocks = [make_block(
   4831         concatenate_join_units(join_units, concat_axis, copy=copy),
-> 4832         placement=placement) for placement, join_units in concat_plan]
   4834     return BlockManager(blocks, axes)

c:\users\george\appdata\local\programs\python\python36-32\code\doacoes\lib\site-packages\pandas\core\ in <listcomp>(.0)
   4830     blocks = [make_block(
   4831         concatenate_join_units(join_units, concat_axis, copy=copy),
-> 4832         placement=placement) for placement, join_units in concat_plan]
   4834     return BlockManager(blocks, axes)

c:\users\george\appdata\local\programs\python\python36-32\code\doacoes\lib\site-packages\pandas\core\ in concatenate_join_units(join_units, concat_axis, copy)
   4937     to_concat = [ju.get_reindexed_values(empty_dtype=empty_dtype,
   4938                                          upcasted_na=upcasted_na)
-> 4939                  for ju in join_units]
   4941     if len(to_concat) == 1:

c:\users\george\appdata\local\programs\python\python36-32\code\doacoes\lib\site-packages\pandas\core\ in <listcomp>(.0)
   4937     to_concat = [ju.get_reindexed_values(empty_dtype=empty_dtype,
   4938                                          upcasted_na=upcasted_na)
-> 4939                  for ju in join_units]
   4941     if len(to_concat) == 1:

c:\users\george\appdata\local\programs\python\python36-32\code\doacoes\lib\site-packages\pandas\core\ in get_reindexed_values(self, empty_dtype, upcasted_na)
   5239             for ax, indexer in self.indexers.items():
   5240                 values = algos.take_nd(values, indexer, axis=ax,
-> 5241                                        fill_value=fill_value)
   5243         return values

c:\users\george\appdata\local\programs\python\python36-32\code\doacoes\lib\site-packages\pandas\core\ in take_nd(arr, indexer, axis, out, fill_value, mask_info, allow_fill)
   1465             out = np.empty(out_shape, dtype=dtype, order='F')
   1466         else:
-> 1467             out = np.empty(out_shape, dtype=dtype)
   1469     func = _get_take_nd_function(arr.ndim, arr.dtype, out.dtype, axis=axis,


The file I intend to generate will have 26 columns. Is there any way to avoid the MemoryError, or do I need to merge with fewer columns?

A MemoryError in pandas happens when you try to hold a very large DataFrame in memory. Try breaking your processing into chunks of DataFrames and concatenating them afterwards.
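A minimal sketch of that idea, assuming both tables fit in memory on their own and only the merge step runs out of room. The helper name merge_in_chunks, the chunk size, and the toy data (including the valor and multa columns) are hypothetical stand-ins for eleitos_d_s_doadores and ibama, not part of the original question.

```python
import pandas as pd


def merge_in_chunks(left, right, left_on, right_on, chunk_rows=100_000):
    """Inner-merge `left` with `right` one slice of `left` at a time."""
    pieces = []
    for start in range(0, len(left), chunk_rows):
        part = left.iloc[start:start + chunk_rows]
        pieces.append(pd.merge(part, right, left_on=left_on, right_on=right_on))
    if not pieces:
        # Empty left table: fall back to a plain merge to get the right columns.
        return pd.merge(left, right, left_on=left_on, right_on=right_on)
    # Concatenate the partial results into one DataFrame.
    return pd.concat(pieces, ignore_index=True)


# Toy data standing in for the real tables (values are made up).
donors = pd.DataFrame({"CPF_CNPJ_doador_originario_limpo": ["1", "2", "3", "2"],
                       "valor": [10, 20, 30, 40]})
fines = pd.DataFrame({"CPF_CNPJ_limpo": ["2", "3"], "multa": [5, 7]})

merged = merge_in_chunks(donors, fines,
                         "CPF_CNPJ_doador_originario_limpo", "CPF_CNPJ_limpo",
                         chunk_rows=2)
print(merged)
```

The final concatenation still needs room for the whole result, so this is likely to help only when the temporary allocations made during the merge, rather than the result itself, are what exhaust memory.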


Yansym is correct. A few days ago I had the same problem and split the processing into chunks of DataFrames. You can do this with the chunksize argument of pandas.read_csv.

Practical example:


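import pandas

result = None
for chunk in pandas.read_csv("voters.csv", chunksize=1000):
    # Process each 1000-row DataFrame here, then accumulate it
    result = chunk if result is None else pandas.concat([result, chunk])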


In the example above, each DataFrame holds 1000 rows of the CSV file.
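If even the accumulated result is too large, one option is to merge each chunk and append it straight to an output file instead of keeping it in memory. The sketch below assumes the larger table lives in a CSV file and the smaller one fits in memory; the file names, the valor and multa columns, and the toy values are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Toy files standing in for the real data (names and values are made up).
workdir = tempfile.mkdtemp()
donors_csv = os.path.join(workdir, "donors.csv")
out_csv = os.path.join(workdir, "merged.csv")
pd.DataFrame({"CPF_CNPJ_doador_originario_limpo": ["1", "2", "3", "2"],
              "valor": [10, 20, 30, 40]}).to_csv(donors_csv, index=False)
ibama = pd.DataFrame({"CPF_CNPJ_limpo": ["2", "3"], "multa": [5, 7]})

first = True
# dtype=str keeps the ID column as text, so it matches the string keys in ibama.
for chunk in pd.read_csv(donors_csv, chunksize=2, dtype=str):
    part = pd.merge(chunk, ibama,
                    left_on="CPF_CNPJ_doador_originario_limpo",
                    right_on="CPF_CNPJ_limpo")
    # Append to the output file; write the header only for the first chunk.
    part.to_csv(out_csv, mode="w" if first else "a", header=first, index=False)
    first = False
```

Only one chunk and its merged rows are held in memory at a time, at the cost of having the final table on disk rather than as a single DataFrame.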



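Is your Python a 32-bit build? The python36-32 folder in your traceback suggests it is. If so, this may be the problem: a 32-bit process can address only a few gigabytes of memory at most, regardless of how much RAM the machine has, so large allocations fail. I would recommend the 64-bit version, which can use far more memory and is better suited to data processing.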

Link to the download:

To install the 64-bit version, scroll down the page and select "Windows Installer (64-Bit)".
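To confirm which build is currently running before reinstalling, a quick check along these lines should work using only the standard library. The variable names are arbitrary.

```python
import struct
import sys

# Size of a C pointer in bits: 32 on a 32-bit interpreter, 64 on a 64-bit one.
pointer_bits = struct.calcsize("P") * 8
# sys.maxsize exceeds 2**32 only on a 64-bit build.
is_64bit = sys.maxsize > 2**32

print("Python build: %d-bit" % pointer_bits)
```

Run it inside the same environment (here, the notebook kernel) that raised the MemoryError, since several Python installations can coexist on one machine.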

