Error creating column in Pandas dataset

Hello,

I am building a project in Python using pandas, and I want to create a column whose values are Close minus Open, but I get an error that I cannot solve.

My code:

import pandas as pd

dataset = pd.read_csv(r'Documents\Projeto\PETR4.csv', sep=',')
dataset['Date'] = pd.to_datetime(dataset['Date'])
dataset['Variation'] = dataset['Close'].sub(dataset['Open'])

The Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-309e31139274> in <module>()
----> 1 dataset['Variation'] = dataset['Close'].sub(dataset['Open'])

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\ops.py in flex_wrapper(self, other, level, fill_value, axis)
   1049             self._get_axis_number(axis)
   1050         if isinstance(other, ABCSeries):
-> 1051             return self._binop(other, op, level=level, fill_value=fill_value)
   1052         elif isinstance(other, (np.ndarray, list, tuple)):
   1053             if len(other) != len(self):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\series.py in _binop(self, other, func, level, fill_value)
   1598 
   1599         with np.errstate(all='ignore'):
-> 1600             result = func(this_vals, other_vals)
   1601         name = _maybe_match_name(self, other)
   1602         result = self._constructor(result, index=new_index, name=name)

TypeError: unsupported operand type(s) for -: 'str' and 'str'

Example of table rows:

[image of the table rows, not reproduced here]

Can you help me?

Thank you.

  • Could you post a few sample lines of the PETR4.csv content, please?

  • I edited the question and added some lines.

  • Images don't help much, since we have to copy the data by hand to test a solution... I'll try to help anyway. Maybe try pd.read_csv(r'Documents\Projeto\PETR4.csv', sep=',', parse_dates=['Date'])

  • Sorry, I didn't know you wanted the data itself. In any case, the problem is in the Open and Close columns, not in Date.

1 answer


You probably downloaded this data from Yahoo Finance. I did the same, and here are the first rows:

Date,Open,High,Low,Close,Adj Close,Volume
2010-01-04,36.950001,37.320000,36.820000,37.320000,33.627335,13303600
2010-01-05,37.380001,37.430000,36.799999,37.000000,33.339001,21396400
2010-01-06,36.799999,37.500000,36.799999,37.500000,33.789528,18720600
2010-01-07,37.270000,37.450001,37.070000,37.150002,33.474155,10964600
2010-01-08,37.160000,37.389999,36.860001,36.950001,33.293945,14624200

The error message says that the subtraction failed because these columns were read as strings:

TypeError: unsupported operand type(s) for -: 'str' and 'str'

I'm almost certain the problem is that pandas is interpreting your numeric columns (Open, Close, etc.) as strings, either because the decimal separator is not the one pandas expects or because of some other problem in the data (NA markers, a stray dash, etc.). I say this because your Volume column doesn't appear to be numeric either.

If it is an error in the data itself, you will have to look for it. I downloaded the PETR4 CSV from Yahoo Finance for the whole of 2010 and had no problem.

But the easiest way to solve this is with the decimal option. Supposing the decimal separator is '.' and not ',', write:
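One way to look for the offending values is sketched below. The sample data is invented for illustration (a stray dash in the second row is an assumption, not something taken from the real file); the idea is to check the column types and then list the rows that cannot be converted to numbers.

```python
import io
import pandas as pd

# Hypothetical sample: the second row has a dash where a number should be
csv_text = """Date,Open,Close
2010-01-04,36.95,37.32
2010-01-05,-,-
2010-01-06,36.80,37.50
"""

dataset = pd.read_csv(io.StringIO(csv_text))

# A single unparseable value makes pandas keep the whole column as text
print(dataset.dtypes)

# Convert to numbers; values that cannot be parsed become NaN
open_numeric = pd.to_numeric(dataset['Open'], errors='coerce')

# Rows whose 'Open' value could not be converted
bad_rows = dataset[open_numeric.isna()]
print(bad_rows)
```

Running the same check on the real file should point to the lines that need cleaning.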

dataset = pd.read_csv(r'Documents\Projeto\PETR4.csv', sep=',', decimal='.')

If that is not enough, also set the thousands separator with thousands=',' or whatever matches what appears in the CSV.
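As a sketch of how these two options work together, suppose the file was exported with a Brazilian locale: ';' between fields, ',' for decimals and '.' for thousands. That layout is an assumption for illustration, and the sample rows are made up in that format.

```python
import io
import pandas as pd

# Hypothetical locale-formatted sample (not the real PETR4.csv)
csv_text = """Date;Open;Close;Volume
2010-01-04;36,95;37,32;13.303.600
2010-01-05;37,38;37,00;21.396.400
"""

dataset = pd.read_csv(
    io.StringIO(csv_text),
    sep=';',          # field separator
    decimal=',',      # decimal mark used in the file
    thousands='.',    # thousands separator used in the file
    parse_dates=['Date'],
)

# With numeric columns the subtraction works
dataset['Variation'] = dataset['Close'] - dataset['Open']
print(dataset.dtypes)
print(dataset)
```

Adjust sep, decimal and thousands to whatever the file actually contains.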

If it still doesn’t work out, you can try other options:

  • dtype={'Open': np.float64, 'Close': np.float64} (requires import numpy as np)
  • converters: you can pass a dictionary of functions that clean each column as needed
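A minimal sketch of the converters option follows. The helper to_float is hypothetical, and the rules it applies (a dash or empty cell means missing, a comma is accepted as the decimal mark) are assumptions about what might be wrong in the file.

```python
import io
import pandas as pd

def to_float(text):
    """Hypothetical cleaner: '-' or empty means missing; ',' is accepted as decimal mark."""
    text = text.strip()
    if text in ('', '-'):
        return float('nan')
    return float(text.replace(',', '.'))

# Made-up sample with a dash in the second row
csv_text = """Date,Open,Close
2010-01-04,36.95,37.32
2010-01-05,-,-
2010-01-06,36.80,37.50
"""

# Each converter receives the raw cell text and returns the cleaned value
dataset = pd.read_csv(
    io.StringIO(csv_text),
    converters={'Open': to_float, 'Close': to_float},
)
dataset['Variation'] = dataset['Close'].sub(dataset['Open'])
print(dataset)
```

The row with dashes ends up as NaN instead of breaking the subtraction.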

Note: usually, what is called price variation is the difference between the current closing price and the closing price of the previous period. If that is your case, you can do the following:

  • To calculate the daily rate of return:

    dataset['Variation'] = dataset['Close'].pct_change()

  • To calculate the daily return in currency (reais):

    dataset['Variation'] = dataset['Close'].sub(dataset['Close'].shift(1))
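To make the difference between the two concrete, here is a small sketch using the first three closing prices from the sample above (rounded to two decimals for readability).

```python
import pandas as pd

# First three closing prices from the sample rows, rounded
close = pd.Series([37.32, 37.00, 37.50], name='Close')

# Relative change from the previous row (first value is NaN)
rate = close.pct_change()

# Absolute change from the previous row; equivalent to close.diff()
change = close.sub(close.shift(1))

print(pd.DataFrame({'Close': close, 'Rate': rate, 'Change': change}))
```

In both cases the first row is NaN because it has no previous period to compare with.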

  • Dude, that's exactly what it was. Thank you so much!
