Incorrectly created chart

Hello,

I am practicing creating charts in Python with matplotlib. For this, I am importing an HTML page with the WEGE3 stock price history.

df_history = pd.read_html("https://br.financas.yahoo.com/quote/WEGE3.SA/history?p=WEGE3.SA")

I removed the duplicates (there were only 2):

df_remove = df_history[0].drop_duplicates(subset=['Data'])

And I dropped the last row:

yy = df_remove['Fechamento*'].drop(100)
xx = df_remove['Data'].drop(100)

I defined the labels:

plt.xlabel('Data')
plt.ylabel('Preço')

The problem happens here: when I display the chart, it comes out wrong. I increased the figure size just to check whether the X and Y values appeared correctly on the chart, and they do, but the plotted line itself is wrong:

f = plt.figure()
f.set_figwidth(28) 
f.set_figheight(28)
plt.plot(xx, yy) 
plt.show() 

This is the (wrong) chart that appears: [image: wrong chart]

To confirm whether the chart was right or wrong, I took the same data and put it into Excel.

This would be the correct chart:

[image: correct chart]

I spent some time trying to analyze this, but I couldn’t find where the problem is.

I don’t know if this information is relevant, but I am running this code on Google Colab.

Thank you.

  • Wouldn’t it be better to build the chart from the CSV file made available on the imported HTML page itself?

  • With CSV it worked. But since my goal is to practice, I would like to understand my mistake using the same HTML :)

1 answer

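What is happening is that both xx and yy are filled with strings instead of numeric or date values. So when you plot, matplotlib sees only text: it treats each distinct string as a category and places the values in order of appearance rather than by magnitude. To fix this, just convert them to the appropriate types: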

  1. yy can be converted to a numeric series using the pd.to_numeric() function. You also need to pass the argument errors='coerce', because the column contains some unavailable values that cannot be converted to a number directly.
  2. xx holds the date written out in full, in Portuguese. Maybe that works in a program like Excel (if it is in English), but to convert it to a date format that Python understands you would have to write a dedicated function. The alternative is to use the values as they are, in which case plot() will use them as x-axis labels.
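As an illustration of the "dedicated function" mentioned in item 2, here is a minimal sketch. It assumes the dates look like "12 de mar. de 2021" (day, Portuguese month name or abbreviation, year, joined by "de"); that format is an assumption, not something confirmed above. The names PT_MONTHS and parse_pt_date are hypothetical.

```python
from datetime import date

# Portuguese month names reduced to their first three letters (hypothetical helper table).
PT_MONTHS = {
    "jan": 1, "fev": 2, "mar": 3, "abr": 4, "mai": 5, "jun": 6,
    "jul": 7, "ago": 8, "set": 9, "out": 10, "nov": 11, "dez": 12,
}


def parse_pt_date(text):
    """Parse a string such as '12 de mar. de 2021' into a datetime.date.

    The input format is an assumption about how the page writes its dates.
    """
    # Drop the connecting "de" words, leaving day, month and year.
    parts = [p for p in text.lower().split() if p != "de"]
    if len(parts) != 3:
        raise ValueError(f"unexpected date format: {text!r}")
    day, month, year = parts
    # Only the first three letters are used, so 'mar.', 'mar' and 'março' all match.
    return date(int(year), PT_MONTHS[month[:3]], int(day))
```

If the format assumption holds, something like xx.map(parse_pt_date) should give values that matplotlib can place on a real time axis. The simpler route, keeping the strings as labels, is the one used in the rest of this answer.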

The code goes like this:

# imports needed to run this snippet (xx and yy come from the question)
import matplotlib.pyplot as plt
import pandas as pd

# set a larger figure size, otherwise the labels are impossible to read
plt.figure(figsize=(20, 10))

# plot the xx values on the x-axis and yy converted to numbers on the y-axis
plt.plot(xx, pd.to_numeric(yy, errors='coerce'))

# rotate the labels by 60 degrees to make them legible
plt.xticks(rotation=60)

# show the figure
plt.show()

This results in the following figure:

[image: resulting chart]

Note that the line has a gap, corresponding to a day on which no quote was available (the argument errors='coerce' caused that value to be converted to NaN).
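To see the coercion in isolation, here is a small sketch on made-up values (the prices, the placeholder dates, and the 'Dividendo' text are invented for illustration, not taken from the page). It also shows one possible way to drop the unavailable rows so the line is drawn without a gap, which is not something the code above does.

```python
import pandas as pd

# Made-up sample standing in for xx and yy; one entry is not a price.
xx = pd.Series(["d1", "d2", "d3", "d4"])
yy = pd.Series(["35.5", "35.25", "Dividendo", "36.0"])

# Strings that cannot be parsed become NaN instead of raising an error.
prices = pd.to_numeric(yy, errors="coerce")

# Boolean mask marking the rows that converted successfully.
valid = prices.notna()

# Filter x and y with the same mask so they stay aligned.
xx_clean = xx[valid]
prices_clean = prices[valid]
```

Passing xx_clean and prices_clean to plt.plot() would connect the neighbouring points directly. Whether hiding the missing day is desirable depends on what the chart is meant to show.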
