Indexing error when separating line information

0

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv('dito_julho.csv')
df.head()

                          campanha  valor
1  Prospect | 5 dias | Com crédito      2
2  Prospect | 5 dias | Com crédito      5
3  Prospect | 5 dias | Com crédito      7

So I tried to create a new column from the second field of each row in the first column, i.e. I want to get the "5 dias" part.

df_teste = df['Segmento'].apply(lambda x: x.split("|")[1])

However, it gives the error below:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-180-57ecc844181a> in <module>()
----> 1 df_teste = df['Segmento'].apply(lambda x: x.split("|")[1])

c:\users\iuri\appdata\local\programs\python\python36-32\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   3192             else:
   3193                 values = self.astype(object).values
-> 3194                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3195 
   3196         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src\inference.pyx in pandas._libs.lib.map_infer()

<ipython-input-180-57ecc844181a> in <lambda>(x)
----> 1 df_teste = df['Segmento'].apply(lambda x: x.split("|")[1])

IndexError: list index out of range

If I try it with the first field, which is "Prospect", it works:

df_teste = df['Segmento'].apply(lambda x: x.split("|")[1])
df_teste.head()
>>>>>>>
0    Prospect 
1    Prospect 
2    Prospect 
3    Prospect 
4    Prospect 

Does anyone have any hint as to why I can’t get this information?

If I do a test, creating something like this:

df_teste = df['Segmento'].apply(lambda x: x.split("|"))
df_teste.head()

>>>>>

0     [Prospect ,  5 dias ,  Com crédito]
1    [Prospect ,  20 dias ,  Com crédito]
2    [Prospect ,  40 dias ,  Com crédito]
3    [Prospect ,  75 dias ,  Com crédito]
4     [Prospect ,  5 dias ,  Sem crédito]

It is clear that I should be able to take the item at index 1, the days, but that is not what happens.

Could someone help me?

  • Are you using the right column? In your example the column is called "campanha", and this apply works properly on it.

  • In both examples you used [1]; shouldn't the second one be [0]?

  • Are you sure all the lines contain "|"? A single line in your CSV without it is enough to break the code.

  • You nailed it, @Begnini: some lines don't have it. I managed to solve it by selecting only the rows that contain "|".
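The diagnosis in the comments can be checked directly by counting the fields in each row. The sketch below is only illustrative: the sample DataFrame is made up to mirror the rows shown in this thread, and the names `n_campos` and `sem_prazo` are hypothetical.

```python
import pandas as pd

# Hypothetical sample data mirroring the rows shown in this thread
df = pd.DataFrame({
    "campanha": ["Prospect | 5 dias | Com crédito", "Prospect | 5 Dias", "Prospect"],
    "valor": [2, 5, 2],
})

# Number of "|"-separated fields per row: number of separators plus one.
# str.count takes a regular expression, so the pipe must be escaped.
n_campos = df["campanha"].str.count(r"\|") + 1

# Rows with fewer than two fields are the ones where split("|")[1] raises IndexError
sem_prazo = df[n_campos < 2]
print(sem_prazo)
```

Any row listed in `sem_prazo` has no second field, which is enough to make the original lambda fail for the whole column.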

2 answers

1


One solution is to use:

def cria_colunas(string_campanha):
    # Split the campaign string into its "|"-separated fields
    lista = string_campanha.split("|")
    if len(lista) == 1:
        return lista[0], '', ''
    elif len(lista) == 2:
        return lista[0], lista[1], ''
    else:
        # Three or more fields: keep the first three
        return lista[0], lista[1], lista[2]

# apply returns a Series of tuples; zip(*...) regroups them into three columns
df['Ação'], df['Prazo'], df['Crédito'] = zip(*df['campanha'].apply(cria_colunas))

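Alternatively:

# IndexError: the field is missing; AttributeError: the value is not a string (e.g. NaN)
def cria_acao(string_campanha):
    try:
        return string_campanha.split("|")[0]
    except (IndexError, AttributeError):
        return ''

def cria_prazo(string_campanha):
    try:
        return string_campanha.split("|")[1]
    except (IndexError, AttributeError):
        return ''

def cria_credito(string_campanha):
    try:
        return string_campanha.split("|")[2]
    except (IndexError, AttributeError):
        return ''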

df['Ação'] = df['campanha'].apply(cria_acao)
df['Prazo'] = df['campanha'].apply(cria_prazo)
df['Crédito'] = df['campanha'].apply(cria_credito)

That solves the problem, but I don’t think it’s the best way.
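A more pandas-native alternative might be `Series.str.split` with `expand=True`, which returns one column per field and leaves missing fields empty instead of raising. This is a sketch, not a tested fix for the original file: the sample DataFrame is made up, `partes` is a hypothetical name, and it assumes at least one row contains all three fields (otherwise fewer than three columns come back).

```python
import pandas as pd

# Hypothetical sample data mirroring the rows shown in this thread
df = pd.DataFrame({
    "campanha": ["Prospect | 5 dias | Com crédito", "Prospect | 5 Dias", "Prospect"],
    "valor": [2, 5, 2],
})

# expand=True yields one column per field; shorter rows get missing values
partes = df["campanha"].str.split("|", expand=True)

# Replace missing fields with '' and trim the spaces around each value
partes = partes.fillna("").apply(lambda col: col.str.strip())

# Name the new columns and attach them to the original frame
partes.columns = ["Ação", "Prazo", "Crédito"]
df = df.join(partes)
print(df)
```

Compared with the `apply`-based versions, this avoids writing one function per column and also strips the padding spaces around each field.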

  • Thank you very much my friend! Helped me a lot around here!

0

I appreciate the comments above, which helped me solve the problem, but I have now identified what is actually happening, and there is another error I need to solve.

The data actually looks like this:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv('dito_julho.csv')
df.head()

                          campanha  valor
1  Prospect | 5 dias | Com crédito      2
2                Prospect | 5 Dias      5
3                         Prospect      2

What I wanted to do is create a new column for each field of the row, with the fields separated by "|".

What I've done so far is separate out the lines that contain "|".

Then I applied the rule to split each line and take the data:

df['Ação'] = df['Segmento'].apply(lambda x: x.split("|")[0])
df['Prazo'] = df['Segmento'].apply(lambda x: x.split("|")[1])
df['Credito'] = df['Segmento'].apply(lambda x: x.split("|")[2])

Then it gave the indexing error, because some lines have 2 fields and others have 3. I wanted to know how I can write a function that takes all 3 fields when there are 3 and only the 2 when there are 2, avoiding the indexing error.

Can someone help a Python beginner here? Haha

Thank you very much!!

  • @Begnini it is as you said; the data looks the way I showed above. If you have any pointers to help, I'd really appreciate it! Thanks!
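For rows with a varying number of fields, one option that stays close to the three `apply` lines above is `Series.str.get`, which returns a missing value (NaN) instead of raising when the requested position does not exist. The following is a minimal sketch on made-up sample data; it uses the `campanha` column from the printed table rather than `Segmento`, and `campos` is a hypothetical name.

```python
import pandas as pd

# Hypothetical sample data mirroring the rows shown in this thread
df = pd.DataFrame({
    "campanha": ["Prospect | 5 dias | Com crédito", "Prospect | 5 Dias", "Prospect"],
    "valor": [2, 5, 2],
})

# Each element becomes a list of fields; the lists may have different lengths
campos = df["campanha"].str.split("|")

# str.get(i) picks position i from each list and yields NaN when it is out of range
df["Ação"] = campos.str.get(0).str.strip()
df["Prazo"] = campos.str.get(1).str.strip()
df["Credito"] = campos.str.get(2).str.strip()
print(df)
```

Rows with only two fields end up with NaN in `Credito`, and rows with one field with NaN in both `Prazo` and `Credito`; `fillna('')` can be chained if empty strings are preferred.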
