Indexing error when separating line information

Question

Asked 7 years ago

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv('dito_julho.csv')
df.head()

             campanha                           valor
1            Prospect | 5 dias | Com crédito       2
2            Prospect | 5 dias | Com crédito       5
3            Prospect | 5 dias | Com crédito       7

I am trying to create a new column holding the second field of each row of the first column, i.e. I want to get the "5 dias" part.

df_teste = df['Segmento'].apply(lambda x: x.split("|")[1])

However, it gives the error below:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-180-57ecc844181a> in <module>()
----> 1 df_teste = df['Segmento'].apply(lambda x: x.split("|")[1])

c:\users\iuri\appdata\local\programs\python\python36-32\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   3192             else:
   3193                 values = self.astype(object).values
-> 3194                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3195 
   3196         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src\inference.pyx in pandas._libs.lib.map_infer()

<ipython-input-180-57ecc844181a> in <lambda>(x)
----> 1 df_teste = df['Segmento'].apply(lambda x: x.split("|")[1])

IndexError: list index out of range

If I try the same with the first field, "Prospect", it works:

df_teste = df['Segmento'].apply(lambda x: x.split("|")[1])
df_teste.head()
>>>>>>>
0    Prospect 
1    Prospect 
2    Prospect 
3    Prospect 
4    Prospect

Does anyone have any hint as to why I can’t get this information?

If I do a test, creating something like this:

df_teste = df['Segmento'].apply(lambda x: x.split("|"))
df_teste.head()

>>>>>

0     [Prospect ,  5 dias ,  Com crédito]
1    [Prospect ,  20 dias ,  Com crédito]
2    [Prospect ,  40 dias ,  Com crédito]
3    [Prospect ,  75 dias ,  Com crédito]
4     [Prospect ,  5 dias ,  Sem crédito]

This shows that the element at index 1, the number of days, is there to be taken, yet indexing it still fails.

Could someone help me?

Are you sure you are using the right column? In your example the column is called "campanha", and this apply works properly on it.

– Lorran Sutter

2018/08/10 at 20:39
In both examples you used [1]; shouldn't the second one be [0]?

– Woss

2018/08/10 at 20:44
Are you sure every line has a |? A single line in your CSV without one is enough to break the code.

– Begnini

2018/08/11 at 01:24
You nailed it, @Begnini: some lines don't have it. I managed to solve it by selecting only the rows that contain "|".

– Iuri Moura

2018/08/13 at 13:22
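To check the diagnosis from the comments on your own data, one option is to count the separators in each row. This is only a sketch: the small DataFrame stands in for dito_julho.csv, and the sample rows and the names `contagem` and `incompletas` are assumptions made for illustration. `Series.str.count` takes a regular expression, so the pipe has to be escaped.

```python
import pandas as pd

# Made-up sample standing in for dito_julho.csv (assumption)
df = pd.DataFrame({
    'campanha': [
        'Prospect | 5 dias | Com crédito',
        'Prospect | 5 dias',
        'Prospect',
    ],
    'valor': [2, 5, 2],
})

# Number of "|" separators per row; the pattern is a regex, so "|" is escaped
contagem = df['campanha'].str.count(r'\|')

# Rows with fewer than two separators lack the second and/or third field
incompletas = df[contagem < 2]
print(incompletas)

# The same failure on a plain string: split returns a one-element list,
# so asking for partes[1] would raise IndexError
partes = 'Prospect'.split('|')
print(len(partes))
```

Any row that shows up in `incompletas` is one where `x.split("|")[1]` or `x.split("|")[2]` can fail.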

2 answers

1

One solution is to use:

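def cria_colunas(string_campanha):
    # str.split always returns at least one element,
    # so only the cases of 1, 2 or 3+ fields can occur
    lista = string_campanha.split("|")
    if len(lista) == 1:
        return lista[0], '', ''
    elif len(lista) == 2:
        return lista[0], lista[1], ''
    else:
        # three or more fields: keep the first three
        return lista[0], lista[1], lista[2]

# apply yields one 3-tuple per row; zip(*...) transposes them into three columns
df['Ação'], df['Prazo'], df['Crédito'] = zip(*df['campanha'].apply(cria_colunas))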

Or else:

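def cria_acao(string_campanha):
    # First field; '' when the value is not a string (e.g. NaN)
    try:
        return string_campanha.split("|")[0]
    except (IndexError, AttributeError):
        return ''

def cria_prazo(string_campanha):
    # Second field; '' when the row has fewer than two fields
    try:
        return string_campanha.split("|")[1]
    except (IndexError, AttributeError):
        return ''

def cria_credito(string_campanha):
    # Third field; '' when the row has fewer than three fields
    try:
        return string_campanha.split("|")[2]
    except (IndexError, AttributeError):
        return ''

df['Ação'] = df['campanha'].apply(cria_acao)
df['Prazo'] = df['campanha'].apply(cria_prazo)
df['Crédito'] = df['campanha'].apply(cria_credito)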

That solves the problem, but I don’t think it’s the best way.
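A vectorized alternative, offered as a sketch and not as the definitive answer: `Series.str.split` with `expand=True` returns one column per field and leaves missing values where a row has fewer fields, so no row raises an IndexError. The sample DataFrame below is made up, and the `reindex` step is an assumption added so that three columns exist even if no row in the file has three fields.

```python
import pandas as pd

# Made-up sample standing in for dito_julho.csv (assumption)
df = pd.DataFrame({
    'campanha': [
        'Prospect | 5 dias | Com crédito',
        'Prospect | 5 dias',
        'Prospect',
    ],
    'valor': [2, 5, 2],
})

# One column per field; rows with fewer fields get missing values
partes = df['campanha'].str.split('|', expand=True)

# Guarantee exactly three columns (0, 1, 2) and turn missing values into ''
partes = partes.reindex(columns=range(3)).fillna('')

# Remove the spaces left around each field by the " | " separator
partes = partes.apply(lambda col: col.str.strip())

df['Ação'] = partes[0]
df['Prazo'] = partes[1]
df['Crédito'] = partes[2]
print(df)
```

Unlike the functions above, this version also strips the spaces around each field; drop the `str.strip` step if the original spacing should be kept.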

Thank you very much, my friend! That helped me a lot!

– Iuri Moura

2018/08/13 at 19:21

by Iuri Moura • 61 points · Answer 1 · 2018-08-13T13:29:05+00:00

I appreciate the comments above, which helped me find the problem. I have now identified what is happening, but there is another error that I need to solve.

The data actually looks like this:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv('dito_julho.csv')
df.head()

             campanha                           valor
1            Prospect | 5 dias | Com crédito       2
2            Prospect | 5 Dias                     5
3            Prospect                              2

What I want is to create a new column for each field of the row, splitting on "|".

What I've done so far is separate out the rows that contain "|".

Then I wrote the rule to split the rows and take the data:

df['Ação'] = df['Segmento'].apply(lambda x: x.split("|")[0])
df['Prazo'] = df['Segmento'].apply(lambda x: x.split("|")[1])
df['Credito'] = df['Segmento'].apply(lambda x: x.split("|")[2])

That raised the indexing error, because some rows have 2 fields and others have 3. I would like to know how to write a function that takes all three fields when there are three and only the two when there are two, avoiding the indexing error.

Can someone help a Python beginner here? Haha

Thank you very much!!
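One possible shape for such a function, as a minimal sketch: a single helper that takes the field position and returns a default when the row is too short. The name `pega_campo` and its parameters are hypothetical, the sample rows are made up, and the column is assumed to be `campanha` as in the printed table, not `Segmento`.

```python
import pandas as pd

def pega_campo(texto, indice, padrao=''):
    """Return field `indice` of a "|"-separated string, or `padrao` if absent."""
    # Non-string values (e.g. NaN from empty CSV cells) have no fields at all
    if not isinstance(texto, str):
        return padrao
    partes = [parte.strip() for parte in texto.split('|')]
    return partes[indice] if indice < len(partes) else padrao

# Made-up sample standing in for dito_julho.csv (assumption)
df = pd.DataFrame({
    'campanha': [
        'Prospect | 5 dias | Com crédito',
        'Prospect | 5 Dias',
        'Prospect',
    ],
    'valor': [2, 5, 2],
})

# args passes the field position to pega_campo after each cell value
df['Ação'] = df['campanha'].apply(pega_campo, args=(0,))
df['Prazo'] = df['campanha'].apply(pega_campo, args=(1,))
df['Credito'] = df['campanha'].apply(pega_campo, args=(2,))
print(df)
```

Rows with only two fields get an empty `Credito`, and rows with a single field get empty `Prazo` and `Credito`, so no filtering of the rows is needed beforehand.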