Group by week

Asked

Viewed 1,093 times

2

I have a table like this I’m trying to separate by week, I don’t have much knowledge in which functions I can use to check how many weeks you have between these dates, to then group. The data is in a csv table, I am using pandas to open the csv file.

         Nome completo  count
Hora                                                      

2017-03-06  Talita       4
2017-03-07  Filho        8
2017-03-07  Jao         10
2017-03-07  Otavio       6
2017-03-08  Gabriel      2
2017-03-13  Guilherme    1

Final table I plan to assemble

Nome         Semana_1  Semana_2   
Talita          4         0
Filho           8         0
Jao             10        0
Otavio          6         0
Gabriel         2         0
Guilherme       0         1
  • What would group by weeks? Why is it necessary to check how many weeks there are between the dates? Where does this table come from?

  • Group when Count the name had in the week of that date.. I had thought to check how many weeks have between the day 06 the day 08 example for a longer date would have 7 weeks with this I will check this date 2017-03-06 belongs in the first week so I will put in the first week Alita had 4 Count.. On the table I set up to learn how to group this data per week. Sorry I don’t have any information

  • It might be me, and if it is, I’m sorry, but I just couldn’t understand what you said. "Group when Count name had in the week of that date", that sentence didn’t seem to make sense; "check how many weeks have between the day 06 the day 08 example for a longer date would have 7 weeks", what? Between 06 and 08 only 1 day ago and what would be this longer date? Is it possible to put a [mcve]? Mostly with dates that are weeks apart, not just next dates.

  • I’m sorry, but I put an example of the end I’m trying to ride.

  • And how are these data structured in Python? Ask this in the question too.

  • Matheus, you need to collaborate by giving the necessary information. Saying that the data is in a CSV is useless without the data itself. Put exactly how the data is and already put the Python code you are using to read this data. Be objective and clear in the question. Information is trivial to solve the problem, do not omit it.

  • there is not much code, there is only one line indicating that I am opening the file csv log_df = pd.read_csv('logs/logs.csv',delimiter=","), the data is exactly like this

Show 2 more comments

2 answers

1

Matheus, I’ve assembled a code where I create a dictionary with the data that Voce passed there, and create 2 methods: one to add the exact week of the year on each object of the user list, another to organize the output table, to be as close as possible to what you want to mount there in your example.

To identify the week, I used the attribute .isocalendar(), that returns a tuple containing the calendar of the week.

import datetime

user_list = [
    {
        "name": "Talita",
        "count": 4,
        "date": datetime.date(year=2017, month=3, day=6),
    },
    {
        "name": "Filho",
        "count": 8,
        "date": datetime.date(year=2017, month=3, day=7),
    },
    {
        "name": "Jao",
        "count": 10,
        "date": datetime.date(year=2017, month=3, day=7),
    },
    {
        "name": "Otavio",
        "count": 6,
        "date": datetime.date(year=2017, month=3, day=7),
    },
    {
        "name": "Gabriel",
        "count": 2,
        "date": datetime.date(year=2017, month=3, day=8),
    },
    {
        "name": "Guilherme",
        "count": 1,
        "date": datetime.date(year=2017, month=3, day=13),
    },
]

def get_week(user_list):
    for u in user_list:
        week = u['date'].isocalendar()[1]
        u['week'] = week
    return user_list

def group_by_week(user_list):
    actual_week = 0
    week_group = {}
    for u in user_list:
        if actual_week != u['week']:
            week_group[u['week']] = []
            week_group[u['week']].append(u)
            actual_week = u['week']
        else:
            week_group[actual_week].append(u)
    return week_group


# Adicionando semana em cada objeto
print("~> Adicionando semana em cada objeto e printando")
user_list = get_week(user_list)
for t in user_list:
    print("Name: %s, week: %s" % (t['name'], t['week']))

print("\n~> Organizando tabela de saida")
# Organizando tabela
week_group = group_by_week(user_list)
count_week = 0
for key in week_group:
    if count_week < key:
        count_week += 1
    print("# Semana %s" % count_week)
    for user in week_group[key]:
        print("Name: %s, count: %s, date: %s" % (user['name'], user['count'], user['date']))

Upshot

~> Adicionando semana em cada objeto e printando
Name: Talita, week: 10
Name: Filho, week: 10
Name: Jao, week: 10
Name: Otavio, week: 10
Name: Gabriel, week: 10
Name: Guilherme, week: 11

~> Organizando tabela de saida
# Semana 1
Name: Talita, count: 4, date: 2017-03-06
Name: Filho, count: 8, date: 2017-03-07
Name: Jao, count: 10, date: 2017-03-07
Name: Otavio, count: 6, date: 2017-03-07
Name: Gabriel, count: 2, date: 2017-03-08
# Semana 2
Name: Guilherme, count: 1, date: 2017-03-13
  • Thank you very much, your reply helped me a lot, but I came up with a doubt if I have in the same week the same user and only want to group Count in the name after all is the same person?

  • Ai Voce will have to do one more function to unify duplicities. You can use reduce, as in the example here: https://stackoverflow.com/questions/29078694/returning-sum-of-duplicate-elements-in-javascript-array

1


Since you’re using pandas to read the csv, I suggest you use your own functions to resolve this issue. Surely there must be several ways to do this with the pandas, I will present to you the one that occurred to me.

Edited
See that I repeat Talita, in the same week, to show and effectiveness of the solution. :-)

Simulating the csv:

import io 
import pandas as pd

s = '''
"Data","Nome","Count"
2017-03-06,"Talita",4
2017-03-07,"Filho",8
2017-03-07,"Jao",10
2017-03-07,"Talita",5
2017-03-07,"Otavio",6
2017-03-08,"Gabriel",2
2017-03-13,"Guilherme",1
'''

Reading the csv:

df = pd.read_csv(io.StringIO(s), parse_dates=True)

Grouping:

df['Data']=pd.to_datetime(df['Data'])
df['Semana'] = df['Data'].dt.to_period('W-THU')
df = df.groupby(by=['Semana', 'Nome'])['Nome', 'Count', 'Data'].sum()

Final result:

                                 Count
Semana                Nome            
2017-03-03/2017-03-09 Filho          8
                      Gabriel        2
                      Jao           10
                      Otavio         6
                      Talita         9
2017-03-10/2017-03-16 Guilherme      1

From this table you can make a iterrows assemble in any format you wish, and you can export to various formats with the .to, as an example: df.to_csv, df.to_json, to_latex, to_pickle, to_records, to_string, to_xarray and "so on...."

  • Thank you very much I found very good even your solution, helped me a lot.

  • Opa, good to know that helped, consider giving the acceptance in the question (sign next to the voting arrow) :-)

  • only one last doubt for me to assemble the table I wish with iterrows, could give a help in logic?

  • Take a look in that reply, who has this logic, takes advantage and gives a vote. :-)

  • You can also convert to a dictionary with df.to_dict() and ride from it.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.