How to group numeric sequences?

Friends, I have a CSV file with 5k lines of purchase transactions. Each purchase has an id. When several purchases are made in the same lot, their ids all start with the same numeric sequence, and a number near the end of the id identifies each purchase within the lot. Example:

A person bought 5 items:

000034200100 000034200200 000034200300 000034200400 000034200500

It would be wonderful if these ids came in this order, but they are scattered throughout the 5k-line file.

How can I group the ids of each lot so that they end up together, as in my example? I want to do this in Python.

I thought about clustering, but I don’t know if it’s a good idea.

  • Do you already have any of this code written? Can you include an example of the complete data that needs to be organized? Would the result be another .csv?

  • I don’t have any code ready yet. I received this yesterday from a customer; it is basically a CSV with columns for id, description, origin, destination, value, fare, etc. The output can be a CSV or an XLSX.

  • What is the CSV’s structure? The way you asked the question makes the context hard to understand. I suggest editing the question and including a fragment of the CSV to make things clearer; it’s rather obscure as it stands. :-)

  • The CSV is comma-separated. Opened in Excel, the first column is id, followed by description, quantity, origin, destination, value, tariff, and so on. It is a common columnar structure, nothing very different from a conventional table.

  • The file has many columns, so pasting a piece of it here would not read well, but it is basically a normal table without any unusual structure.

  • From your description we cannot understand how these sequences are "scattered". If you like, upload the CSV (or part of it) to any storage service (Google Drive, Expirebox or File Town) and post the link here, and I will try to help. I’m here having fun with Python. :-)

  • Okay, I will provide it. When I said scattered, I meant that the numbers in the id column (as I described) do not come in sequence, i.e. the purchases of a lot appear out of order. If someone buys 5 items, the 5 ids are not listed in order.

  • No, that still isn’t enough to understand it. I need to know the "structure" of the CSV; otherwise it stays very obscure. If you posted 2 or 3 lines of the CSV with an example and an explanation, I would probably understand.

1 answer

If the identifying number is always the last 3 digits (or any fixed number of digits from the end), you can slice it off and group the ids by the remaining prefix:

l = ["000034200100", "000034200200", "000034200300", "000034200400", "000034200500"]
d = {}  # The results are stored here: prefix -> list of ids
for numero in l:
    prefixo = numero[:-3]  # Slice without the last 3 characters of the string
    if prefixo not in d:
        d[prefixo] = [numero]  # First id seen for this prefix
    else:
        d[prefixo].append(numero)

This produces a dict keyed by "prefix", i.e. the id without the purchase number, where the value for each prefix is a list of all the ids that start with that prefix. You can call sort() on each list to put it in order.
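The same idea can be applied to whole CSV rows instead of a hard-coded list. The following is only a sketch under assumptions: the column name `id`, the sample rows, the 3-digit suffix length, and the helper names `group_rows_by_prefix` and `write_grouped` are all hypothetical, since the real file was never posted. It uses the standard `csv` module, which reads every field as a string, so the leading zeros in the ids are preserved.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for the real file; the column names and
# values are made up for illustration.
SAMPLE_CSV = """id,description,value
000034200300,item C,30.0
000051700100,item X,12.5
000034200100,item A,10.0
000051700200,item Y,7.0
000034200200,item B,20.0
"""

SUFFIX_LEN = 3  # assumed number of trailing digits that identify the purchase


def group_rows_by_prefix(rows, id_column="id", suffix_len=SUFFIX_LEN):
    """Group rows by their id without its last `suffix_len` characters."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[id_column][:-suffix_len]].append(row)
    # Sort each lot by full id so its purchases appear in sequence.
    for lot in groups.values():
        lot.sort(key=lambda row: row[id_column])
    return dict(groups)


def write_grouped(groups, fieldnames, out_file):
    """Write the rows lot by lot, with lots ordered by prefix."""
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    for prefix in sorted(groups):
        writer.writerows(groups[prefix])


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
groups = group_rows_by_prefix(reader)

output = io.StringIO()
write_grouped(groups, reader.fieldnames, output)
print(output.getvalue())
```

For a real file, the two `io.StringIO` objects could be replaced by files opened with `open(path, newline="")`, as the `csv` module documentation recommends. Because the ids are zero-padded to a fixed width, sorting them as strings gives the same order as sorting them numerically.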
