How to sort a string by initial fragment?

Question

Asked 6 years, 2 months ago

3

I need to sort the lines of a string produced by git tag -ln.

2019.3.0        Primeira versão totamente integrada com gitlab-ci
2019.3.0-dev0   Aplicado correções sugeridas por py3kwarn
2019.3.0-dev1   refatorado o metodo test_merge_csv_files0 para operar gitlab-ci com validação de csv
2019.3.0-dev2   import faker corrigido
2019.3.0-dev3   FAIL: test_discover_url_0 (TestAtosWebSF) corrigida
2019.3.0-dev4   corrigido ERROR: Failure: Error (unsupported locale setting)
2019.3.0-dev5   corrigido AssertionError em test_discover_url_0
2019.3.1        2019.3.1-dev0 → 2019.3.1
2019.3.1-dev0   fatoração de metodo a ser implementado futuramente
2019.3.10       Correções em epigrafe com mesmo numero
2019.3.11       Correções em º/°
2019.3.12       Correções em 'Nº 1.420 A'
2019.3.13       Corrigido diversos erros de falso positivo na localização da epigrafe e no capabilities para windows
2019.3.14       ferramentas e relatórios para reconstrução do acervo
2019.3.2        Validação gitlab-ci completa

Desired:

2019.3.13       Corrigido diversos erros de falso positivo na localização da epigrafe e no capabilities para windows
2019.3.12       Correções em 'Nº 1.420 A'
2019.3.11       Correções em º/°
2019.3.10       Correções em epigrafe com mesmo numero
2019.3.2        Validação gitlab-ci completa
2019.3.1        2019.3.1-dev0 → 2019.3.1
2019.3.1-dev0   fatoração de metodo a ser implementado futuramente
2019.3.0        Primeira versão totamente integrada com gitlab-ci
2019.3.0-dev5   corrigido AssertionError em test_discover_url_0
2019.3.0-dev4   corrigido ERROR: Failure: Error (unsupported locale setting)
2019.3.0-dev3   FAIL: test_discover_url_0 (TestAtosWebSF) corrigida
2019.3.0-dev2   import faker corrigido
2019.3.0-dev1   refatorado o metodo test_merge_csv_files0 para operar gitlab-ci com validação de csv
2019.3.0-dev0   Aplicado correções sugeridas por py3kwarn

Or acceptable:

2019.3.14       ferramentas e relatórios para reconstrução do acervo
2019.3.13       Corrigido diversos erros de falso positivo na localização da epigrafe e no capabilities para windows
2019.3.12       Correções em 'Nº 1.420 A'
2019.3.11       Correções em º/°
2019.3.10       Correções em epigrafe com mesmo numero
2019.3.2        Validação gitlab-ci completa
2019.3.1-dev0   fatoração de metodo a ser implementado futuramente
2019.3.1        2019.3.1-dev0 → 2019.3.1
2019.3.0-dev5   corrigido AssertionError em test_discover_url_0
2019.3.0-dev4   corrigido ERROR: Failure: Error (unsupported locale setting)
2019.3.0-dev3   FAIL: test_discover_url_0 (TestAtosWebSF) corrigida
2019.3.0-dev2   import faker corrigido
2019.3.0-dev1   refatorado o metodo test_merge_csv_files0 para operar gitlab-ci com validação de csv
2019.3.0-dev0   Aplicado correções sugeridas por py3kwarn
2019.3.0        Primeira versão totamente integrada com gitlab-ci

With this code I got the following result.

for i in sorted(lista.split(sep='\n'), reverse=True):
    print(i)

Obtained:

2019.3.2        Validação gitlab-ci completa
2019.3.13       Corrigido diversos erros de falso positivo na localização da epigrafe e no capabilities para windows
2019.3.12       Correções em 'Nº 1.420 A'
2019.3.11       Correções em º/°
2019.3.10       Correções em epigrafe com mesmo numero
2019.3.1-dev0   fatoração de metodo a ser implementado futuramente
2019.3.1        2019.3.1-dev0 → 2019.3.1
2019.3.0-dev5   corrigido AssertionError em test_discover_url_0
2019.3.0-dev4   corrigido ERROR: Failure: Error (unsupported locale setting)
2019.3.0-dev3   FAIL: test_discover_url_0 (TestAtosWebSF) corrigida
2019.3.0-dev2   import faker corrigido
2019.3.0-dev1   refatorado o metodo test_merge_csv_files0 para operar gitlab-ci com validação de csv
2019.3.0-dev0   Aplicado correções sugeridas por py3kwarn
2019.3.0        Primeira versão totamente integrada com gitlab-ci

I also tried using int as the key, as below, but that does not work here: the lines are not valid integers, so it raises an exception.

for i in sorted(lista.split(sep='\n'), reverse=True, key=int):
    print(i)

How can I solve this problem?

2 answers

2

Maybe this code can help you. :D

It uses the key parameter of the sorted function to pass a function that returns the value used for sorting.

That function uses regular expressions to isolate the tag (for example 2019.3.0-dev0) and then extract the year, month, day, and dev number separately (ano, mes, dia, and extra in the code). It then pads the month and day to 2 digits and, when there is no dev number, uses a high one (9999) so that the release sorts above its dev builds.

 2019.3.0-dev4  =>  20190300.4
 2019.3.0-dev5  =>  20190300.5
 2019.3.1       =>  20190301.9999
 2019.3.1-dev0  =>  20190301.0

import re

lista = """2019.3.0        Primeira versão totamente integrada com gitlab-ci
2019.3.0-dev0   Aplicado correções sugeridas por py3kwarn
2019.3.0-dev1   refatorado o metodo test_merge_csv_files0 para operar gitlab-ci com validação de csv
2019.3.0-dev2   import faker corrigido
2019.3.0-dev3   FAIL: test_discover_url_0 (TestAtosWebSF) corrigida
2019.3.0-dev4   corrigido ERROR: Failure: Error (unsupported locale setting)
2019.3.0-dev5   corrigido AssertionError em test_discover_url_0
2019.3.1        2019.3.1-dev0 → 2019.3.1
2019.3.1-dev0   fatoração de metodo a ser implementado futuramente
2019.3.10       Correções em epigrafe com mesmo numero
2019.3.11       Correções em º/°
2019.3.12       Correções em 'Nº 1.420 A'
2019.3.13       Corrigido diversos erros de falso positivo na localização da epigrafe e no capabilities para windows
2019.3.14       ferramentas e relatórios para reconstrução do acervo
2019.3.2        Validação gitlab-ci completa"""

# Raw strings keep the backslashes in the patterns intact.
pegar_data_extra = re.compile(r" {2,}")
pegar_ano_mes_dia = re.compile(r"(\d{4})\.(\d{1,2})\.(\d{1,2}).*")
pegar_extra = re.compile(r"-dev([^ ]+)")

def formatar_para_sort( x ):
    x = pegar_data_extra.split( x )[0]
    tmp = pegar_ano_mes_dia.search( x )
    extra = pegar_extra.search( x )
    ano = tmp.group(1)
    mes = tmp.group(2)
    dia = tmp.group(3)

    # use the dev number; if there is none, use a high value (9999)
    extra = extra.group(1) if extra else "9999"

    # pad month and day to 2 digits
    mes = mes.zfill(2)
    dia = dia.zfill(2)

    out = ano+mes+dia+"."+extra

    #print(x, " => ", out)

    return out

l = sorted(lista.split(sep='\n'), key=formatar_para_sort, reverse=True)

for i in l:
    print( i )

Example running on ideone
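
The key above is a string, so a hypothetical dev10 would rank below dev9, because "10" compares as smaller than "9" character by character. A minimal sketch of a variant that returns a tuple of integers instead, assuming every tag has three numeric components and an optional -devN suffix. The names PADRAO and chave_versao, and the dev10 line, are illustrative and not part of the original answer.

```python
import re

# Assumed tag layout: three numeric components with an optional -devN suffix.
PADRAO = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:-dev(\d+))?")

def chave_versao(linha):
    """Hypothetical helper: numeric sort key for a line that starts with a tag."""
    m = PADRAO.match(linha)
    if m is None:
        # Lines without a recognizable tag get the smallest key.
        return (-1, -1, -1, -1)
    a, b, c, dev = m.groups()
    # A release has no -dev suffix; float("inf") ranks it above every dev build.
    return (int(a), int(b), int(c), int(dev) if dev is not None else float("inf"))

linhas = [
    "2019.3.0        Primeira versão totamente integrada com gitlab-ci",
    "2019.3.0-dev5   corrigido AssertionError em test_discover_url_0",
    "2019.3.0-dev10  hypothetical line, not in the original data",
    "2019.3.10       Correções em epigrafe com mesmo numero",
    "2019.3.2        Validação gitlab-ci completa",
]
ordenadas = sorted(linhas, key=chave_versao, reverse=True)
for linha in ordenadas:
    print(linha)
```

With reverse=True this gives the "desired" order from the question: each release is listed before its own dev builds.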

1

Great insight!!! :D

– britodfbr

2019/05/19 at 12:21

by Maniero • **444,682** points · Answer 1 · 2019-05-18T23:58:07+00:00

The data is poorly formatted, so you will have to preprocess it to make the sort work. Ideally it would arrive in a more organized form. There are several ways to do this, and I can't say which is best without knowing what can be changed, what values can appear, and what is acceptable at each point, so I'll give one possible solution. If you had posted code showing what you tried, that is, a [mcve], I would have included a test showing the result.

The problem is that one of the components can have 1 or 2 digits, and because it is text, it is compared character by character: "2" comes after "1" in the column being compared. In other words, when comparing "2" with "11", the smaller one is "11", since its first character is 1 and the other's is 2. Ideally the data would contain "02", and then the comparison would work. Since it does not, you have to add that 0 yourself.
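
A minimal sketch of that comparison, using only built-in string operations. The padding width of 2 is an assumption that matches the sample data, and the list partes is illustrative.

```python
# Strings are compared character by character, so "2" is greater than "11".
print("2" > "11")     # True: "2" > "1" decides at the first character

# Zero-padding to the same width makes text order match numeric order.
print("02" < "11")    # True

partes = ["2", "11", "10", "1"]
ordem_texto = sorted(partes)
ordem_numerica = sorted(partes, key=lambda p: p.zfill(2))
print(ordem_texto)      # ['1', '10', '11', '2']
print(ordem_numerica)   # ['1', '2', '10', '11']
```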

If a component can ever have 3 digits, things get more complicated and you have to handle that too. If the component before it (which is 3 in every line here) can vary in the same way, it needs the same treatment, and the same goes for the number after -dev.
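
One way to cope with any number of digits, sketched under the assumption that the tag is the first whitespace-delimited token of each line: split the tag into digit and non-digit runs and compare the digit runs as integers. The name chave_natural and the 2019.100.0 line are hypothetical, not part of the original answer or data.

```python
import re

def chave_natural(linha):
    """Hypothetical helper: natural-order key built from the tag at the start of the line."""
    partes = linha.split(maxsplit=1)
    tag = partes[0] if partes else ""
    # re.split with a capturing group keeps the digit runs, which land at odd indexes,
    # so text is always compared with text and integers with integers.
    pedacos = re.split(r"(\d+)", tag)
    return [int(p) if i % 2 else p for i, p in enumerate(pedacos)]

linhas = [
    "2019.3.2        Validação gitlab-ci completa",
    "2019.3.10       Correções em epigrafe com mesmo numero",
    "2019.3.1        2019.3.1-dev0 → 2019.3.1",
    "2019.3.1-dev0   fatoração de metodo a ser implementado futuramente",
    "2019.100.0      hypothetical tag with a 3-digit component",
]
ordenadas = sorted(linhas, key=chave_natural, reverse=True)
for linha in ordenadas:
    print(linha)
```

With this key a dev build ranks above its release when sorted in reverse, which matches the "acceptable" order in the question rather than the "desired" one.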

So create a function that normalizes the data into a form that sorts correctly and pass it as the key argument of sorted(). Something like this should work:

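def normaliza(texto):
    # texto[:7] is the prefix (e.g. "2019.3."); texto[7] is the first digit of the third component.
    # If texto[8] is not a digit, the component has a single digit and gets a leading "0".
    # In that case the character at index 8 (a space or "-") is dropped, which is harmless for a sort key.
    return texto[:7] + (("0" + texto[7]) if (not texto[8].isnumeric()) else (texto[7:9])) + texto[9:]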

I also put it on GitHub for future reference.

I keep the beginning and the end of the text as they are; only the middle changes. If there is a non-numeric character where the second digit should be, the function puts a 0 in front of the digit it found; otherwise it takes the two characters as they are, because they are already normalized.
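
A short usage sketch with a few lines from the question. normaliza is repeated so the block runs on its own. It assumes every line is at least 9 characters long and starts with a 7-character prefix such as "2019.3."; shorter lines would raise IndexError.

```python
def normaliza(texto):
    # Same expression as in the answer: pads the third component to 2 digits.
    return texto[:7] + (("0" + texto[7]) if (not texto[8].isnumeric()) else (texto[7:9])) + texto[9:]

lista = """2019.3.1        2019.3.1-dev0 → 2019.3.1
2019.3.1-dev0   fatoração de metodo a ser implementado futuramente
2019.3.10       Correções em epigrafe com mesmo numero
2019.3.2        Validação gitlab-ci completa"""

# The key is used only for comparison; the original lines are printed unchanged.
resultado = sorted(lista.split("\n"), key=normaliza, reverse=True)
for linha in resultado:
    print(linha)
```

The result follows the "acceptable" order from the question: 2019.3.10, 2019.3.2, 2019.3.1-dev0, 2019.3.1.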