Since the question is still not entirely clear, and the author himself apparently has not been able to clarify it, this answer treats the strings as independent of any particular format, with one well-specified condition: wherever an integer appears in the string, the sort should use the numeric value of those characters rather than treat them as text. This implies, for example, that the string 'c2sp1s5' should appear before the string 'c10sp1s5', because they contain the numeric values 2 and 10, and 2 is less than 10.
To implement this logic, I will create a function called magic which, as the name suggests, will work magic on the sort. The function receives a string and splits it at each numeric value found, producing a list of strings, some containing only text and others only digits. For example, the input 'c2sp1s5' produces the list ['c', '2', 'sp', '1', 's', '5'], while the input 'c2sp1s11' produces the list ['c', '2', 'sp', '1', 's', '11']. If we compared these two lists as they are, we would have the same initial problem: the lists are compared term by term and the result would be exactly the same as before, because the string '11' is still less than the string '5'. So, before comparing the lists, we convert the numeric pieces to integers, giving the lists ['c', 2, 'sp', 1, 's', 5] and ['c', 2, 'sp', 1, 's', 11]. Now, when the lists are compared, the last term compares the integers 5 and 11, and 5 is correctly found to be less than 11.
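The difference between the two comparisons can be checked directly in Python. This is only a minimal sketch using the token lists discussed above; the variable names are illustrative and not part of the original answer.

```python
# Token lists kept as plain strings (illustrative names, values from the text).
as_text_5 = ['c', '2', 'sp', '1', 's', '5']
as_text_11 = ['c', '2', 'sp', '1', 's', '11']

# Lists compare element by element; '11' < '5' because '1' < '5' as characters.
text_puts_11_first = as_text_11 < as_text_5

# The same lists with the numeric pieces converted to int.
as_int_5 = ['c', 2, 'sp', 1, 's', 5]
as_int_11 = ['c', 2, 'sp', 1, 's', 11]

# Now the last elements compare as numbers, so 5 comes before 11.
int_puts_5_first = as_int_5 < as_int_11

print(text_puts_11_first, int_puts_5_first)
```

Both comparisons evaluate to True, which is why the conversion to integers is needed before sorting.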
The code would look like this:
def magic(value):
    parts = re.split(r'(\d+)', value)
    return [int(part) if part.isdigit() else part for part in parts]
The first line of the function splits the input at the numeric values, and the second returns a list with the numeric pieces converted to integers. Since the function expects a single string, to use it in the example from the question we need to specify which string will be used to sort the list. In this case it is the string at index 0, so we do:
import re

def magic(value):
    parts = re.split(r'(\d+)', value)
    return [int(part) if part.isdigit() else part for part in parts]

a = [
    ['c2sp1s5', 0],
    ['c2sp1s10', 1],
    ['c2sp1s11', 0],
    ['c2sp1s1', 0]
]

print(sorted(a, key=lambda v: magic(v[0])))
This produces the result:
[
    ['c2sp1s1', 0],
    ['c2sp1s5', 0],
    ['c2sp1s10', 1],
    ['c2sp1s11', 0]
]
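One detail worth checking: because the pattern uses a capturing group, re.split also returns an empty string when the input starts or ends with digits, so the list actually produced for 'c2sp1s5' has a trailing ''. As far as I can tell this does not harm the sort for plain ASCII input, since text and numbers still alternate in the same positions for every string, and an int is therefore never compared with a str. The sketch below repeats magic from the answer; the sample inputs are made up for illustration.

```python
import re

def magic(value):
    # Same function as in the answer: split on digit runs, convert them to int.
    parts = re.split(r'(\d+)', value)
    return [int(part) if part.isdigit() else part for part in parts]

# Input ending in a digit: the result carries a trailing empty string.
tokens_end = magic('c2sp1s5')

# Input starting with a digit: the result carries a leading empty string.
tokens_start = magic('10abc')

# Hypothetical mixed inputs; even positions hold text, odd positions hold ints.
mixed = sorted(['10abc', 'abc10', '2abc', 'abc2'], key=magic)

print(tokens_end)    # ['c', 2, 'sp', 1, 's', 5, '']
print(tokens_start)  # ['', 10, 'abc']
print(mixed)         # ['2abc', '10abc', 'abc2', 'abc10']
```

If the empty strings are unwanted they could be filtered out, but then a string starting with digits and one starting with text would compare an int with a str and raise a TypeError in Python 3, so keeping them seems the safer choice.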
Ricardo, you didn't pay attention to the part of my comment about stating all the rules for the ordering. Until you do, the result is indeed the expected one, unless you explain in detail why it is not. Will the strings always begin with 'c2sp1s'? If so, should the sort consider only the final characters? If not, what are the possible values? Why not build the list with zero-padded values, such as 'c2sp1s001'? – Woss
@Andersoncarloswoss I tried to explain in as much detail as possible why that ordering was not the desired one. – Ricardo Mendes