Function to remove duplicates from a Python list

1

Guys, I have this list:

lista = [['10616558', 0],
 ['2856466', 1],
 ['9715350', 2],
 ['9715350', 3],
 ['9715350', 4],
 ['10720706', 5]]

The first element is any string, and the second is an index. I need to write a function that removes the elements whose string has already appeared in the list, keeping the first occurrence together with its original index.

The output would look like this:

>>> lista = removeigual(lista)
>>> lista
[['10616558', 0],
 ['2856466', 1],
 ['9715350', 2],
 ['10720706', 5]]

I have a function that removes duplicates, but it only works for simple lists, and I could not adapt it to my problem:

def removeDuplicates(listofElements):

    uniqueList = []

    for elem in listofElements:
        if elem not in uniqueList:
            uniqueList.append(elem)
    return uniqueList
  • Must the index of the first occurrence of the value be preserved?

  • That question is answered here.

  • Hi Anderson. Yes I need to preserve the indexes!

  • @Tryagain, the function from the cited question works only for simple lists. When I pass it my list, it returns the same thing because of the indexes: it looks at each entry, sees it as a different value, and removes nothing, even though the string is the same. :/

3 answers

1


Your code is almost adapted to your problem; only a minor change is needed on the line if elem not in uniqueList:.

The entire initial list is traversed, and when a value is found with a string that has already been used, it is not added to the final list:

def removeDuplicates(listofElements):
  uniqueList = []
  for elem in listofElements:
    if elem[0] not in [i[0] for i in uniqueList]:  # if the string is not yet in uniqueList
      uniqueList.append(elem)
  return uniqueList

lista = [['10616558', 0],
         ['2856466', 1],
         ['9715350', 2],
         ['9715350', 3],
         ['9715350', 4],
         ['10720706', 5]]

print(removeDuplicates(lista))

Output:

[['10616558', 0], ['2856466', 1], ['9715350', 2], ['10720706', 5]]
  • 1

    Thanks t3m2! Your solution also worked, thanks a lot! :D

0

The following solution works if the index is a fixed value and not a number from some function being executed within the list.

lista = [['banana',0], ['caju',1], ['banana',2]]

for fruta in lista:
    # iterate over a copy so that removing items does not skip elements
    for checkduplicada in lista[:]:
        if fruta[0] == checkduplicada[0] and fruta[1] != checkduplicada[1]:
            lista.remove(checkduplicada)

print(lista)

Output:

[['banana', 0], ['caju', 1]]

See whether it can be applied to your case.

  • Hi Felipe. Thanks for the answer. I got the solution but yours also helped me to understand the answer :D

0

If you can use generators (and I don’t see why you couldn’t), then instead of storing the entire list with string and index to track what has already been returned, you can store only the string, which is your reference. The logic is basically this: for each item in the input list, check whether its string has already been returned; if not, yield the pair with string and index; if it has, just ignore the item.

def remove_duplicates(sequences):
    returned = []
    for sequence in sequences:
        if sequence[0] not in returned:
            yield sequence
            returned.append(sequence[0])

So just do:

nao_repetidos = list(remove_duplicates(lista))

Since the function returns a generator, just use list() to get the result as a list.

Some optimizations are possible. One is to make returned a set instead of a list, which speeds up the membership check. Another applies if the input list is already sorted by string: then you do not need to keep all the returned strings in memory; just check whether the current string in the loop is equal to the last string returned by the function.

A solution without generators, assuming the input list is sorted by string, could be:
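A minimal sketch of the set-based variant mentioned above. The name remove_duplicates_with_set is hypothetical, chosen only for this illustration, and the sketch assumes the first element of each pair is hashable (strings are):

```python
def remove_duplicates_with_set(sequences):
    # Hypothetical name; same logic as the generator above, but the
    # strings already returned are kept in a set, so each membership
    # check takes constant time on average instead of scanning a list.
    seen = set()
    for sequence in sequences:
        key = sequence[0]
        if key not in seen:
            seen.add(key)
            yield sequence


lista = [['10616558', 0], ['2856466', 1], ['9715350', 2],
         ['9715350', 3], ['9715350', 4], ['10720706', 5]]

nao_repetidos = list(remove_duplicates_with_set(lista))
print(nao_repetidos)
```

Unlike the sorted-input shortcut, this version does not require equal strings to be adjacent.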

def remove_duplicates(sequences):
    last_string = None
    result = []
    for sequence in sequences:
        if sequence[0] != last_string:
            result.append(sequence)
            last_string = sequence[0]
    return result

This solution is particularly interesting because it is O(n): it goes through the list only once and does not rely on searches in intermediate lists.
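As an alternative sketch under the same assumption (equal strings are adjacent in the input), the standard library's itertools.groupby can do the grouping. This is a swapped-in technique, not the code from the answer, and the name remove_adjacent_duplicates is hypothetical:

```python
from itertools import groupby


def remove_adjacent_duplicates(sequences):
    # Hypothetical name. groupby yields one group per run of consecutive
    # items sharing the same key; next(group) takes the first item of
    # each run, which keeps the index of the first occurrence.
    return [next(group) for _, group in groupby(sequences, key=lambda s: s[0])]


lista = [['10616558', 0], ['2856466', 1], ['9715350', 2],
         ['9715350', 3], ['9715350', 4], ['10720706', 5]]

resultado = remove_adjacent_duplicates(lista)
print(resultado)
```

If equal strings are not adjacent, this only collapses consecutive repeats, so a later reappearance of the same string is kept.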

  • Thanks for the reply Anderson. I need the index because I will fetch the data from another list of the same size and ordering, which comes from another source. But I couldn’t apply the deduplication to that other list. So I came up with the idea of assigning an index to the list that can be deduplicated, and after deduplication I use that index to pull the matching items from the other list. Your second code worked perfectly on my sample. Thank you very much :D
