How to get the index comparing two lists of different lengths in Python?

I have two different lists in two CSV files.

The first list has 47843 items and the second has 813331, which is 17 times the length of the first.

For each value in the second list, I want to get its index position in the first list. The second list repeats values and is not in the same order as the first, like this:

First:
1:   ARS-BFGL-BAC-10919
2:   ARS-BFGL-BAC-10975
3:   ARS-BFGL-BAC-11000
4:   ARS-BFGL-BAC-11003
5:   ARS-BFGL-BAC-11025
6:   ARS-BFGL-BAC-11044
7:   ARS-BFGL-BAC-11193
8:   ARS-BFGL-BAC-11215
9:   ARS-BFGL-BAC-11218
10:  ARS-BFGL-BAC-11276

Second:

1:    ARS-BFGL-BAC-10919
2:    ARS-BFGL-BAC-11003
3:    ARS-BFGL-BAC-10975
4:    ARS-BFGL-BAC-11044
5:    ARS-BFGL-BAC-11000
6:    ARS-BFGL-BAC-10975
7:    ARS-BFGL-BAC-11025
8:    ARS-BFGL-BAC-11193
9:    ARS-BFGL-BAC-11044
.
.
.
.
.
.

I want this result:

{1,4,2,6,3,2,5,7,6....}
  • 1

    Have you thought about putting that in a database? SQLite, maybe. A 1M-line CSV with no indexing will make any code either slow or laborious.

  • I am filtering this data in order to insert it into the database, but I need to apply some filters before I can insert it.

1 answer

Since you have not provided any code, I will make up a simple example here:

lista1 = [
    'ARS-BFGL-BAC-10919',
    'ARS-BFGL-BAC-10975',
    'ARS-BFGL-BAC-11000',
    'ARS-BFGL-BAC-11003',
    'ARS-BFGL-BAC-11025',
    'ARS-BFGL-BAC-11044',
    'ARS-BFGL-BAC-11193',
    'ARS-BFGL-BAC-11215',
    'ARS-BFGL-BAC-11218',
    'ARS-BFGL-BAC-11276',
]
lista2 = [
    'ARS-BFGL-BAC-10919',
    'ARS-BFGL-BAC-11003',
    'ARS-BFGL-BAC-10975',
    'ARS-BFGL-BAC-11044',
    'ARS-BFGL-BAC-11000',
    'ARS-BFGL-BAC-10975',
    'ARS-BFGL-BAC-11025',
    'ARS-BFGL-BAC-11193',
    'ARS-BFGL-BAC-11044',
]

The simplest solution uses the list's index method. It can be slow, because for each element of the second list it searches the first list from the beginning:

>>> print(tuple(lista1.index(elem)+1 for elem in lista2))
(1, 4, 2, 6, 3, 2, 5, 7, 6)
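
To put a rough number on "slow": each call to lista1.index(elem) scans lista1 from the start, so the total work grows with the product of the two lengths. The sketch below is only a back-of-the-envelope estimate using the sizes given in the question, and it assumes that every value is found and is equally likely to sit anywhere in the first list. The variable names are made up for illustration.

```python
# Sizes taken from the question.
len_first = 47843
len_second = 813331

# list.index scans from the start, so a hit at position p costs p comparisons.
# Assumption: hits are spread evenly, so an average scan covers half the list.
average_scan = (len_first + 1) / 2
estimated_comparisons = len_second * average_scan

print(f"{estimated_comparisons:.2e} string comparisons (rough estimate)")
```

Under those assumptions the estimate comes out in the tens of billions of comparisons, which is why avoiding the repeated scan matters at this size.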

Another solution is to convert lista1 to a dictionary that maps each item to its position. A dictionary lookup takes roughly constant time on average regardless of size, so this is much faster:

>>> d1 = {item: index for index, item in enumerate(lista1, start=1)}
>>> print(tuple(d1.get(elem) for elem in lista2))
(1, 4, 2, 6, 3, 2, 5, 7, 6)
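
Since the real data lives in two CSV files, the dictionary approach could be wrapped up roughly as follows. This is only a sketch: the names read_column and index_positions are hypothetical, and it assumes each file holds one identifier per row in its first column with no header row. Values that do not appear in the first list come back as None, and if the first list contains duplicates the first position is kept, which matches what list.index would return (a plain dict comprehension would keep the last one instead).

```python
import csv


def read_column(path, column=0):
    """Return the values of one column of a CSV file as a list."""
    with open(path, newline="") as handle:
        # Skip completely empty rows, which csv.reader yields as [].
        return [row[column] for row in csv.reader(handle) if row]


def index_positions(first, second):
    """Map each value of `second` to its 1-based position in `first`."""
    lookup = {}
    for position, item in enumerate(first, start=1):
        # setdefault keeps the first position when `first` has duplicates.
        lookup.setdefault(item, position)
    # dict.get returns None for values that are absent from `first`.
    return [lookup.get(item) for item in second]


# Small in-memory demonstration; the last identifier is made up so that
# the "not found" case is visible.
first = ["ARS-BFGL-BAC-10919", "ARS-BFGL-BAC-10975", "ARS-BFGL-BAC-11000"]
second = ["ARS-BFGL-BAC-10975", "ARS-BFGL-BAC-10919", "ARS-BFGL-BAC-99999"]
print(index_positions(first, second))  # [2, 1, None]
```

With the actual files, the call would look something like index_positions(read_column("first.csv"), read_column("second.csv")), where both file names are placeholders.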
  • Problem solved :)
