Is there a method in Python similar to x.find(string), but which returns all occurrences?

Question

Asked 5 years, 10 months ago

Viewed 399 times

2

For example, I have a string with several occurrences of a:

a = 'abcdefhanmakaoa'

If I use the find method, I only get the index of the first occurrence. Is there a native method or function that returns all occurrences?

No, you have to write something that finds it all for you.

– Maniero

2018/12/02 at 12:50
I get it, thank you

– Kael Souza

2018/12/02 at 12:58
2

In fact, one could argue that re.finditer does this, but it is so flexible that the bit of boilerplate you have to put around it to get the indexes can be read as "you have to write it yourself". @hkotsubo's answer has an example using finditer.

– jsbueno

2018/12/03 at 12:39

2 answers

6

There are two ways to do this. One uses regex to create an iterator over all occurrences. The other is a brute-force loop, which in this case would be something like:

inicio = 0
termo = "a"
indexes = []

while inicio < len(a):
    resultado = a[inicio:].find(termo)
    if resultado == -1: 
        break
    indexes.append(resultado + inicio)
    inicio += resultado + 1

After the loop, the list indexes holds the index of every occurrence.

You can even save this in a function called findAll, to use whenever you need to do this operation.
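A minimal sketch of that wrapping, using the slicing loop above. The parameter names texto and termo, and the early return for an empty search term, are assumptions made for illustration:

```python
def findAll(texto, termo):
    """Return the start index of every occurrence of termo in texto."""
    if not termo:
        # Assumption: an empty search term has no meaningful occurrences.
        return []
    inicio = 0
    indexes = []
    while inicio < len(texto):
        # Search only the not-yet-scanned tail of the string.
        resultado = texto[inicio:].find(termo)
        if resultado == -1:
            break
        # resultado is relative to the slice, so shift it back.
        indexes.append(resultado + inicio)
        inicio += resultado + 1
    return indexes


print(findAll('abcdefhanmakaoa', 'a'))  # [0, 7, 10, 12, 14]
```

Because the search resumes one position after each hit, occurrences that overlap are also reported.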

Using the arguments of the find() method

As @jsbueno mentioned, you can pass the start parameter directly to the find() method, which would make the code look like this:

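def findAll(a, termo):
    inicio = 0
    indexes = []
    while inicio < len(a):
        # find() accepts a start offset, so no slicing is needed
        resultado = a.find(termo, inicio)
        if resultado == -1:
            break
        indexes.append(resultado)
        inicio = resultado + 1
    return indexes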

Computational efficiency of options

Wrapping each option above in its own function (fun1 for the slicing version, fun2 for the version that passes start to find()) and measuring with the timeit module:

import timeit

print(timeit.timeit(fun1, number=1000000))
print(timeit.timeit(fun2, number=1000000))

The second method, which calls find() with a start offset instead of slicing the string, turns out to be more efficient. The output was:

2.979384636011673
2.4164228980080225
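The bodies of fun1 and fun2 are not shown above, so here is a hedged reconstruction for anyone who wants to repeat the measurement. It assumes fun1 is the slicing loop and fun2 is the start-offset loop, both run against the string from the question, and it uses a smaller number to keep the run short. Absolute timings will vary by machine and Python version.

```python
import timeit

a = 'abcdefhanmakaoa'
termo = 'a'


def fun1():
    # Slicing version: builds a new string on every iteration.
    inicio, indexes = 0, []
    while inicio < len(a):
        resultado = a[inicio:].find(termo)
        if resultado == -1:
            break
        indexes.append(resultado + inicio)
        inicio += resultado + 1
    return indexes


def fun2():
    # Start-offset version: find() scans the original string in place.
    inicio, indexes = 0, []
    while inicio < len(a):
        resultado = a.find(termo, inicio)
        if resultado == -1:
            break
        indexes.append(resultado)
        inicio = resultado + 1
    return indexes


# Assumption: 10000 repetitions instead of 1000000, only to finish quickly.
print(timeit.timeit(fun1, number=10000))
print(timeit.timeit(fun2, number=10000))
```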

1

Perfect, thank you :)

– Kael Souza

2018/12/02 at 18:44
4

Nice algorithm! It's straightforward and very readable, and I think I did it this way myself until last year, when I discovered that the .find method itself accepts a second argument: the "start" offset of the search. That helps in exactly these cases, since it avoids the relatively expensive operation of slicing the string before each find, as well as the risk of errors when calculating the offsets for the next iterations. It might be nice to complete the answer with a second example that passes resultado + 1 directly as the second parameter of "find".

– jsbueno

2018/12/03 at 12:32
@jsbueno, I updated the code with your suggestion and, as you said, it is indeed more efficient

– Luan Naufal

2018/12/03 at 19:30
1

@Kaelsouza, if my answer has answered your question, please mark it as the correct answer :)

– Luan Naufal

2018/12/03 at 19:31

by hkotsubo • **55,826** points · Answer 1 · 2018-12-03T00:19:23+00:00

For this specific case, where you only want to find occurrences of a single letter, you can iterate over the string with enumerate, which gives you each index together with the character at that index.

And since you want to build a list with the result, you can use list comprehension syntax:

a = 'abcdefhanmakaoa'
# "i" is the index, "c" is the character at that index
indices = [i for i, c in enumerate(a) if c == 'a']
print(indices)

The line that creates the index list (indices = [ ...) is a list comprehension, and is equivalent to writing a "traditional" for loop as in other languages:

indices = []
for i, c in enumerate(a):
    if c == 'a':
        indices.append(i)

Although the two are equivalent, the list comprehension is more succinct and is the more pythonic way to do it.

The output is a list of all indexes that correspond to a letter "a":

[0, 7, 10, 12, 14]

The above code works only for cases where you want to search for occurrences of a single letter.
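If all that is needed is a plain multi-character substring, one possible middle ground (not part of the answer as originally written) is to keep the list comprehension and test each position with str.startswith, which accepts a start offset. The sample string and the name termo are assumptions for illustration:

```python
a = 'abcdeafabc'
termo = 'abc'  # hypothetical multi-character search term

# startswith(termo, i) checks whether termo begins at position i
indices = [i for i in range(len(a)) if a.startswith(termo, i)]
print(indices)  # [0, 7]
```

This checks every position, so it is simple rather than fast.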

For more complicated or general cases, where you want to search for occurrences of a word, for example (or use more complicated criteria, such as "starts with a lowercase letter or number and has at least N characters", etc.), an alternative is to use regex, through the re module:

import re

a = 'abcdefhanmakaoa'
indices = [m.start(0) for m in re.finditer('a', a)]
print(indices)

finditer returns an iterator over all the matches found (and since I also used list comprehension syntax here, the result ends up in a list). Since the regex used is 'a' (the letter "a"), it yields every match for that letter. Then start(0) returns the starting position of each match. With this, you get the indexes of all the "a" letters in the string.

The index list is the same as the one obtained in the first example. But since the expression being tested is very simple (the letter a), in this specific case I think using regex is overkill. Still, the alternative is here for the record, in case you need something more complicated than picking out one specific letter.
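One caveat when the search term is meant literally: characters such as "." or "+" have special meaning in a regex. A small sketch of the difference, using re.escape from the standard library (the sample string and the variable names are made up for illustration):

```python
import re

texto = 'a.b.c a+b a.b'  # hypothetical sample string
termo = 'a.b'            # the "." should match literally here

# Without escaping, "." matches any character, so "a+b" also matches.
sem_escape = [m.start() for m in re.finditer(termo, texto)]
# With re.escape, only the literal text "a.b" matches.
com_escape = [m.start() for m in re.finditer(re.escape(termo), texto)]

print(sem_escape)  # [0, 6, 10]
print(com_escape)  # [0, 10]
```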

Another advantage of using regex in the more complicated cases is that you can also get the end index:

a = 'abcdeafabc'
indices = [(m.start(0), m.end(0)) for m in re.finditer('abc', a)]
print(indices)

In this case, I'm looking for occurrences of abc and returning a list of tuples (note the parentheses around m.start(0) and m.end(0), which delimit a tuple). Each tuple contains the start and end index of abc:

[(0, 3), (7, 10)]

Note that the indexes follow the start-inclusive/end-exclusive rule: the start is included, the end is not. For example, the first occurrence of abc corresponds to indexes 0, 1 and 2 of the string, but the value returned was (0, 3) (the end index is not "included").

Of course, if you only want the initial index, just use m.start(0), as explained above.
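One behavior worth knowing about, shown here as a sketch on a made-up string: finditer reports non-overlapping matches, because each match consumes the text it covers. If overlapping occurrences matter, a zero-width lookahead is one way to get them.

```python
import re

a = 'aaaa'  # hypothetical example string

# Each match consumes its text, so only non-overlapping hits are found.
sem_sobreposicao = [m.start() for m in re.finditer('aa', a)]
# A lookahead matches without consuming text, so overlaps are reported.
com_sobreposicao = [m.start() for m in re.finditer('(?=aa)', a)]

print(sem_sobreposicao)  # [0, 2]
print(com_sobreposicao)  # [0, 1, 2]
```

With the lookahead version, m.end() equals m.start(), since the match itself is empty; the end index would have to be computed from the length of the search term.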