Limiting the number of regex matches in Python

7

I’m having a little trouble: I’d like to write a for loop in Python that returns a specific number of regex matches.

The way I wrote it, the loop returns every link on the page that matches the defined pattern. However, I would like to capture only as many as I need. Regardless of whether the page has 10, 20 or 30 links, the loop should run only the configured number of times, which can be 1, 2, or n, held in a variable such as number_articles.

Here is the code:

import re

# regex pattern to find article links
pattern = re.compile(r'https?:\/\/meiobit\.com\/[\d]+\/[A-Za-z0-9-]+\/')

# list to store the captured links
article_links = []

# change this for loop so that it returns only a given number of matches
# (page_content holds the page's HTML as a string)
for match in re.finditer(pattern, page_content):
    article_links.append(match.group(0))

Could someone help me?

1 answer

8


Whenever you work with iterators, remember the itertools module.

You can use the function itertools.islice to limit the iterator returned by re.finditer. For example:

import re
from itertools import islice

# regex pattern to find article links
pattern = re.compile(r'https?:\/\/meiobit\.com\/[\d]+\/[A-Za-z0-9-]+\/')

# list to store the captured links
article_links = []
number_articles = 5

# islice stops the iterator after number_articles matches
articles = islice(re.finditer(pattern, page_content), number_articles)

# this loop now runs at most number_articles times
for match in articles:
    article_links.append(match.group(0))

The article_links list will contain at most 5 entries; if the page has fewer matching links, it will simply contain all of them.
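
If you want to try this without fetching a page, here is a minimal self-contained sketch. The sample string standing in for page_content and the helper name first_matches are my own assumptions for illustration; they are not part of the original code.

```python
import re
from itertools import islice

# Same pattern as in the question
pattern = re.compile(r'https?:\/\/meiobit\.com\/[\d]+\/[A-Za-z0-9-]+\/')

# Hypothetical sample text standing in for the real page_content
page_content = (
    'https://meiobit.com/101/first-post/ '
    'https://meiobit.com/102/second-post/ '
    'https://meiobit.com/103/third-post/'
)


def first_matches(pattern, text, limit):
    """Return at most `limit` matched strings, in order of appearance."""
    # finditer is lazy, so the search stops once islice has taken `limit` matches
    return [m.group(0) for m in islice(pattern.finditer(text), limit)]


two_links = first_matches(pattern, page_content, 2)   # limit below the match count
all_links = first_matches(pattern, page_content, 10)  # limit above the match count

print(two_links)
print(len(all_links))
```

Since re.finditer is lazy, the remaining text is not scanned once the limit is reached, which should matter on large pages. Asking for more matches than exist does not raise an error; you just get the ones that are there.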

  • Thank you for your attention and for explaining the solution, @Anderson. I will make the changes in my code, since this answer worked as expected.
