I have an HTML file containing URLs that follow this pattern: https://www.olympikus.com.br/tenis-olympikus-flower-415-feminino-cinza-D22-1131-010
The general form is protocol://domain/dynamic-string-000-0000-000
I want to extract every link that matches this pattern, so I wrote the following regex: (https\:\/\/?)www\.olympikus\.com\.br\/(.*)\-[A-Z0-9]{3}-[A-Z0-9]{4}-[A-Z0-9]{3}
Unfortunately, the match starts at the first protocol://domain/ and ends at the last possible occurrence of -000-0000-000, returning one huge string with everything in between because (.*) is greedy. I can't work out how to handle the dynamic part of the URL.
How can I write this regex so that it returns each link separately?
I am currently using egrep in the terminal, but JavaScript examples are welcome, because I intend to write a crawler in Node.js.
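To make the greedy-match problem described above concrete, here is a minimal Node.js sketch of one possible fix: replace (.*) with a lazy, negated character class so the dynamic part cannot run past the end of a single URL. The function name extractProductLinks, the sample HTML, and the second URL in it are made up for illustration; the sketch also assumes that URLs in the file never contain whitespace, quotes, or angle brackets.

```javascript
// Hypothetical helper: returns every product URL found in an HTML string.
function extractProductLinks(html) {
  // [^\s"'<>]*? is lazy and cannot cross quotes, whitespace or tags,
  // so each match stays inside a single URL instead of spanning the file.
  const pattern =
    /https?:\/\/www\.olympikus\.com\.br\/[^\s"'<>]*?-[A-Z0-9]{3}-[A-Z0-9]{4}-[A-Z0-9]{3}/g;
  // matchAll requires the g flag and yields one match object per URL.
  return Array.from(html.matchAll(pattern), (m) => m[0]);
}

// Sample input (assumption): the second URL is invented for the example.
const sampleHtml = `
<a href="https://www.olympikus.com.br/tenis-olympikus-flower-415-feminino-cinza-D22-1131-010">A</a>
<a href="https://www.olympikus.com.br/tenis-exemplo-masculino-preto-D22-2222-020">B</a>
`;

console.log(extractProductLinks(sampleHtml));
```

egrep has no lazy quantifiers, but the same negated character class (without the ?) combined with the -o option should give a similar result there, since the class alone already prevents a match from leaving the URL.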
Yes. Give it to me anyway.
– ayelsew