I have a web crawler that picks up links from websites.
I want a regular expression that matches the links it finds, so that only links like these pass.
There are two ways to do what you describe in the question: capture only the links that do not contain the characters you mentioned (! and #), or match the links that do contain them and ignore those links.
Since I do not know how your application works, I will show both approaches. You did not mention the regex flavor or the language it will be used in, so I will assume something like the PCRE flavor (PHP).
If you want to identify the strings that CONTAIN # or !, use this pattern:
(?=.*#|.*!)(.*)
To get the result you want, find all the matches of that expression and discard them. Here is a test I did to better visualize the result.
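A minimal sketch of this first approach, written in Python rather than PHP (Python's re module accepts the same lookahead syntax). The function name drop_flagged and the sample URLs are hypothetical, made up for illustration; they are not from the question.

```python
import re

# Pattern from the answer: matches any string that contains '#' or '!'.
HAS_FORBIDDEN = re.compile(r"(?=.*#|.*!)(.*)")

def drop_flagged(links):
    """Return only the links that the pattern does NOT match."""
    return [link for link in links if HAS_FORBIDDEN.search(link) is None]

# Hypothetical crawler output, for illustration only.
sample = [
    "https://example.com/page",
    "https://example.com/page#section",
    "https://example.com/#!/app",
    "https://example.com/about",
]

kept = drop_flagged(sample)
print(kept)
```

With the sample list above, the two links without # or ! are the ones that remain.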
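If you want to identify the strings that DO NOT CONTAIN # or !, use this pattern:
^(?!.*#|.*!)(.*)$
Note that a negative lookahead is written (?! with no = sign, and the ^ anchor prevents the match from starting partway through a link, after the last # or !.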
In that case, keep only the matches and discard the strings that were not matched. Here is another test, this time with that expression.
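A minimal sketch of this second approach in Python. It assumes an anchored form of the pattern, ^(?!.*#|.*!)(.*)$, with the negative lookahead written as (?! and no = sign. The anchor matters because an unanchored negative lookahead can still match the tail of a link after its last # or !, as the last lines of the sketch show. The name keep_clean and the sample URLs are hypothetical.

```python
import re

# Anchored negative lookahead: the whole string must contain neither '#' nor '!'.
CLEAN_LINK = re.compile(r"^(?!.*#|.*!)(.*)$")

def keep_clean(links):
    """Return only the links that the pattern matches."""
    return [link for link in links if CLEAN_LINK.match(link)]

# Hypothetical crawler output, for illustration only.
sample = [
    "https://example.com/page",
    "https://example.com/page#section",
    "https://example.com/#!/app",
    "https://example.com/about",
]

clean = keep_clean(sample)
print(clean)

# Without the ^ anchor, search() skips ahead and matches the part after '#'.
unanchored = re.search(r"(?!.*#|.*!)(.*)", "a#b")
print(unanchored.group(1))
```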
Explanation:
The two regexes work similarly, but one uses a positive lookahead, (?= ), and the other a negative lookahead, (?! ).
A lookahead is a zero-width assertion: it inspects the characters ahead of the current position without consuming them, and succeeds only if the given pattern is found (positive) or is not found (negative).
Inside the lookahead is the condition, .*#|.*! : any run of characters (letters, digits or symbols) followed by # or !.
At the end, (.*) captures the rest of the line, which only happens when the lookahead has succeeded.
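To make the "checks without consuming" behaviour concrete, here is a small Python sketch. The sample string is made up, and the lookahead is simplified to test only for # so the effect is easier to see.

```python
import re

text = "page#top"  # hypothetical sample string

# The lookahead on its own succeeds but consumes nothing: the match is empty.
probe = re.match(r"(?=.*#)", text)
print(probe.span())

# Because nothing was consumed, the following (.*) still sees the whole string.
full = re.match(r"(?=.*#)(.*)", text)
print(full.group(1))

# The negative form fails at position 0 for the same text, so there is no match.
negative = re.match(r"(?!.*#)(.*)", text)
print(negative)
```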
I don’t think that’s possible. Is there any pattern to the links? Which ones would you not want to see?
– Paulo Martins
I would like URLs containing ?, # and other such characters not to be captured.
– Jeferson Mota
Jeferson, people are here to help you solve a problem, not to do the work for you for free. Post the code you are writing and ask your questions about it.
– RFL