I am trying to write a robots.txt parser with a regular expression, but I can't get the expression right. I ran several tests on Regex101 and still haven't gotten the result I expected.
My regular expression:
/user-agent: (bot|\*)\n*((disallow:\s*(?<disallow>.*)|allow:\s*(?<allow>.*)|sitemap:\s*(?<sitemap>.*))\n*)+/gi
My test input:
User-agent: *
Disallow: /exemplo/
Allow: /dolor/
Disallow: /sit/
Allow: /amet/
Sitemap: http://www.loremipsum.com/sitemap.xml
In the image you can see the result that Regex101 returns and the one that I wanted to return.
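For reference, the same behavior can be reproduced outside Regex101. The sketch below uses Python, which is an assumption since the question does not name a language, so the named groups are written (?P<name>...) instead of (?<name>...). It suggests why the result is not a list per name: a capture group inside a repeated group holds a single value per match, not one value per repetition.

```python
import re

# Test input from the question.
text = """User-agent: *
Disallow: /exemplo/
Allow: /dolor/
Disallow: /sit/
Allow: /amet/
Sitemap: http://www.loremipsum.com/sitemap.xml
"""

# Same expression as in the question, with Python's named-group syntax.
pattern = re.compile(
    r"user-agent: (bot|\*)\n*"
    r"((disallow:\s*(?P<disallow>.*)|allow:\s*(?P<allow>.*)|sitemap:\s*(?P<sitemap>.*))\n*)+",
    re.IGNORECASE,
)

m = pattern.search(text)

# Each named group holds only the last value it captured.
print(m.groupdict())
# {'disallow': '/sit/', 'allow': '/amet/', 'sitemap': 'http://www.loremipsum.com/sitemap.xml'}
```

Which value survives can differ between regex engines, but in each case only one value per group is reported for a match.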

Can you explain exactly what you want to do with the regex? That would be easier than deciphering the colors in your example.
– Molx
I want to put the values of disallow, allow and sitemap into arrays of the same name. For example, /amet/ would be inside the allow array. – hsbpedro
You might want to rethink how you are going to use this regex. I don't think a group can hold multiple results, or that you can have multiple groups with the same name. An easier alternative is to do it in stages. For example, you can extract only the Disallow values using (?<=disallow:)\s?(.*) and then do the same for Allow and the other robots.txt elements. – Molx
@Molx There's just one problem: that regular expression would catch every allow and disallow in the file. I want to get only the rules that belong to the right user-agent (such as * or bot). – hsbpedro
Can it be in Perl?
– JJoao
No need. I got it. Thank you very much!
– hsbpedro
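The thread does not show the solution that was eventually used. As one possible reading of the staged approach suggested above, here is a minimal Python sketch (the language, the function name parse_rules and the extra "otherbot" group in the sample are all assumptions made for illustration). It matches one line at a time and collects values only while the most recent User-agent line names a wanted agent.

```python
import re

# One rule per line: a field name, a colon, then the value.
LINE = re.compile(
    r"^\s*(user-agent|disallow|allow|sitemap)\s*:\s*(.*?)\s*$",
    re.IGNORECASE,
)


def parse_rules(robots_txt, agents=("*", "bot")):
    """Collect rule values that follow a User-agent line naming a wanted agent.

    Simplification: consecutive User-agent lines are not merged into one
    group; only the most recent User-agent line decides whether rules count.
    """
    wanted = {agent.lower() for agent in agents}
    rules = {"disallow": [], "allow": [], "sitemap": []}
    active = False
    for line in robots_txt.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # blank lines, comments, unknown fields
        field, value = m.group(1).lower(), m.group(2)
        if field == "user-agent":
            active = value.lower() in wanted
        elif active:
            rules[field].append(value)
    return rules


# Test input from the question, plus a hypothetical second group
# that should be ignored.
sample = """User-agent: *
Disallow: /exemplo/
Allow: /dolor/
Disallow: /sit/
Allow: /amet/
Sitemap: http://www.loremipsum.com/sitemap.xml

User-agent: otherbot
Disallow: /private/
"""

result = parse_rules(sample)
print(result["disallow"])  # ['/exemplo/', '/sit/']
print(result["allow"])     # ['/dolor/', '/amet/']
print(result["sitemap"])   # ['http://www.loremipsum.com/sitemap.xml']
```

Keeping the regex to a single line and tracking the current user-agent in ordinary code avoids the repeated-group limitation, at the cost of no longer being one expression.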