Dot-Star problem in Regex

I am trying to write a robots.txt parser using regex, but I can't get the expression right. I ran several tests in Regex101 and still haven't gotten the result I expected.

My regular expression:

/user-agent: (bot|\*)\n*((disallow:\s*(?<disallow>.*)|allow:\s*(?<allow>.*)|sitemap:\s*(?<sitemap>.*))\n*)+/gi

My test input:

User-agent: *

Disallow: /exemplo/
Allow: /dolor/
Disallow: /sit/
Allow: /amet/

Sitemap: http://www.loremipsum.com/sitemap.xml

The image shows the result that Regex101 returns and the result I want it to return.

[Image: how the code is currently matched and how it should look]
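Since the screenshot is not available here, the following is only a sketch of one behavior that may be behind the unexpected result: when a capturing group sits inside a repeated group, the engine keeps only the last text that group matched. The pattern is the one from the question, ported to Python, which is an assumption on my part because the question does not name a language; Python spells named groups `(?P<name>...)`.

```python
import re

# The question's pattern, with named groups rewritten in Python syntax.
PATTERN = re.compile(
    r"user-agent: (bot|\*)\n*"
    r"((disallow:\s*(?P<disallow>.*)|allow:\s*(?P<allow>.*)"
    r"|sitemap:\s*(?P<sitemap>.*))\n*)+",
    re.IGNORECASE,
)

SAMPLE = """User-agent: *

Disallow: /exemplo/
Allow: /dolor/
Disallow: /sit/
Allow: /amet/

Sitemap: http://www.loremipsum.com/sitemap.xml
"""

match = PATTERN.search(SAMPLE)

# One value per group name: earlier matches such as /exemplo/ and /dolor/
# are overwritten by later iterations of the repeated group.
captured = match.groupdict()
print(captured)
```

The whole block of rules is matched, but each name ends up with a single string rather than a list, so `/exemplo/` and `/dolor/` are lost.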

  • Can you explain exactly what you want to do with the regex? That would be easier than deciphering the colors in your example.

  • I want to put the values of disallow, allow, and sitemap into arrays of the same name. For example, /amet/ would go into the allow array.

  • You might want to rethink how you are going to use this regex. I don't think a single group can return multiple results, and I don't think multiple groups can share the same name. An easier alternative is to do it in stages. For example, you can extract just the Disallow values with (?<=disallow:)\s?(.*) and do the same for Allow and the other robots.txt elements.

  • There is just one problem: that regular expression will capture every allow and disallow in the file. I want only the rules that belong to the right user-agent (such as * or bot).

  • Can it be in Perl?

  • No need. I got it. Thank you very much!
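The staged approach discussed in the comments could look roughly like the sketch below. It is written in Python as an assumption (the thread does not name a language), and `parse_rules`, `DIRECTIVE`, and the `agents` parameter are hypothetical names introduced for illustration. It reads the file line by line, tracks whether the current block belongs to a wanted user-agent, and appends each value to a list. Following the question, sitemap lines are collected per block, and trailing `#` comments are not handled.

```python
import re

# One directive per line: "field: value". Only the four fields used in the
# question are recognized; every other line is ignored.
DIRECTIVE = re.compile(
    r"^\s*(?P<field>user-agent|disallow|allow|sitemap)\s*:\s*(?P<value>.*?)\s*$",
    re.IGNORECASE,
)


def parse_rules(text, agents=("*", "bot")):
    """Collect allow/disallow/sitemap values for the wanted user-agents."""
    rules = {"disallow": [], "allow": [], "sitemap": []}
    active = False     # True while inside a block for a wanted agent
    in_header = False  # True while reading consecutive User-agent lines
    for line in text.splitlines():
        found = DIRECTIVE.match(line)
        if found is None:
            continue
        field = found["field"].lower()
        value = found["value"]
        if field == "user-agent":
            wanted = value.lower() in agents
            # Consecutive User-agent lines share the rules that follow them.
            active = wanted or (in_header and active)
            in_header = True
        else:
            in_header = False
            if active:
                rules[field].append(value)
    return rules


SAMPLE = """User-agent: *

Disallow: /exemplo/
Allow: /dolor/
Disallow: /sit/
Allow: /amet/

Sitemap: http://www.loremipsum.com/sitemap.xml
"""

rules = parse_rules(SAMPLE)
print(rules["allow"])
```

With the test input from the question, `rules["allow"]` holds both `/dolor/` and `/amet/`, and rules that appear under a different user-agent are skipped.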


1 answer

After a few days of thinking about how to build a parser, I managed to create something with regex.

However, the code isn't 100% perfect, so I'm still testing it.
