How to capture a textarea tag that contains line breaks?

Viewed 209 times

10

How can I get the contents of a <textarea> with a regex, including line breaks?

I have the following expression to get a textarea:

([<]textarea.*[<\/textarea][>])

Online example.

The problem is that if the textarea contains a line break, the expression cannot capture it.

Hence my question: how do I capture a textarea that contains line breaks?

  • 2

    Treating HTML with regex is difficult, and regex is sometimes the wrong tool. What environment/language are you working in?

  • @Sergio I’m using C#, but my question is only about the expression itself. I don’t use regex to manipulate HTML; I use the Html Agility Pack. We already talked about it in chat, I just wanted to bring the question to the site too.

  • Randrade, out of curiosity, was there a bug in my answer?

  • @Sergio, I replied to you in chat, but I’ll answer here too: sorry, I meant to tell you the reason in chat and forgot. Your answer is perfect; there is nothing wrong with it. I changed the accepted answer only because Guilhermelautert’s answer has a more didactic explanation, which I thought would help people who come here searching. I’m still waiting for the time limit to offer a bounty.

  • Okay, fine. His answer is longer because his regex is more complex :) I didn’t know you wanted to separate tags and content. Good content stays here in the question and answers, nice.

4 answers

11


The problem with this regex is that, by default, . does not match \n. You have to work around that, for example with a negated character class [^...], which matches any character that is not listed in the class.
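
To make the difference concrete, here is a minimal JavaScript sketch. The sample string html is made up for illustration and is not taken from the question; the two patterns are simplified on purpose.

```javascript
// Hypothetical sample input: a textarea whose content spans two lines.
const html = '<textarea name="msg">line one\nline two</textarea>';

// Without the s flag, "." stops at the line break, so nothing matches.
const dotOnly = /<textarea.*<\/textarea>/.test(html);

// A negated class such as [^>] matches any character except ">",
// and that includes "\n" (this sample content contains no ">").
const negated = /<textarea[^>]*>[^>]*<\/textarea>/.test(html);

console.log(dotOnly, negated); // false true
```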

For your case you can use: <(textarea)([^>]*)>([^%]*?)</\1>.

See it working on REGEX101

Explanation

  • <(textarea) - matches a literal < and captures textarea as group 1, which is reused later through a backreference.
  • ([^>]*)> - matches all the attributes of the tag. Attributes are assumed not to contain >, so the negated class takes everything up to the > that ends the opening tag.
  • ([^%]*?) - the content to be captured. I negated % because I assume it will not appear in the content; if it can, just switch to another character, for example ¬. Because the class is negated, it matches any character that is not listed, including \n.
  • </\1> - matches the closing tag, reusing group 1 through the backreference \1.
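
The breakdown above can be checked with a short JavaScript sketch. The input string is a hypothetical example; in a JavaScript regex literal the / of the closing tag has to be escaped.

```javascript
// Hypothetical input with two attributes and a line break in the content.
const html = '<textarea id="a" rows="3">first\nsecond</textarea>';

// Same pattern as in the answer, written as a JavaScript literal.
const re = /<(textarea)([^>]*)>([^%]*?)<\/\1>/;
const m = html.match(re);

const tagName = m[1];    // group 1: the tag name, reused by \1
const attributes = m[2]; // group 2: everything between the name and ">"
const content = m[3];    // group 3: the content, line break included

console.log(JSON.stringify([tagName, attributes, content]));
// ["textarea"," id=\"a\" rows=\"3\"","first\nsecond"]
```

Note that group 2 keeps the leading space before the first attribute.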

Addendum

You can also use the s flag, which lets . (dot) match \n, changing the regex to <(textarea)([^>]*)>(.*?)</\1>.

Remember that the s flag must actually be applied.

Example JS

string.match(/<(textarea)([^>]*)>(.*?)<\/\1>/gs); // the `/` had to be escaped here, as `<\/\1>`, so that it is not interpreted as the end of the regex.
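
As a slightly fuller sketch of the same idea, assuming an engine that supports the s flag and String.prototype.matchAll (ES2018 and ES2020 respectively), the content of every textarea in a string can be collected like this. The page string is a made-up example.

```javascript
// Hypothetical page fragment with two textareas, one of them empty.
const page =
  '<textarea id="a">one\ntwo</textarea>\n<p>x</p>\n<textarea id="b"></textarea>';

// g: find every match; s: let "." match line breaks as well.
const re = /<(textarea)([^>]*)>(.*?)<\/\1>/gs;

// Group 3 holds the content of each textarea.
const contents = Array.from(page.matchAll(re), (m) => m[3]);

console.log(JSON.stringify(contents)); // ["one\ntwo",""]
```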

See it working on REGEX101

  • 1

    Can you put an example of the regex working here: https://regex101.com/r/fH1eL8/1 ?

  • I didn’t understand the downvote; Guilherme sent me the test link and it worked perfectly: https://regex101.com/r/wC9oA3/8 (it is just that Guilherme used ~ instead of / as the delimiter)

  • @Guilhermelautert Nice. I was just looking, but this way it doesn’t capture everything from <textarea through to the closing /textarea>, is that right?

  • @Sergio, I don’t know if I understood correctly, but it captures everything until the tag closes; it just also splits the match into groups, with the content separated from the attributes.

  • Man, you deserve +100 for the explanation of the s flag. That is a big help; later I’ll put a bounty on your answer :)

9

You can do it like this:

(<textarea[\s\S]+?textarea>)

Example: https://regex101.com/r/oQ1qJ3/1

The important part is [\s\S]+?, which matches any character, one or more times; the ? makes it lazy, so the match ends at the first opportunity it finds.
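
A small JavaScript sketch of what the lazy ? changes, using a made-up string with two textareas:

```javascript
// Hypothetical input: two textareas with unrelated markup between them.
const html = '<textarea>a\nb</textarea><p>between</p><textarea>c</textarea>';

// Greedy: [\s\S]+ runs to the LAST "textarea>", swallowing everything.
const greedy = html.match(/<textarea[\s\S]+textarea>/g);

// Lazy: [\s\S]+? stops at the FIRST "textarea>" it can reach.
const lazy = html.match(/<textarea[\s\S]+?textarea>/g);

console.log(greedy.length, lazy.length); // 1 2
```

[\s\S] is a character class that matches both whitespace and non-whitespace, so it covers every character, including \n, without needing any flag.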

  • I was trying it here: http://regexr.com/3du5c and the problem is that it was including everything from the opening of the first tag to the closing of the last. Which part of your expression stops this from happening?

  • @Miguel add a ?: http://regexr.com/3du5f

  • 1

    Ah, nice... Thanks for the clarification and the alternative

  • 1

    That +? is what solves my problem with [\s\S]. Now I believe it will be easier to port my regex to different engines, such as JS, Java, C#, and preg. +1

5

The other answers are correct and excellent; I just made a few modifications:

  • I changed @Sergio’s example to:

    /(<textarea[\s\S]+?<\/textarea>)/g

    Testing: https://regex101.com/r/oQ1qJ3/2

    This avoids matching things like <textarea>abc<textarea> (note that the slash is missing, yet @Sergio’s original regex still matched it)

  • If you need to match attributes and content separately, use:

    /(<textarea([^>]+)>([\s\S]+?|)<\/textarea>)/g


  • I changed @Guilhermelautert’s example to:

    /<(textarea)([^>]+)>([^%]*?)<\/\1>/g

    Testing: https://regex101.com/r/wC9oA3/8

    The answer works perfectly, but where / is the regex delimiter the unescaped </\1> will not work, so the slash is escaped here. Of course this varies between languages; it only matters in that specific situation.
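
A minimal JavaScript sketch of the first two modifications, with made-up inputs. It also shows one consequence of ([^>]+) that is easy to miss: it needs at least one character after the tag name, so a bare <textarea> is not matched by that pattern.

```javascript
// Invalid HTML: the second tag is missing its slash.
const broken = '<textarea>abc<textarea>';

// Original pattern: only requires "textarea>" at the end, so it matches.
const loose = broken.match(/(<textarea[\s\S]+?textarea>)/);

// Modified pattern: requires a real closing tag, so it does not match.
const strict = broken.match(/(<textarea[\s\S]+?<\/textarea>)/);

// Attributes and content in separate groups.
const split = /(<textarea([^>]+)>([\s\S]+?|)<\/textarea>)/;
const withAttrs = '<textarea name="x">hi\nthere</textarea>'.match(split);

// [^>]+ needs at least one character, so a tag without attributes fails.
const bare = '<textarea>hi</textarea>'.match(split);

console.log(loose !== null, strict, withAttrs[2], bare);
```

If tags without attributes must match too, ([^>]*) is the usual alternative, as in the original answer.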


Note: I had also written an example of my own, with more limited knowledge at the time; here is the regex anyway:

<textarea([^>]+)[>]([^<]+[^t]+[^e]+[^x]+[^t]+[^a]+[^r]+[^e]+[^a]+[^>]|.*)<\/textarea>

Result: https://regex101.com/r/iW4xG3/1

However, the other answers show better and simpler approaches; this is just an alternative to study.

  • 1

    I was looking at your change to my regex: in cases of invalid HTML (which I think is what you want to handle) it captures two textareas. Is that the idea? https://regex101.com/r/oQ1qJ3/3

  • @Sergio yes that’s it

4

Hello friend, I managed to do it this way:

<textarea\b[^>]*>((\n*|.)*)<\/textarea>

Example: https://regex101.com/r/oJ0iP6/2

Explanation:

  • <textarea\b[^>]*> Matches the opening tag. The word boundary \b ensures the tag name is exactly textarea, and [^>]* matches every character except >, which prevents the opening tag from containing two >.
  • ((\n*|.)*) Capture group for the tag content. Each repetition matches either line breaks (\n*) or, through |, any other single character (.).
  • <\/textarea> Matches the closing tag.
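
A short JavaScript sketch of this pattern on made-up inputs. Because the outer * is greedy, a string with two textareas is matched as a single span, from the first opening tag to the last closing tag.

```javascript
// Pattern from this answer, written as a JavaScript literal.
const re = /<textarea\b[^>]*>((\n*|.)*)<\/textarea>/;

// Hypothetical input with a blank line inside the content.
const one = '<textarea rows="2">x\n\ny</textarea>';
const m = one.match(re);
console.log(JSON.stringify(m[1])); // "x\n\ny"

// Hypothetical input with two textareas: the greedy * spans both.
const two = '<textarea>a</textarea><textarea>b</textarea>';
const span = two.match(re);
console.log(span[0] === two); // true
```

Making the outer quantifier lazy, ((\n*|.)*?), is one way to stop at the first closing tag instead.
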
  • 1

    It worked perfectly in all the tests I ran, +1; it is only missing an explanation of the regex.
