Regular expression to identify text box in HTML form

Viewed 333 times

2

Situation

Given any HTML page containing a form with, at first, two text boxes, one for the name and the other for the email. To run some tests on Android, I needed to extract just that snippet of the code and put it into a list. I tried several ways to write a regular expression that extracts this part of the code, but I was not successful. The org.apache.http.legacy library is being used to fetch the source code, and there was no problem reading the following code.

Code

<!DOCTYPE html>
    <html>
       <head>
       </head>
       <body>
          <form>
             <label for="nome">Nome</label>
             <input type="text" name="nome" id="nome"/><br><br>
             <label for="email">email</label>
             <input type="text" name="email" id="email"/><br><br>
             <input type="submit" value="Enviar">
          </form>
       </body>
    </html>

Question

I would like to use a regex to identify exactly the beginning and end of each input, that is, from <input to />, so that this HTML would yield, for example, two items:

<input type="text" name="nome" id="nome"/>
<input type="text" name="email" id="email"/>

After identifying them, I intend to put every input into an array (but that part is easy). Could someone help me with this expression? Hugs!

Note: the expression in this case should be independent of the language. Whichever method or language I end up using (Java, C#, JavaScript), I will use the libraries needed to handle regex.

Hugs! =)

  • 2

    HTML is not a regular language. Regular expressions would only solve your problem if your HTML were quite predictable and limited. What you probably need is an HTML parser. Be careful with XY problems.

  • @Pabloalmeida I didn’t say it is! Since the idea is to use this on Android, the native programming language is Java. But the expression in this case is language independent. Whichever method or language I end up using (Java, C#, JavaScript), I can use the libraries needed to handle regex. Hugs.

  • @Cleidimarviana What Pablo Almeida meant is that it is very difficult to find a regex that solves your problem satisfactorily, because HTML is too complex a language for a regex to handle. The best solution in this case is to use a parser. Instead of looking for a lib to handle regex, look for a parser for that language. If you indicate your preferred language in the question, I’m sure someone can recommend one.

1 answer

1


You can use the following expression: /<\s* input [^>]+ >/xg

<

matches "<" literally.

\s

matches any whitespace character [\r\n\t\f ].

*

between zero and unlimited times, as many times as possible, giving back as needed [greedy].

input

matches the word "input" literally (case sensitive).

[^>]

matches a single character not in the list.

+

between one and unlimited times, as many times as possible, giving back as needed [greedy].

> (inside the brackets)

the only character in the negated list: a literal ">".

> (at the end)

matches ">" literally, closing the tag.

/g

global modifier, does not stop at first occurrence.

/x

Extended modifier ignores white space and comments.
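
As a rough sketch of how the expression above might be applied in Java (the language mentioned for Android), assuming the page source is already available as a String. The class name InputTagFinder and the method findInputTags are hypothetical names chosen for illustration. Java has no /x or /g suffixes, so this sketch uses Pattern.COMMENTS in place of /x and a Matcher.find() loop in place of /g.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class InputTagFinder {

    // Pattern.COMMENTS makes the regex engine ignore the spaces inside the
    // pattern, which is what the /x modifier does in the expression above.
    private static final Pattern INPUT_TAG =
            Pattern.compile("<\\s* input [^>]+ >", Pattern.COMMENTS);

    /** Returns every substring of html matched by the pattern, in order. */
    static List<String> findInputTags(String html) {
        List<String> tags = new ArrayList<>();
        Matcher matcher = INPUT_TAG.matcher(html);
        // Each call to find() advances to the next match (the /g behaviour).
        while (matcher.find()) {
            tags.add(matcher.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        // The form from the question, reduced to the relevant lines.
        String html = String.join("\n",
                "<form>",
                "<label for=\"nome\">Nome</label>",
                "<input type=\"text\" name=\"nome\" id=\"nome\"/><br><br>",
                "<label for=\"email\">email</label>",
                "<input type=\"text\" name=\"email\" id=\"email\"/><br><br>",
                "<input type=\"submit\" value=\"Enviar\">",
                "</form>");
        for (String tag : findInputTags(html)) {
            System.out.println(tag);
        }
    }
}
```

Note that on the form from the question this returns three items, not two: the expression also matches the submit button, since it is an input tag as well. If only the text boxes are wanted, the resulting list would still need to be filtered, for example by checking for type="text".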

Testing and explanation (in English): Regex_input_tag

More about Regex: Aurélio Regex

  • I think this will work! Thanks.

  • If anything comes up, leave a comment and we'll adjust it...

  • Of course! Hugs.

  • Would it be possible for you to edit your answer and add a description of the expression? For example, explaining what each part does. Regards.
