I'm trying to use a regex to extract the values of the option elements, but I can't restrict the match to just one of the two selects.
When I use the expression: <option value="(.+?)"
it returns every option, when in fact I only want the ones inside "fromPort".
I also tried the following, but it finds no matches: (?<=select name="fromPort" class="form-inline">)\s*.*(?=select)
<select name="fromPort" class="form-inline">
<option value="Paris">Paris</option>
<option value="Philadelphia">Philadelphia</option>
<option value="Boston">Boston</option>
<option value="Portland">Portland</option>
<option value="San Diego">San Diego</option>
<option value="Mexico City">Mexico City</option>
<option value="São Paolo">São Paolo</option>
</select>
<p>
<h2>Choose your destination city:</h2>
<select name="toPort" class="form-inline">
<option value="Buenos Aires">Buenos Aires</option>
<option value="Rome">Rome</option>
<option value="London">London</option>
<option value="Berlin">Berlin</option>
<option value="New York">New York</option>
<option value="Dublin">Dublin</option>
<option value="Cairo">Cairo</option>
</select>
Each language has its own variant of regular expression syntax, so whenever the subject is regex it is important to say which language you are working in. Parsing HTML with regex is not recommended: in this snippet, for example, if the HTML author changes the order of the attributes, you would have to rewrite your regex. There are many HTML and XML parsing tools available, and depending on the language you are using, an HTML parser may already be built into the language or its framework.
– Augusto Vasques
Actually, it's for academic purposes. I'm not using a programming language; I'm working directly on regex101. So I'd like to know whether this can be done with regex. I know it's possible, but I haven't been able to work it out.
– Willian Lima
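One way to see why the second attempt finds nothing: by default "." does not match a line break, so ".*" can never span from the opening select tag to the word "select" on a later line. The sketch below assumes Python's re module rather than regex101 (an assumption, since the thread names no language) and uses a shortened copy of the HTML from the question. It works in two steps: isolate the fromPort block with the DOTALL flag, then collect the option values inside it. It remains a naive illustration that depends on the exact markup shown.

```python
import re

# Shortened sample based on the HTML in the question.
html = '''<select name="fromPort" class="form-inline">
<option value="Paris">Paris</option>
<option value="Boston">Boston</option>
</select>
<select name="toPort" class="form-inline">
<option value="Rome">Rome</option>
</select>'''

# Step 1: isolate the body of the fromPort select. re.DOTALL lets "." cross
# line breaks, and the lazy ".*?" stops at the first closing </select>.
block = re.search(r'<select name="fromPort"[^>]*>(.*?)</select>', html, re.DOTALL)

# Step 2: collect option values only inside that block.
values = re.findall(r'<option value="([^"]*)"', block.group(1)) if block else []
print(values)  # ['Paris', 'Boston']
```

On regex101, the first pattern needs the "s" (single line) flag for the same reason.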
In the technical and academic world, parsing HTML with regex is considered bad practice. Because HTML is classified as a type 2 (context-free) language in the Chomsky hierarchy, it must be parsed by a DPDA state machine with an AST and a state stack, and regex cannot handle semantic variations. See the text "Analyzing Html in the Cthulhu way"; if you have difficulty with English, translate it to Portuguese by right-clicking and selecting translate.
– Augusto Vasques
Thanks for the tip, @Augustovasques. I will read the article.
– Willian Lima
Hello Willian, you don't need regex for this. If it is a string and you are using JavaScript, you can use DOMParser; if it is a string on the back end with PHP, you can use DOMDocument::loadHTML; if it is Java, you can use the jsoup library... If you mention which language you will use (and whether it is back end or front end), I can suggest a better example because, as @Augustovasques said, regex may have problems with minimal unexpected "variants".
– Guilherme Nascimento
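As a rough illustration of the parser approach, here is a minimal sketch in Python using the standard library's html.parser module. This is a swapped-in choice, since the comment names DOMParser, DOMDocument::loadHTML and jsoup and the thread does not settle on a language. The class name SelectOptions and the sample HTML are hypothetical. The sample deliberately reverses the attribute order, puts two options on one line and comments one out, which are the kinds of variation that tend to break a regex.

```python
from html.parser import HTMLParser


class SelectOptions(HTMLParser):
    """Collects option values that appear inside a <select> with a given name.

    Hypothetical helper for illustration only.
    """

    def __init__(self, select_name):
        super().__init__()
        self.select_name = select_name
        self.inside = False  # True while we are inside the target select
        self.values = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select":
            self.inside = attrs.get("name") == self.select_name
        elif tag == "option" and self.inside and "value" in attrs:
            self.values.append(attrs["value"])

    def handle_endtag(self, tag):
        if tag == "select":
            self.inside = False


# Sample with reordered attributes, two options on one line, and a comment.
html = '''<select class="form-inline" name="fromPort">
<option value="Paris">Paris</option><option value="São Paolo">São Paolo</option>
<!-- <option value="Hidden">Hidden</option> -->
</select>
<select name="toPort" class="form-inline">
<option value="Rome">Rome</option>
</select>'''

parser = SelectOptions("fromPort")
parser.feed(html)
parser.close()
print(parser.values)  # ['Paris', 'São Paolo']
```

The parser reports tags and attributes as structured events, so attribute order, line breaks and comments do not need special handling in the extraction logic.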
Using an HTML/XML parser is usually the best option, as they said above. For example, the regex in the answer below is naive and fails if you have two option elements on the same line, or one whose closing tag is on another line, or one that is commented out, or if the select has other attributes, or if name and class are in a different order, etc. Any minimal variation will require a change in the regex that is not always trivial, and the tendency is for it to become so complicated that it is no longer worth it. Further reading: here and here
– hkotsubo
And just to cite a few more examples of why it is not a good idea to use regex to manipulate HTML: https://answall.com/a/440262/112052 | https://answall.com/a/509938/112052
– hkotsubo