I'm writing automated tests for a legacy MVC project, and one of the tests must capture every fixed (hardcoded) string in the HTML and JS code. The company is internationalizing the project's content, moving its fixed strings into resource files.
I wrote this regex: ([\n]|^)(?<Value>(?!.*?\/\/|.*?@\*|.*?@.*?@|.*?\/\*|.*?<!--|.*?\\\*)([^\n]*?)[áâãàéêèíîìóôõòúûù].*)
It partially solves my problem: it identifies lines of code that contain accented characters, provided they are not in comments (//, /*, @*, @, <!--).
Since no HTML or JS functions use accents, I can assume that these are fixed strings.
With it I was able to identify some pages with fixed strings that should be moved into resource files, but this regex does not cover all cases.
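To make the behavior concrete, here is a minimal sketch of the regex above running in Python. The only change is an assumption on my side: the named group is spelled `(?P<Value>...)`, because Python's `re` module does not accept the `(?<Value>...)` spelling. The function name `flagged_lines` and the sample lines are made up for illustration.

```python
import re

# The question's pattern; only the named-group syntax is adapted for Python.
PATTERN = re.compile(
    r"([\n]|^)(?P<Value>(?!.*?\/\/|.*?@\*|.*?@.*?@|.*?\/\*|.*?<!--|.*?\\\*)"
    r"([^\n]*?)[áâãàéêèíîìóôõòúûù].*)"
)


def flagged_lines(text):
    """Return every line that the pattern captures in the Value group."""
    return [m.group("Value") for m in PATTERN.finditer(text)]


sample = "\n".join([
    "<p>Olá mundo</p>",         # accented, no comment marker: flagged
    "// comentário",            # comment line: skipped
    "var x = 1;",               # no accent: skipped
    "alert('Não');",            # accented string: flagged
    "var s = 'ação'; // nota",  # string before a trailing comment: skipped
])

print(flagged_lines(sample))
```

The last sample line shows one of the uncovered cases: the negative lookahead rejects the whole line as soon as a comment marker appears anywhere in it, so a string followed by a trailing comment is missed.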
I would like a regex that:
- Captures fixed strings in HTML and JS code even when they have no accented characters.
- Ignores strings inside comments.
Is there some syntactic particularity in any of these languages that could help me delimit where the regex should start and stop capturing these strings?
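One syntactic particularity that might help for JS: string literals are delimited by quotes and comments by fixed markers, so a single left-to-right scan can match "comment or string" and keep only the strings. Below is a rough sketch of that idea, not a full lexer. The names `TOKEN` and `fixed_strings` are hypothetical, and it assumes the source has no regex literals or template literals, which it does not handle.

```python
import re

# Whichever alternative starts first in the text wins, so a quote inside a
# comment never opens a string and a // inside a string never opens a comment.
TOKEN = re.compile(
    r"""
    (?P<comment>
        //[^\n]*              # JS line comment
      | /\*.*?\*/             # JS block comment
      | <!--.*?-->            # HTML comment
      | @\*.*?\*@             # Razor comment
    )
  | (?P<string>
        "(?:\\.|[^"\\\n])*"   # double-quoted literal
      | '(?:\\.|[^'\\\n])*'   # single-quoted literal
    )
    """,
    re.VERBOSE | re.DOTALL,
)


def fixed_strings(source):
    """Return the contents of quoted literals that are outside comments."""
    found = []
    for match in TOKEN.finditer(source):
        literal = match.group("string")
        if literal is not None:
            found.append(literal[1:-1])  # drop the surrounding quotes
    return found


sample = '''var a = "run"; // "ignored"
/* 'also ignored' */ var b = 'yes';
var u = "http://example.com";
'''
print(fixed_strings(sample))
```

On HTML this would also pick up attribute values such as `class="x"`, and an apostrophe in plain text would be read as the start of a string, so the result would still need filtering there.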
Could you explain it better? What do you mean by strings? Only those that are outside the tags? What are the possible kinds of fixed strings? What should not be considered? What is your regex not catching? Could you add an example of a page that is in trouble?
– Randrade
This is some information that can help you get a faster response.
– Randrade
@Randrade I will edit the post to try to explain better. What specifically did you not understand, or what was vague? The strings I mean are any group of characters that does not adapt when the language changes, such as "run" or 'yes'. What should not be considered are words inside tags, in HTML cases like: <do not consider> consider < nc> My regex does not capture comments (and it should not), but it also misses fixed strings in the code that have no accented characters. As for a page that is in trouble: it is a big project, and there may be hundreds of pages not considered.
– Paz
So, this complicates things a bit. To build a regex you need to know at least the pattern of what to consider and what not to consider. If you say that everything in quotes should be considered, that is one thing. If you say everything outside tags, that is also a possibility. Now, are there other possible cases? Trying to do something this generic without knowing the possibilities can be complicated.
– Randrade
That is the challenge. I wonder whether there is some particularity of these languages that I have missed, one that would give a pattern marking where the regex capture should begin and end. Maybe consider double quotes inside content enclosed between an opening tag and its closing tag.
– Paz
It's complicated, and worse, it has to be something generic, since the test will scan more than 1000 files that have been changed by dozens of different programmers.
– Paz
Let's continue this discussion in chat.
– Paz
I think using regex for this is not a good idea. There will always be a case that you cannot cover. I suggest trying a proper HTML parser. See this answer: http://stackoverflow.com/a/1732454/460775
– EMBarbosa
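As a rough illustration of the parser route, here is a sketch using Python's standard `html.parser` module to collect text that sits outside tags while skipping comments and the contents of `<script>` and `<style>`. The names `TextCollector` and `visible_text` are hypothetical. The parser knows nothing about Razor, so `@*...*@` comments and `@Model` expressions would come through as ordinary text and need separate handling.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect text nodes outside tags, ignoring script and style bodies."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.texts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # HTML comments go to handle_comment instead, so they never land here.
        text = data.strip()
        if text and not self._skip_depth:
            self.texts.append(text)


def visible_text(html):
    """Return the non-empty text nodes of an HTML document, in order."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.texts


page = '<div><!-- Título antigo --><p>Run</p><script>var s = "yes";</script><a title="x">Sim</a></div>'
print(visible_text(page))
```

Attribute values that are shown to the user (for example `title` or `placeholder`) are not collected here; they are available in the `attrs` argument of `handle_starttag` if they need to be checked too.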
@EMBarbosa But how would I include this in automated testing? And what about the JS cases?
– Paz
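One possible way to wire any detector into an automated test is to walk the project tree, run the detector on each file, and fail with a report of what was found. This is only a sketch under assumptions: `collect_offenders`, `assert_no_fixed_strings`, the `detect` callback (any function that takes source text and returns a list of strings) and the extension list are all hypothetical names and choices, not something from the project.

```python
from pathlib import Path


def collect_offenders(root, detect, extensions=(".js", ".cshtml", ".html")):
    """Map each matching file under root to the fixed strings detect() finds."""
    offenders = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            source = path.read_text(encoding="utf-8", errors="replace")
            found = detect(source)
            if found:
                offenders[path.relative_to(root).as_posix()] = found
    return offenders


def assert_no_fixed_strings(root, detect):
    """Raise AssertionError listing every file that still has fixed strings."""
    offenders = collect_offenders(root, detect)
    report = "\n".join(f"{name}: {strings}" for name, strings in offenders.items())
    assert not offenders, "Fixed strings found:\n" + report
```

A test in any framework can then call `assert_no_fixed_strings` with the project folder and one detector for HTML and another for JS, so the detection logic can be swapped without touching the test itself.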