So we have:
urls = ["https://www.exemplo.com/", "https://www.exemplo.com/home/", "https://www.exemplo.com/logo.png", "https://intranet.exemplo.com/", "https://admin.exemplo.com/login", "https://www.exemplo.com/sobre/", "https://www.exemplo.com/shell.php.log", "https://www.exemplo.com/background.jpg"]
Let’s filter out the ones that end in png/jpg or that don’t have "www".
With regex in Python:
import re

# Compile the patterns once, outside the loop
img = re.compile(r'^.*\.(jpg|JPG|png)$')
www = re.compile(r'(.*?)//www\.(.*?)')  # dot escaped so it only matches a literal "."

bloqueados = []
for url in urls:
    # Block image URLs and anything that is not on the "www" host
    if img.match(url) or not www.match(url):
        bloqueados.append(url)
print(bloqueados) # ['https://www.exemplo.com/logo.png', 'https://intranet.exemplo.com/', 'https://admin.exemplo.com/login', 'https://www.exemplo.com/background.jpg']
Or, as a list comprehension:
import re
bloqueados = [
    url for url in urls
    if re.match(r'^.*\.(jpg|JPG|png)$', url) or re.match(r'(.*?)//www\.(.*?)', url) is None
]
print(bloqueados) # ['https://www.exemplo.com/logo.png', 'https://intranet.exemplo.com/', 'https://admin.exemplo.com/login', 'https://www.exemplo.com/background.jpg']
For a case this simple, though, I wouldn’t use regex. I would do:
bloqueados = [url for url in urls if url.endswith(('.png', '.jpg')) or 'https://www.' not in url]
print(bloqueados) # ['https://www.exemplo.com/logo.png', 'https://intranet.exemplo.com/', 'https://admin.exemplo.com/login', 'https://www.exemplo.com/background.jpg']
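If the real list might contain query strings or uppercase extensions, the plain string check above can miss some cases. Here is a rough sketch of a stricter variant using the standard library's urllib.parse. The helper name is_blocked and the constant BLOCKED_EXTENSIONS are mine, just for illustration, and I'm assuming "has www" means the hostname starts with "www.":

```python
from urllib.parse import urlparse

# Assumption: the same two extensions as in the examples above
BLOCKED_EXTENSIONS = ('.png', '.jpg')

def is_blocked(url):
    """Hypothetical helper: True for image URLs or hosts that don't start with 'www.'."""
    parts = urlparse(url)
    host = parts.hostname or ''   # hostname is already lowercased by urlparse
    path = parts.path.lower()     # the path excludes the query string and fragment
    return path.endswith(BLOCKED_EXTENSIONS) or not host.startswith('www.')

urls = ["https://www.exemplo.com/", "https://www.exemplo.com/home/", "https://www.exemplo.com/logo.png", "https://intranet.exemplo.com/", "https://admin.exemplo.com/login", "https://www.exemplo.com/sobre/", "https://www.exemplo.com/shell.php.log", "https://www.exemplo.com/background.jpg"]

bloqueados = [url for url in urls if is_blocked(url)]
print(bloqueados)
```

For the list in this answer it blocks the same four URLs as the other versions. The difference only shows up with inputs like "https://www.exemplo.com/LOGO.PNG?v=2", which this version also blocks.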
With regex in JavaScript:
var bloqueados = [];
// Indexed loop: for...in is not meant for iterating over arrays
for (var i = 0; i < urls.length; i++) {
    // Block image URLs and anything that is not on the "www" host
    if (/^.*\.(jpg|png)$/.test(urls[i]) || !/(.*?)\/\/www\.(.*?)/.test(urls[i])) {
        bloqueados.push(urls[i]);
    }
}
console.log(bloqueados); // ["https://www.exemplo.com/logo.png", "https://intranet.exemplo.com/", "https://admin.exemplo.com/login", "https://www.exemplo.com/background.jpg"]
Without regex in JavaScript:
var bloqueados = [];
var exts, ext;
for (var i = 0; i < urls.length; i++) {
    exts = urls[i].split('.');
    ext = exts[exts.length - 1]; // text after the last dot
    if (ext === 'png' || ext === 'jpg' || urls[i].indexOf("//www.") < 0) {
        bloqueados.push(urls[i]);
    }
}
console.log(bloqueados); // ["https://www.exemplo.com/logo.png", "https://intranet.exemplo.com/", "https://admin.exemplo.com/login", "https://www.exemplo.com/background.jpg"]
In Python or in JavaScript?
– Miguel
In JavaScript, but it can also be in Python, @Miguel
– user45474
I’m gonna do both
– Miguel
It's worth mentioning that this is a job for plain string comparison; you don't need regex for something this simple.
– Bacco
Does it have to be with regex? @Bacco is right, in this case it isn't necessary. It can be done more simply.
– Miguel
@Miguel if you want, post it with regex to answer the question, but give an example with substring too; I think that adds value to the answer.
– Bacco
@Bacco if I knew how, I wouldn’t be here asking. I’m trying it on regexonline, but it won’t go through the whole list.
– user45474
@Nikobellic there are several online testers. In almost all of them you have to indicate somewhere that the input is multiline in order to test lists.
– Bacco