The closest thing to a solution for all cases is much more complex than a regex.
Unfortunately I could not make it any friendlier and the final code ended up a little confusing, but I believe it is still understandable, and I will explain the whole process.
Advantages (compared to that answer):
It supports a wider range of domain types, such as floripa.br or adult.ht.
It supports public subdomains, for example <seusite>.blogspot.com and even <seusite>.s3.amazonaws.com and the like.
Requirements:
No extension, plugin or framework is needed. Just download the public list of all domains/TLDs, available here (https://publicsuffix.org/list/public_suffix_list.dat), and set the location of the file on the line of the code that reads it.
This file should be updated periodically.
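One way to keep the file current is to download it again once the local copy gets old. The following is only a minimal sketch: the helper name atualizaLista and the seven-day limit are assumptions of mine, not part of the original answer, and it assumes allow_url_fopen is enabled so that file_get_contents can read a URL.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Downloads the list again when the local copy is missing or older than $maxDias days.
// Returns true if a download happened, false if the local copy was kept.
function atualizaLista($caminho, $maxDias = 7)
{
    $fonte = 'https://publicsuffix.org/list/public_suffix_list.dat';

    if (is_file($caminho) && (time() - filemtime($caminho)) < $maxDias * 86400) {
        return false; // local copy is recent enough
    }

    $conteudo = @file_get_contents($fonte);
    if ($conteudo === false) {
        throw new RuntimeException('Could not download ' . $fonte);
    }

    file_put_contents($caminho, $conteudo);
    return true;
}
```

It could be called once before pegaNome, for example atualizaLista('public_suffix_list.dat.txt').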
Code:
function pegaNome($url)
{
    // Extract the host, e.g. "seusite.blogspot.com".
    $url = parse_url($url, PHP_URL_HOST);

    if (empty($url)) {
        return false;
    }

    // Labels that a "*" in the list is allowed to stand for.
    $generico = ['com', 'org', 'net', 'edu', 'gov', 'mil'];

    $lista = array_filter(file('public_suffix_list.dat.txt')); // Download: https://publicsuffix.org/list/public_suffix_list.dat
    $lista = array_merge($lista, ['*']);

    $dominio = explode('.', $url);
    $dominioTamanho = count($dominio) - 1;

    $encontrado = [];

    foreach ($lista as $tld) {
        // Skip "!tld" lines, "// ..." comments and blank lines.
        if (!in_array(substr($tld, 0, 1), ['!', '/', "\n"], true)) {
            $correto = 0;
            $partes = explode('.', $tld);
            $partesTamanho = count($partes);

            foreach ($partes as $i => $pedaco) {
                if (!isset($dominio[$dominioTamanho - $partesTamanho + $i + 1])) {
                    break;
                }
                // Test for the wildcard before wrapping the label in an array;
                // casting to an array first would make the "*" check always fail.
                $pedaco = trim($pedaco);
                $pedaco = $pedaco === '*' ? $generico : [$pedaco];
                $correto += (int)(in_array($dominio[$dominioTamanho - $partesTamanho + $i + 1], $pedaco, true));
            }

            // The rule matches only if every one of its parts matched.
            if ($correto === $partesTamanho) {
                $encontrado[] = $correto;
            }
        }
    }

    if ($encontrado !== []) {
        // Try the longest matching suffix first.
        rsort($encontrado);
        foreach ($encontrado as $encontro) {
            if (!empty($dominio[$dominioTamanho - $encontro])) {
                return $dominio[$dominioTamanho - $encontro];
            }
        }
    }

    return $url;
}
Explanation:
The file:
The file has four kinds of lines (ignoring blank lines):
!tld
*.tld
// tld
tld
The code above ignores both the // tld lines, which are comments, and the !tld lines, whose exact purpose I do not know.
A *.tld line indicates that the suffix would be net.tld or com.tld, for example, in most cases.
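To make the four kinds of lines concrete, here is a small sketch that labels each one. The helper tipoDaLinha and the label names are assumptions of mine, not part of the original answer; the list's own documentation calls the ! lines exception rules, which is why that label is used here.

```php
<?php
// Hypothetical helper, not part of the original answer: labels one line of the list.
function tipoDaLinha($linha)
{
    $linha = trim($linha);

    if ($linha === '') {
        return 'blank';
    }
    if (substr($linha, 0, 2) === '//') {
        return 'comment';     // the "// tld" form
    }
    if ($linha[0] === '!') {
        return 'exception';   // the "!tld" form
    }
    if (substr($linha, 0, 2) === '*.') {
        return 'wildcard';    // the "*.tld" form
    }
    return 'rule';            // the plain "tld" form
}

// Print each sample line next to its label.
foreach (['// Comment', '!www.ck', '*.ck', 'com.br', ''] as $linha) {
    echo str_pad($linha, 12), tipoDaLinha($linha), "\n";
}
```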
Checks:
When you ask it to check a URL, for example https://seusite.blogspot.com, it does exactly the following:
- Uses PHP_URL_HOST to obtain seusite.blogspot.com.
- Splits seusite.blogspot.com into seusite, blogspot and com.
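These two steps can be tried in isolation. The sketch below uses only parse_url and explode, as the function does; the last line shows a detail worth knowing, namely that a URL without a scheme has no host component, which is the case where pegaNome returns false.

```php
<?php
$host = parse_url('https://seusite.blogspot.com', PHP_URL_HOST);
$dominio = explode('.', $host);

var_dump($host);    // string(20) "seusite.blogspot.com"
print_r($dominio);  // three elements: seusite, blogspot, com

// Without "https://" the whole string is parsed as a path, so there is no host.
var_dump(parse_url('seusite.blogspot.com', PHP_URL_HOST)); // NULL
```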
Then it needs to find the suffix used by your site:
- Checks whether the last element equals ac: com != ac
- Checks whether the last set equals com.ac, that is:
  - Compares the penultimate element with com: blogspot != com
  - Compares the last element with ac: com != ac
This is repeated for each line of the file.
At a certain point it will do exactly this:
- Checks whether the last set equals blogspot.com:
  - Compares the penultimate element with blogspot: blogspot == blogspot
  - Compares the last element with com: com == com
Then it runs $encontrado[] = $correto, which stores the value 2, the number of parts that the "suffix" has (.blogspot.com = 2, .net = 1, .a.b.c = 3).
Likewise, in one of the later comparisons it will do:
- Checks whether the last element equals com: com === com
This also stores the value 1 in $encontrado.
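The comparisons above can be reproduced on a tiny in-memory list. This is a simplified sketch, not the function itself: the helper tamanhoDoSufixo is a name I made up, it compares whole suffixes with array_slice instead of label by label, and it leaves out the * handling.

```php
<?php
// Hypothetical helper: returns how many labels $regra has when it matches the
// end of $dominio completely, or 0 when it does not match.
function tamanhoDoSufixo(array $dominio, $regra)
{
    $partes = explode('.', $regra);
    $inicio = count($dominio) - count($partes);

    if ($inicio < 0) {
        return 0; // the rule has more labels than the host
    }
    // Compare the rule against the last count($partes) labels of the host.
    return array_slice($dominio, $inicio) === $partes ? count($partes) : 0;
}

$dominio = explode('.', 'seusite.blogspot.com');
$encontrado = [];

foreach (['ac', 'com.ac', 'blogspot.com', 'com'] as $regra) {
    $n = tamanhoDoSufixo($dominio, $regra);
    if ($n > 0) {
        $encontrado[] = $n;
    }
}

print_r($encontrado); // contains 2 (blogspot.com) and then 1 (com)
```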
Result:
In the end it takes the largest number in $encontrado and gets the domain name based on it.
So if seusite.blogspot.com has 2 as its largest $encontrado, it just does $dominio[count($dominio)-2-1].
So why create an array? Because someone might pass https://blogspot.com, which would also match in both cases, but count($dominio)-2-1 would then be -1. So it moves on to the next suffix found, in this case .com, and returns blogspot as usual.
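The selection step, including the fallback for https://blogspot.com, can be sketched on its own. The helper escolheNome is a hypothetical name of mine; it takes the host labels and the list of matched suffix lengths and applies the same "longest first, then fall back" idea.

```php
<?php
// Hypothetical helper: picks the label just before the longest matching suffix,
// falling back to shorter suffixes when no such label exists.
function escolheNome(array $dominio, array $encontrado)
{
    rsort($encontrado); // longest suffix first

    foreach ($encontrado as $encontro) {
        $indice = count($dominio) - $encontro - 1;
        if ($indice >= 0) {
            return $dominio[$indice];
        }
    }
    return null; // nothing is left in front of any suffix
}

echo escolheNome(['seusite', 'blogspot', 'com'], [2, 1]), "\n"; // seusite
echo escolheNome(['blogspot', 'com'], [2, 1]), "\n";            // blogspot
```

In the second call the suffix of length 2 would need index -1, so it is skipped and the suffix of length 1 is used instead.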
There are some problems. For example, for the site http://www.saopaulo.sp.gov.br it returns sp instead of saopaulo, which is the correct name. It treats saopaulo as a subdomain of sp. The solution to this is far more complex; it cannot be a simple regex, unless you list all the cases and put them in a regex. – Inkeliz
Another example: http://meusite.floripa.br returns floripa. It is a valid TLD, see here. – Inkeliz
@Inkeliz I understand, but this code is not universal; it only handles sites that follow the pattern in the question, i.e. site.com or site.com.br. – Sam