Capture e-mail from pages


Viewed 307 times


I have an e-mail system at the address:

 http://www.site.com.br/123/123/123.php?p=1&codelist=1

that goes up to:

 http://www.site.com.br/123/123/123.php?p=460&codelist=1

(p is the page number)

That is, 460 pages. This system holds all of my customers' data.

I need code that visits these 460 URLs, captures only the e-mail address listed on each page, ignores all other data, and saves the e-mails to a TXT file.

Can anyone help me? I have no idea how to do this.

  • "In this system I have all my customers' data": how are you generating the content of those pages? Can't you just query the database?

  • Marcelo, is this data in a database or printed directly in the HTML? Can you post an example of how these values are stored?

  • @Sergio, it's an old, poorly built system... it renders everything as HTML tables on the page itself... there is a lot of data, and right now I only need the e-mails so I can contact customers.

  • @Csorgo, they are listed like this: <tr><td bgcolor=#ECEDE8>21/11/2009<br>08:05:38</td><tr><td bgcolor=#ECEDE8><font size=1 class="slk">Status<br>Inactive<br><a href="managerial users.SP?p=1&codativacao=116034&codlist=30" style="font-family:Trebuchet MS,Tahoma,Arial,Verdana; font-size:12px; color: #ffffff; text-Decoration:None; background: #109809;">&nbsp;Enable Cadastre&nbsp;</u></a></td><td bgcolor=#ECEDE8><font class="slk"></td><td bgcolor=#ECEDE8><font class="slk"> [email protected]</td>

  • Is there one e-mail per page?

  • I recommend that you edit your question and add more information, for example: how is this data stored? That helps others understand your problem and keeps the site organized, instead of having everything scattered across the comments.


1 answer



Example (ideone): capturing the e-mail

<?php
$url = "http://www.site.com.br/123/123/123.php?p=1&codelist=1";
$text = file_get_contents($url);

$res = preg_match_all(
    "/[a-z0-9]+[_a-z0-9\.-]*[a-z0-9]+@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})/i",
    $text,
    $matches
);

// if there is only one e-mail on the page
if ($res > 0) { // avoid writing an empty line when nothing matched
    $email = reset($matches[0]) . "\n";
    file_put_contents('emails.txt', $email, FILE_APPEND);
}
// end of the single-e-mail case

/* if there is more than one e-mail on the page
foreach ($matches[0] as $email) {
    file_put_contents('emails.txt', $email . "\n", FILE_APPEND);
}
*/

file_get_contents() reads the entire contents of a file (or URL) into a string.

preg_match_all() returns an integer with the number of full matches found by the regular expression, and fills $matches with them ($matches[0] holds the full matches).

reset() rewinds the array's internal pointer to the first element and returns that element's value.

file_put_contents() writes a string to a file; if the file does not exist yet, it creates it.

The FILE_APPEND flag appends the data to the end of an existing file instead of overwriting it.
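To make those descriptions concrete, here is a small sketch that runs the answer's regular expression over one table row modelled on the HTML posted in the question's comments. The real address was hidden in that comment, so cliente@example.com.br is a made-up placeholder. A temporary file stands in for emails.txt, so nothing is written to the working directory.

```php
<?php
// One table row modelled on the HTML from the question's comments.
// "cliente@example.com.br" is a made-up placeholder address.
$text = '<tr><td bgcolor=#ECEDE8>21/11/2009<br>08:05:38</td>'
      . '<td bgcolor=#ECEDE8><font class="slk"> cliente@example.com.br</td>';

$res = preg_match_all(
    "/[a-z0-9]+[_a-z0-9\.-]*[a-z0-9]+@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})/i",
    $text,
    $matches
);

echo $res, "\n";               // 1 (number of full matches)
echo $matches[0][0], "\n";     // cliente@example.com.br (the full match)
echo $matches[2][0], "\n";     // .br (second capture group: the final domain part)
echo reset($matches[0]), "\n"; // cliente@example.com.br (same value via reset())

// FILE_APPEND: writing twice to the same file keeps both lines.
$file = tempnam(sys_get_temp_dir(), 'emails');
file_put_contents($file, reset($matches[0]) . "\n", FILE_APPEND);
file_put_contents($file, reset($matches[0]) . "\n", FILE_APPEND);
$contents = file_get_contents($file);
unlink($file);
echo $contents; // the address printed on two lines
```

Note that the pattern needs at least two characters before the @, so a one-letter local part such as a@example.com would not be captured.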

The script above handles a single page, but nothing prevents you from wrapping it in a loop that varies the value of p in $url:

<?php
for ($x = 1; $x <= 460; $x++) {
    $url = "http://www.site.com.br/123/123/123.php?p=" . $x . "&codelist=1";
    $text = file_get_contents($url);
    if ($text === false) {
        continue; // skip pages that could not be read
    }

    $res = preg_match_all(
        "/[a-z0-9]+[_a-z0-9\.-]*[a-z0-9]+@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})/i",
        $text,
        $matches
    );

    // if there is only one e-mail per page
    if ($res > 0) {
        $email = reset($matches[0]) . "\n";
        file_put_contents('emails.txt', $email, FILE_APPEND);
    }
    // end of the single-e-mail case
}
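If the same customer appears on several pages, or some pages fail to load, a variation of the loop may be convenient. The sketch below is a minimal example under those assumptions, not the answer's method. It collects every match, drops duplicates, logs and skips unreadable pages, and leaves the writing to the caller. The names collect_emails, fetch_page and EMAIL_PATTERN are hypothetical. Passing the page fetcher in as a callable lets the loop be tried without the real site.

```php
<?php
// Same pattern as in the answer.
const EMAIL_PATTERN = "/[a-z0-9]+[_a-z0-9\.-]*[a-z0-9]+@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})/i";

/**
 * Hypothetical helper: visits pages 1..$lastPage and returns every distinct
 * e-mail found, lower-cased, in first-seen order.
 * $fetch takes a page number and returns the page HTML, or false on failure.
 */
function collect_emails(int $lastPage, callable $fetch): array
{
    $found = [];
    for ($p = 1; $p <= $lastPage; $p++) {
        $html = $fetch($p);
        if ($html === false) {
            // log and skip unreadable pages instead of aborting the whole run
            error_log("could not read page $p");
            continue;
        }
        if (preg_match_all(EMAIL_PATTERN, $html, $m) > 0) {
            foreach ($m[0] as $email) {
                // lower-cased array keys drop addresses repeated on other pages
                $found[strtolower($email)] = true;
            }
        }
    }
    return array_keys($found);
}

// Hypothetical fetcher for the real site; the URL pattern is the one from the question.
function fetch_page(int $p)
{
    return @file_get_contents("http://www.site.com.br/123/123/123.php?p=$p&codelist=1");
}
```

Against the real site, a call such as file_put_contents('emails.txt', implode("\n", collect_emails(460, 'fetch_page')) . "\n"); would write the whole list in one go. Keep in mind that the addresses come back lower-cased.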
  • Okay, that covers a lot of what I need. But, excuse my lack of knowledge: since there are 460 pages, how do I make it pick up all of them without having to keep changing p=1, p=2, etc.?

  • @Marcelo, I edited the answer. See the for loop at the end.

  • Leo, when I use this code it doesn't work, but when I run it on a single URL without the for loop it works normally. Do you have any idea what it might be? Is something wrong?

        <?php
        for ($x = 1; $x <= 3; $x++) {
            $url = "http://www.site.com.br/123/123/123.php?p=" . $x . "&codelist=1"; // I changed the URL
            $text = file_get_contents($url);

            $res = preg_match_all(
                "/[a-z0-9]+[_a-z0-9\.-]*[a-z0-9]+@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})/i",
                $text,
                $matches
            );

            foreach ($matches[0] as $email) {
                file_put_contents('emails.txt', $email . "\n", FILE_APPEND);
            }
        }
        ?>

  • Your code has some strange encoding. I suggest copying mine into a plain-text editor (Notepad), saving it, and then copying and pasting it into your application.

  • Leo, I ended up doing exactly that, and the emails.txt file stays blank. When I use your first version, it works.

  • In fact, you only have one page and not 460. But it is strange that one page at a time works and the loop doesn't. In my test I used 3 pages, exem1.htm, exem2.htm and exem3.htm, without parameters in the URL.

  • We would need to look at this in chat, but you haven't opened a chat with us.

  • I would like to see the code of http://www.site.com.br/123/123.php.

  • I ran a test opening a page with several parameters, just like yours, and it worked correctly: $url="http://........../sos/exem.php?p=".$x."&codelist=1";

  • Leo, the mistake was mine! It was working after all! Thank you so much for the help!
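The thread above debugs the loop by testing with local files (exem1.htm to exem3.htm). file_get_contents() accepts a local path as well as a URL, so the answer's single-e-mail loop can be checked offline the same way. In this sketch, the temporary directory, the page contents and the addresses are all made-up placeholders.

```php
<?php
// Offline check of the single-e-mail loop, modelled on the exem1.htm..exem3.htm test.
// Directory, page contents and addresses are made-up placeholders.
$dir = sys_get_temp_dir() . '/email_loop_test_' . getmypid();
if (!is_dir($dir)) {
    mkdir($dir);
}
$out = "$dir/emails.txt";
@unlink($out); // start from an empty output file

$samples = ['ana@example.com', 'bob@example.org', 'carla@example.net'];
foreach ($samples as $i => $email) {
    // exem1.htm, exem2.htm, exem3.htm: one table cell with one address each
    file_put_contents("$dir/exem" . ($i + 1) . ".htm", "<tr><td><font class=\"slk\"> $email</td></tr>");
}

for ($x = 1; $x <= 3; $x++) {
    // file_get_contents() reads a local path just as it reads a URL
    $text = file_get_contents("$dir/exem$x.htm");
    if ($text === false) {
        continue;
    }
    $res = preg_match_all(
        "/[a-z0-9]+[_a-z0-9\.-]*[a-z0-9]+@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})/i",
        $text,
        $matches
    );
    if ($res > 0) {
        file_put_contents($out, reset($matches[0]) . "\n", FILE_APPEND);
    }
}

$lines = file($out, FILE_IGNORE_NEW_LINES);
print_r($lines); // one address per page, in page order

// remove the temporary files
array_map('unlink', glob("$dir/*"));
rmdir($dir);
```

If this offline version fills the file but the version pointed at the real site does not, the problem is more likely in fetching the URLs than in the loop itself.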

