Extract the contents of a list from a website and save it to a MySQL database with PHP

Viewed 217 times

0

I want to fetch the page below, store all of its HTML in a variable, strip out the markup, and save the content I need to a MySQL database using PHP 7.

The site is: http://guildstats.eu/bosses?monsterName=&world=Ferobra&rook=0

As a first step, I stored the page's HTML in a variable, as in the code below:

$mundo = 'ferobra';
// Build the URL; the world name belongs only in the "world" parameter, not after "rook=0".
$url = 'http://guildstats.eu/bosses?monsterName=&world=' . $mundo . '&rook=0';

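// Download a URL with cURL and return the response body (false on failure).
function curl_get_contents($url)
{
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects
  $data = curl_exec($ch);
  curl_close($ch);
  return $data;
}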

$pagina = curl_get_contents($url);

My difficulty now is cleaning up the HTML, storing the data in arrays, and then populating the database.

Can someone help me?

  • What do you mean by "cleaning" the HTML?

  • If you want to filter the bosses, I recommend this tutorial: https://medium.com/@valdeirpsr/extracting-information-from-a-site-with-php-bd3e3dec98e5

  • The page contains a list (in HTML), and I want to save its contents in the database. The database table will have the same structure as the list. For example, if the list has a column called "name" and another called "date of death", the table will have those two columns. I want to take the contents of the HTML list and save them in the database.

  • I managed to clean it up a bit with PHP's explode() function.

  • Using explode is not recommended. Ideally, use DOMDocument (I explain how to use it in the tutorial and in the answer). Once you have filtered everything (which I also explain), you can use that example.
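The DOMDocument approach suggested in the comment above could be sketched roughly as follows. This is a minimal sketch, assuming the bosses are listed in an HTML table; the sample markup, the column contents and the function name `extract_table_rows` are hypothetical, and the real page's structure has to be inspected first.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page ($pagina).
$pagina = <<<HTML
<html><body>
<table id="bosses">
  <tr><th>Name</th><th>Date of death</th></tr>
  <tr><td>Example Boss A</td><td>2019-01-10</td></tr>
  <tr><td>Example Boss B</td><td>2019-02-03</td></tr>
</table>
</body></html>
HTML;

/**
 * Return every table row that contains <td> cells
 * as an array of trimmed cell texts.
 */
function extract_table_rows(string $html): array
{
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; collect parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $rows = [];
    // The [td] predicate skips rows without data cells (the header row here).
    foreach ($xpath->query('//table//tr[td]') as $tr) {
        $cells = [];
        foreach ($xpath->query('./td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        $rows[] = $cells;
    }
    return $rows;
}

$rows = extract_table_rows($pagina);
print_r($rows);
```

Each entry of `$rows` is then one line of the list, with one array element per column, which maps directly onto the columns of a database table.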

2 answers

0

From what I understand, you want to save only part of the HTML in the database, right? You can try calling explode with the text that marks the beginning of the HTML you want, then take the matching index of the result and store it in the database:

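// Index 1 holds everything after the start marker.
$html = explode("start_of_html_to_save", $pagina);

Then call explode again with the text that marks the end of the HTML you want, so that only that HTML remains:

// Index 0 holds everything before the end marker.
$htmlToSave = explode("end_of_html_to_save", $html[1]);

Then you just save $htmlToSave[0] to the database.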
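Whatever ends up being extracted, the save step could be sketched with PDO and a prepared statement, as below. This is only a sketch under assumptions: the table name `bosses`, its columns, the function name `save_bosses` and the sample rows are hypothetical. An in-memory SQLite database is used only so the example runs without a server; for MySQL the DSN would start with `mysql:` and take your own host, database name and credentials.

```php
<?php
// Hypothetical rows, e.g. the result of parsing the page: [name, date of death].
$rows = [
    ['Example Boss A', '2019-01-10'],
    ['Example Boss B', '2019-02-03'],
];

// Assumption: in-memory SQLite keeps the sketch self-contained.
// For MySQL, pass a "mysql:" DSN plus user and password instead.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table mirroring the columns of the list.
$pdo->exec('CREATE TABLE bosses (name VARCHAR(100) NOT NULL, death_date DATE NOT NULL)');

/**
 * Insert all rows in one transaction using a single prepared statement.
 * Returns the number of rows passed in.
 */
function save_bosses(PDO $pdo, array $rows): int
{
    $stmt = $pdo->prepare('INSERT INTO bosses (name, death_date) VALUES (:name, :death_date)');
    $pdo->beginTransaction();
    try {
        foreach ($rows as list($name, $deathDate)) {
            $stmt->execute([':name' => $name, ':death_date' => $deathDate]);
        }
        $pdo->commit();
    } catch (Exception $e) {
        // Undo partial inserts if any row fails.
        $pdo->rollBack();
        throw $e;
    }
    return count($rows);
}

$inserted = save_bosses($pdo, $rows);
$total = (int) $pdo->query('SELECT COUNT(*) FROM bosses')->fetchColumn();
echo "Inserted $inserted rows, table now holds $total\n";
```

Binding values through placeholders, rather than concatenating scraped text into the SQL string, keeps quotes or other special characters in the scraped content from breaking the query.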

  • Thanks for the answer, Gabriel. I’m already using explode to clean up the HTML. But now I need to take that "cleaned" HTML and walk through it, saving each boss's name, day of death, and so on (that is, the list itself).

  • Open the page I sent you and look at its HTML so you can see what I mean.

  • I can’t open the link, but I think I get it. I’ll think of a solution and then edit the answer.

  • Okay, Gabriel! I’m racking my brain here looking for alternatives. Web scraping is probably the topic I should be researching. But I look forward to your help too! Thank you very much.

  • Okay! I’m doing some research too. Have you looked at this site: http://www.deivison.com.br/phpquery-web-scraping-ja-imaginou-selecionar-elements-de-um-outro-site-com-php-utilizando-a-semantica-elements-como-do-css/ ? It seems quite thorough.

0

There is a technique for what you want to do, and it is an interesting subject. It is called web scraping, and searching for that name turns up many explanations online of how to do it. I particularly recommend the function

preg_match_all();

which works with regular expressions (regex).
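A minimal sketch of what that could look like, assuming each entry of the list is a table row with two cells. The sample markup and the pattern are hypothetical and would need adjusting to the real page.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page ($pagina).
$pagina = '<table>'
    . '<tr><th>Name</th><th>Date of death</th></tr>'
    . '<tr><td>Example Boss A</td><td>2019-01-10</td></tr>'
    . '<tr><td>Example Boss B</td><td>2019-02-03</td></tr>'
    . '</table>';

// Capture the two <td> cells of each row. The "s" flag lets "." span newlines,
// "i" ignores tag case, and ".*?" keeps each capture as short as possible.
$pattern = '~<tr[^>]*>\s*<td[^>]*>(.*?)</td>\s*<td[^>]*>(.*?)</td>\s*</tr>~si';

// PREG_SET_ORDER groups the captures per match (one entry per table row).
preg_match_all($pattern, $pagina, $matches, PREG_SET_ORDER);

$rows = [];
foreach ($matches as $m) {
    // strip_tags() drops any inline markup (links, spans) left inside a cell.
    $rows[] = [trim(strip_tags($m[1])), trim(strip_tags($m[2]))];
}
print_r($rows);
```

The header row is skipped because it contains `<th>` cells rather than `<td>` cells. A pattern like this tends to break when the markup changes or cells are nested, which is one reason a DOM parser is often preferred for HTML.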

  • Hi, Erick! I’m going to research this technique! This is the kind of information I need too. I’m not too lazy to search; the tricky part is knowing what to search for. With this information I’ll be able to search better! Thank you very much, Erick!

  • I found content that seems cool: http://www.deivison.com.br/phpquery-web-scraping-ja-imaginou-selecionar-elements-de-um-outro-site-com-php-utilizando-a-semantica-de-elements-comodo-css/

  • Ah cool, very interesting too.
