I have a WordPress plugin that does something similar, using the Open Graph tags that many web pages include today.
Example from a YouTube video page:
<meta property="og:site_name" content="YouTube">
<meta property="og:url" content="http://www.youtube.com/watch?v=aZMbTFNp4wI">
<meta property="og:title" content="No Woman, No Drive">
<meta property="og:image" content="https://i1.ytimg.com/vi/aZMbTFNp4wI/maxresdefault.jpg">
<meta property="og:description" content="Download directly from us: http://ldr.fm/tX6XP Download from iTunes: https://itun.es/i6F668z Follow: Hisham Fageeh: http://Twitter.com/HishamFageeh Fahad Alb...">
<meta property="og:type" content="video">
<meta property="og:video" content="http://www.youtube.com/v/aZMbTFNp4wI?version=3&autohide=1">
<meta property="og:video:type" content="application/x-shockwave-flash">
<meta property="og:video:width" content="1920">
<meta property="og:video:height" content="1080">
In addition to OG, there are also Twitter Cards and other social metadata:
<meta name="twitter:url" content="http://www.youtube.com/watch?v=aZMbTFNp4wI">
<meta property="al:android:url" content="http://www.youtube.com/watch?v=aZMbTFNp4wI">
<meta property="al:ios:app_name" content="YouTube">
The process in the plugin is:
- an `<input>` text field where the user pastes the link,
- an AJAX request to a PHP function,
- PHP reads the URL and analyzes the content, extracting the OG and Twitter information to obtain: title, description, representative image,
- the information is returned to JavaScript in JSON format and the received content is rendered using jQuery.
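As a rough sketch of the last two steps, assuming the tags have already been collected into an associative array: the helpers `first_of` and `build_preview` are hypothetical names (not from the plugin), and the `twitter:description` value is made up for illustration. The idea is to prefer the OG value and fall back to the matching Twitter Card tag before encoding the result as JSON.

```php
<?php
// Hypothetical helper: return the first non-empty value among candidate keys.
function first_of(array $metas, array $keys): ?string
{
    foreach ($keys as $key) {
        if (isset($metas[$key]) && $metas[$key] !== '') {
            return $metas[$key];
        }
    }
    return null;
}

// Hypothetical helper: reduce the collected tags to the three preview fields.
function build_preview(array $metas): array
{
    return [
        'title'       => first_of($metas, ['og:title', 'twitter:title']),
        'description' => first_of($metas, ['og:description', 'twitter:description']),
        'image'       => first_of($metas, ['og:image', 'twitter:image']),
    ];
}

// Sample input: two values from the YouTube tags shown earlier, plus one
// made-up Twitter value to exercise the fallback.
$metas = [
    'og:title'            => 'No Woman, No Drive',
    'og:image'            => 'https://i1.ytimg.com/vi/aZMbTFNp4wI/maxresdefault.jpg',
    'twitter:description' => 'Example description',
];

$preview = build_preview($metas);
$json = json_encode($preview, JSON_UNESCAPED_SLASHES);
echo $json, PHP_EOL;
```

In the plugin itself the array would be handed to `wp_send_json_success` instead of being encoded by hand.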
The following code is part of the function that the AJAX call invokes; it processes the result of the request to the URL. WordPress performs the HTTP request using the cURL extension or PHP streams (depending on the case). Either way, I only need to call `wp_remote_get`, which returns the result, and then I process `$response['body']`:

    if ( $data = wp_remote_retrieve_body( $response ) )
    {
        $rmetas = array(); // Array for the JSON response
        libxml_use_internal_errors(true); // Suppress warnings from malformed HTML
        $doc = new DOMDocument();
        $doc->loadHTML($data);
        $xpath = new DOMXPath($doc);
        /* Open Graph tags */
        $query = '//*/meta[starts-with(@property, \'og:\')]';
        $ogs = $xpath->query( $query );
        foreach ( $ogs as $meta )
        {
            $property = $meta->getAttribute('property');
            $content = $meta->getAttribute('content');
            $rmetas[$property] = $content;
        }
        if( empty( $rmetas ) )
            wp_send_json_error( array( 'error' => __( 'No OG data in the page.' ) ) );
        /* Meta Data for the post */
        if( $autor = $this->xpath_query( $xpath, 'meta', 'name', 'author', 'content' ) )
            $rmetas['author'] = $autor;
        if( $date = $this->xpath_query( $xpath, 'meta', 'name', 'dc.date', 'content' ) )
            $rmetas['date'] = $date;
        if( $url = $this->xpath_query( $xpath, 'link', 'rel', 'shorturl', 'href' ) )
            $rmetas['shorturl'] = $url;
        if( $aurl = $this->xpath_query( $xpath, 'meta', 'property', "article:author", 'content' ) )
            $rmetas['authorurl'] = $aurl;
        /* Twitter Cards: the example above uses name/content, so accept both attribute styles */
        $twits = $xpath->query('//*/meta[starts-with(@name, \'twitter:\') or starts-with(@property, \'twitter:\')]');
        foreach ( $twits as $meta )
        {
            $property = $meta->getAttribute('name') ?: $meta->getAttribute('property');
            $content = $meta->getAttribute('content') ?: $meta->getAttribute('value');
            if( 'twitter:site' == $property )
                $rmetas['twitter'] = $content;
        }
        wp_send_json_success( $rmetas ); // This function includes a die(); the error below does not run
    }
    wp_send_json_error( array( 'error' => __( 'Undefined error.' ) ) );
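The `xpath_query` method called above is not shown in this excerpt. A minimal sketch of what such a helper might look like, written as a standalone function and guessed purely from how it is called (tag, attribute, attribute value, attribute to return), could be:

```php
<?php
// Hypothetical stand-in for the plugin's xpath_query() method, which is not
// shown in the answer. Returns the wanted attribute of the first matching
// element, or false when nothing matches.
function xpath_query(DOMXPath $xpath, string $tag, string $attr, string $value, string $wanted)
{
    // Values are interpolated as-is, so they must not contain double quotes.
    $nodes = $xpath->query(sprintf('//%s[@%s="%s"]', $tag, $attr, $value));
    if ($nodes === false || $nodes->length === 0) {
        return false;
    }
    $result = $nodes->item(0)->getAttribute($wanted);
    return $result !== '' ? $result : false;
}

// Made-up sample markup for illustration.
$html = '<html><head>'
      . '<meta name="author" content="Jane Doe">'
      . '<link rel="shorturl" href="http://example.com/s/1">'
      . '</head><body></body></html>';

libxml_use_internal_errors(true); // silence warnings from sloppy markup
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$author  = xpath_query($xpath, 'meta', 'name', 'author', 'content');
$short   = xpath_query($xpath, 'link', 'rel', 'shorturl', 'href');
$missing = xpath_query($xpath, 'meta', 'name', 'dc.date', 'content');
```

Returning `false` on a miss is what lets the `if( $x = ... )` pattern in the plugin code skip absent tags.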
I wrote the DOMXPath queries with help from Stack Overflow, simply searching with this advanced search until I found something suitable.
If the meta information is not present on the page, one could fall back to traditional scraping, but I never got to that point. I do see interesting things here (----->) in the Related column:
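For the scraping fallback mentioned above, which I never implemented, a minimal sketch might look like the following. The function name `scrape_fallback` and the choice of sources (the `<title>` element, the classic `description` meta tag and the first `<img>`) are assumptions for illustration, and the input is assumed to be a non-empty HTML string.

```php
<?php
// Hypothetical fallback for pages without OG/Twitter tags: use the <title>
// element, the description meta tag and the first <img> in the document.
function scrape_fallback(string $html): array
{
    libxml_use_internal_errors(true); // tolerate malformed markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);            // assumes $html is not empty
    $xpath = new DOMXPath($doc);

    // string(...) yields the first match as a string, or '' when none exists.
    return [
        'title'       => trim($xpath->evaluate('string(//title)')),
        'description' => trim($xpath->evaluate('string(//meta[@name="description"]/@content)')),
        'image'       => trim($xpath->evaluate('string(//img/@src)')),
    ];
}

// Made-up sample page without any social tags.
$html = '<html><head><title> Plain page </title>'
      . '<meta name="description" content="No social tags here.">'
      . '</head><body><img src="/a.png"><img src="/b.png"></body></html>';

$info = scrape_fallback($html);
```

A real implementation would still need to resolve relative image URLs against the page URL and probably skip tiny images such as tracking pixels.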
Does it need to be in PHP, or can it be in JavaScript?
– Kazzkiq
@Kazzkiq it has to be stored in the database, so I don't think JavaScript alone will do; PHP is necessary.
– Filipe Moraes