Decrease span field value with preg_replace

6

I am trying to change the values of all the span elements that have a given class.

For example, the site's HTML looks like this:

<div id="isOffered">
   <a class="price addBetButton footballBetButton" id="bk_82285689_mk_sel1" href="">
   <span class="priceText wide UK">1/2</span>
   <span class="priceText wide EU">1.50</span>
   <span class="priceText wide US">-200</span>
   <span class="priceText wide CH">1.50</span>
   <span class="priceChangeArrow"></span>
   <input class="betCode" type="hidden" value="0]SK@82285689@314222649@NB*1~2*0*-1*0*0]CPN:0" />
   <input class="originalBetCode" type="hidden" value="0]SK@82285689@314222649@NB*1~2*0*-1*0*0]CPN:0" /> 
   </a>
</div>

What is the easiest way to extract the values 1.50, 200 and 1.50 and reduce each one by 20% of its original value using the preg_replace function?

  • 2

    Do you want to do this with PHP or JS? If you don't know, please explain better what you want to do and what functionality you need.

  • With PHP, using the preg_replace function.

  • 1

    Is 1/2 included? Is 200 treated as negative?

  • Using regex to parse HTML is giving in to the call of Cthulhu.
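On the point raised in the comments about 1/2 and -200: a plain float cast reads "1/2" as 1, so fractional values need separate handling. The following is only a sketch under assumptions that the question does not confirm: that "a/b" should be treated as a quotient and that a negative value keeps its sign. The helper names parsePrice and reduceByTwentyPercent are hypothetical.

```php
<?php
// Hypothetical helper: convert a span's text to a float, treating "a/b" as a quotient.
// Returns null when the text is not a number or the denominator is zero.
function parsePrice(string $text): ?float
{
    $text = trim($text);
    if (preg_match('~^(-?\d+)/(\d+)$~', $text, $m)) {
        return (int) $m[2] === 0 ? null : (int) $m[1] / (int) $m[2];
    }
    return is_numeric($text) ? (float) $text : null;
}

// Reduce a value by 20%; the sign of a negative value is preserved (assumption).
function reduceByTwentyPercent(float $value): float
{
    return $value * 0.8;
}

foreach (['1/2', '1.50', '-200'] as $raw) {
    $parsed = parsePrice($raw);
    echo $raw, ' => ', $parsed === null ? 'n/a' : reduceByTwentyPercent($parsed), "\n";
}
```

Whether a fractional price should be written back as a decimal or converted back to a fraction depends on the site, so that step is left out here.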

4 answers

9


As already mentioned in Tivie's answer, regular expressions are not recommended for analyzing a structure like HTML, and besides, HTML is not a regular language. Do not use regex when there are better tools for the job.

Read more about this in this article (in English): Regular Expressions: Now you have two problems

I'll follow the same path as Tivie and use DOMDocument and DOMXPath to parse the HTML, but another parser can be used, such as Simple HTML DOM Parser.

$url = "paginahtml.html";         // URL or path of the page
$outputFile = "novapagina.html";  // File where the modified page will be saved

$html = file_get_contents($url); // Fetch the page content

$DOM = new DOMDocument();
$DOM->loadHTML($html);

$xpath = new DOMXPath($DOM);

// Select every element whose class list contains the token "priceText"
$prices = $xpath->query('//*[contains(concat(" ", normalize-space(@class), " "), " priceText ")]');
$percent = 20.0 / 100.0; // 20%

foreach ($prices as $price) {
    $value = $price->nodeValue;
    $floatValue = floatval($value); // Note: "1/2" is read as 1.0
    $finalValue = $floatValue - ($percent * $floatValue);
    $price->nodeValue = (string) $finalValue; // Store the value reduced by 20%
}

file_put_contents($outputFile, $DOM->saveHTML()); // Save the modified page
echo "Done!";


The example above uses file_get_contents to fetch the page and file_put_contents to save the modified page to a new file.

The code worked as expected when given the link to the page provided in that comment. The expression passed to query matches a node when its class attribute contains the class name, in this case priceText. The XPath function normalize-space trims the attribute and collapses runs of whitespace into a single space, so the comparison is not thrown off by extra spaces.

To display the modifications on the screen you can use echo:
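To make the class matching concrete, here is a small sketch that compares a plain substring test with the space-padded token test. The sample markup is hypothetical and not taken from the original page; the class priceTextOld exists only to show a false positive.

```php
<?php
// Hypothetical sample markup: only the first two spans carry the class token "priceText".
$html = '<div>'
      . '<span class="priceText wide EU">1.50</span>'
      . '<span class="  wide   priceText ">2.00</span>'
      . '<span class="priceTextOld">3.00</span>'
      . '<span class="priceChangeArrow"></span>'
      . '</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Naive substring test: also matches "priceTextOld".
$naive = $xpath->query('//span[contains(@class, "priceText")]');

// Token test: pad the normalized class list with spaces and look for " priceText ".
$exact = $xpath->query(
    '//span[contains(concat(" ", normalize-space(@class), " "), " priceText ")]'
);

$matched = [];
foreach ($exact as $span) {
    $matched[] = $span->nodeValue;
}

echo $naive->length, " span(s) matched by the substring test\n";
echo implode(', ', $matched), "\n";
```

The padding on both sides of the token is what keeps a class such as priceTextOld from matching.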

echo $DOM->saveHTML();

6

Parsing HTML with regex is a bad option and can lead to madness. There are many ways regex fails to read HTML (e.g., upper- or lower-case tags, spaces between classes, extra lines between HTML elements, etc.).

Regex stands for "regular expression", and HTML is not a regular language. It will invariably break somewhere...
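As a rough illustration of that fragility, the sketch below feeds two renderings of the same span to an illustrative pattern and to a parser. Both the pattern and the helper countPriceSpans are made up for this example and are not part of any answer here.

```php
<?php
// Two hypothetical renderings of the same span; the second breaks the tag across lines.
$compact = '<span class="priceText wide EU">1.50</span>';
$wrapped = "<span\n   class=\"priceText wide EU\"\n>1.50</span>";

// Illustrative pattern only: expects the tag name, class and value in a fixed layout.
$re = '/<span class="priceText[^"]*">([\d.]+)/';

$compactHits = preg_match($re, $compact);
$wrappedHits = preg_match($re, $wrapped);

// A parser normalizes the markup first, so the same query works for both inputs.
function countPriceSpans(string $html): int
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    $query = '//span[contains(concat(" ", normalize-space(@class), " "), " priceText ")]';
    return $xpath->query($query)->length;
}

var_dump($compactHits, $wrappedHits);
var_dump(countPriceSpans($compact), countPriceSpans($wrapped));
```

The pattern can of course be loosened to cope with line breaks, but each such fix tends to uncover another layout it does not handle.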


That said...

The best way is to use a real "parser". Fortunately, there are several options in PHP.

I advise you to use DOMDocument and DOMXPath, which are included in PHP by default. Here's an example:

HTML

$html = '
<html>
<head></head>
<body>
    <div id="isOffered">
       <a class="price addBetButton footballBetButton" id="bk_82285689_mk_sel1" href="">
           <span class="priceText wide UK">1.2</span>
           <span class="priceText wide EU">1.50</span>
           <span class="priceText wide US">200</span>
           <span class="priceText wide CH">1.50</span>
           <span class="priceChangeArrow"></span>
           <input class="betCode" type="hidden" value="0]SK@82285689@314222649@NB*1~2*0*-1*0*0]CPN:0" />
           <input class="originalBetCode" type="hidden" value="0]SK@82285689@314222649@NB*1~2*0*-1*0*0]CPN:0" /> 
       </a>
    </div>
</body>
</html>';

PHP code

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// List the spans that are children of div#isOffered > a,
// keeping only the spans whose class list contains 'priceText'.
// The leading // lets the div be found at any depth (here it sits under body).
$nodeList = $xpath->query("//div[@id='isOffered']/a/span[contains(concat(' ', @class, ' '), ' priceText ')]");

foreach ($nodeList as $node) {
    if ($node instanceof \DOMElement) {
        // Read the span's value and convert it to a float
        $value = (float) $node->nodeValue;

        // Change the span's value
        $node->nodeValue = (string) ($value * 0.8);
        var_dump($node->nodeValue);
    }
}

// Serialize the modified HTML document
// and store it in the variable $newHtml
$newHtml = $doc->saveHTML();

To prevent DOMDocument from choking on HTML documents with errors, you can add this line at the beginning of your code, before calling loadHTML (and optionally call libxml_clear_errors() afterwards to discard the collected errors):

libxml_use_internal_errors(true);
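
The order of those calls matters, so here is a minimal sketch of one way to arrange them, assuming a deliberately malformed sample string made up for this example. Whether any messages are actually collected depends on the libxml version in use.

```php
<?php
// Hypothetical malformed input: an unknown tag and an unclosed span.
$html = '<div><foo>bar</foo><span class="priceText">1.50</div>';

// Collect parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$loaded = $doc->loadHTML($html);

$errors = libxml_get_errors();          // array of LibXMLError objects, may be empty
libxml_clear_errors();                  // discard the collected errors
libxml_use_internal_errors($previous);  // restore the earlier setting

$xpath = new DOMXPath($doc);
$span = $xpath->query('//span')->item(0);

echo count($errors), " parse message(s) collected\n";
echo $span->nodeValue, "\n";
```

Suppressing the warnings does not change what the parser extracts, so an empty result still points to an XPath expression that does not match the page.
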
  • Thanks, buddy! It worked.

  • Buddy, give me another push! Your code works, but the page I'm getting the information from has many divs, tags and other things, so your code can't capture the values and only returns warnings. Also, how do I put these modified values back into the HTML? Remember that I am using file_get_contents to fetch the data from a remote site and display it on my site in the same way, only with the values decreased. Sorry for my ignorance, I am at the beginning of a PHP course and don't know much yet, haha. Please give me this extra help!

  • See the last line of my post

  • As for capturing the values, that has to do with XPath: the expression probably has to be changed to match the path of the elements you are retrieving the information from. I advise you to read this: http://www.dicas-l.com.br/arquivo/tutorial_xpath.php#.Vo5wvfmswxu It can help you understand how XPath works, and you will see that it is quite simple. Then just replace the XPath expression passed to query with the correct one.

  • Thanks, friend! I will study XPath.

  • As for the DOMDocument errors, when I added your line at the beginning, no errors were displayed, but the values weren't shown either; the page was completely blank. If you want to see the HTML I'm working with, it is at this link: link


3

I also recommend using a DOM parser, but just for fun, here is a beta version using regex:

<?php

$html = <<<XXX
<div id="isOffered">
   <a class="price addBetButton footballBetButton" id="bk_82285689_mk_sel1" href="">
   <span class="priceText wide UK">1/2</span>
   <span class="priceText wide EU">1.50</span>
   <span class="priceText wide US">-200</span>
   <span class="priceText wide CH">1.50</span>
   <span class="priceChangeArrow"></span>
   <input class="betCode" type="hidden" value="0]SK@82285689@314222649@NB*1~2*0*-1*0*0]CPN:0" />
   <input class="originalBetCode" type="hidden" value="0]SK@82285689@314222649@NB*1~2*0*-1*0*0]CPN:0" /> 
   </a>
</div>
XXX;

$re = "/(span.*pricetext.*>)([\d\/.-]+)/im";

$ret = preg_replace_callback($re, function($matches){
    $matches[2] = ((float)$matches[2]) * .8;
    return $matches[1] . $matches[2];
}, $html);

echo $ret;

https://ideone.com/4kEPf9

  • Dude, yours worked too! Thanks for the help :)

0

And by the way, a clandestine and pornographic answer (Perl :)

With regular expressions:

perl -pe 's/<span.*?priceText.*?>\K(.+?)(?=<)/$1*0.8/e' span.xml

With an XML parser:

#!/usr/bin/perl
use XML::DT;
my $filename = shift or die("Erro: usage $0 file.html\n");

print dt($filename, 
            span => sub{$c *= 0.8 if $v{class} =~ /^pricetext/i; toxml },
            -html => 1,
        );

# $c - contents after child processing
# %v - hash of attributes
