Decrease span field value with preg_replace

Question

Asked 10 years, 10 months ago

Viewed 361 times

6

I am trying to change the values of all the span elements that have a given class.

For example, the site's markup looks like this:

<div id="isOffered">
   <a class="price addBetButton footballBetButton" id="bk_82285689_mk_sel1" href="">
   <span class="priceText wide UK">1/2</span>
   <span class="priceText wide EU">1.50</span>
   <span class="priceText wide US">-200</span>
   <span class="priceText wide CH">1.50</span>
   <span class="priceChangeArrow"></span>
   <input class="betCode" type="hidden" value="0]SK@82285689@314222649@NB*1~2*0*-1*0*0]CPN:0" />
   <input class="originalBetCode" type="hidden" value="0]SK@82285689@314222649@NB*1~2*0*-1*0*0]CPN:0" /> 
   </a>
</div>

What is the easiest way to extract the values 1.50, 200 and 1.50 and reduce each one by 20% of its original value using the preg_replace function?

2

Do you want to do this with PHP or JS? If you don't know, then please explain better what you want to do, i.e. what functionality you need.

– Sergio

2015/02/23 at 00:57
With PHP, using the preg_replace function.

– Cassiano José

2015/02/23 at 01:17
1

Is 1/2 included? Is 200 treated as negative?

– Papa Charlie

2015/02/23 at 03:14
Using regex to parse HTML is giving in to the call of Cthulhu.

– Tivie

2015/02/23 at 03:16

4 answers

9

As Tivie's answer already mentioned, regular expressions are not recommended for analyzing a structure like HTML, which is not a regular language. Do not use regex when there are better tools for the job.

Read more about this in this article (in English): Regular Expressions: Now you have two problems

I'll follow the same path as Tivie and use DOMDocument and DOMXPath to analyze the HTML, but another parser can be used, such as the Simple HTML DOM Parser.

$url = "paginahtml.html";         // Link to the page
$outputFile = "novapagina.html";  // File where the modifications will be saved

$html = file_get_contents($url); // Fetch the page content

$DOM = new DOMDocument();
$DOM->loadHTML($html);

$xpath = new DOMXPath($DOM);

// Select every element whose class list contains the token "priceText"
$prices = $xpath->query('//*[contains(concat(" ", normalize-space(@class), " "), " priceText ")]');
$percent = 20.0 / 100.0; // 20%

foreach ($prices as $price) {
    $value = $price->nodeValue;
    $floatValue = floatval($value);
    $finalValue = $floatValue - ($percent * $floatValue);
    $price->nodeValue = (string) $finalValue; // Store the final value with the 20% discount
}

file_put_contents($outputFile, $DOM->saveHTML()); // Save the modifications
echo "Done!";

DEMO

The above example uses the function file_get_contents to get the contents of the page and saves the modifications to a new file with file_put_contents.

The code worked as expected when given the link to the page provided in that comment. The expression used in query returns the nodes whose class attribute contains the class name, in this case priceText. The XPath function normalize-space replaces runs of surplus whitespace with a single space, so the class list can be compared reliably.

To display the modified HTML on the screen, you can use echo:
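To illustrate why the class list is padded and normalized, here is a minimal sketch comparing a plain substring test with the space-delimited token test. The markup and the class name oldpriceTextual are made up for the illustration; they are not from the original page.

```php
<?php
// Hypothetical markup: the second span only contains "priceText" as a substring
// of another class name; the third separates its classes with a line break.
$html = '<div>'
      . '<span class="priceText wide EU">1.50</span>'
      . '<span class="oldpriceTextual">9.99</span>'
      . "<span class=\"wide\npriceText\">2.00</span>"
      . '</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Substring test: any class attribute containing the characters "priceText".
$loose = $xpath->query('//span[contains(@class, "priceText")]');

// Token test: collapse whitespace, pad with spaces, then look for " priceText ".
$strict = $xpath->query(
    '//span[contains(concat(" ", normalize-space(@class), " "), " priceText ")]'
);

echo "substring test: ", $loose->length, " spans\n";
echo "token test:     ", $strict->length, " spans\n";
```

The substring test also picks up the oldpriceTextual span, while the token test only accepts elements where priceText is a whole class name.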

echo $DOM->saveHTML();


by Tivie • **545** points · Answer 1 · 2015-02-23T03:14:55+00:00

Parsing HTML with regex is a bad option and can lead to madness. There are many ways regex fails to read HTML (e.g., upper- or lower-case tags, spaces between classes, extra lines between HTML elements, etc.).

Regex stands for "regular expression", and HTML is not a regular language. It will invariably break somewhere...
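A small sketch of the failure modes listed above: the pattern below is a hypothetical one, written for a single exact layout of the span, and the variants are made-up spellings of the same element that a browser would treat alike.

```php
<?php
// Hypothetical pattern written for one exact layout of the span.
$pattern = '/<span class="priceText wide EU">([\d.]+)<\/span>/';

$variants = [
    '<span class="priceText wide EU">1.50</span>',        // the layout the pattern expects
    '<SPAN class="priceText wide EU">1.50</SPAN>',        // upper-case tag
    '<span class="priceText  wide EU">1.50</span>',       // extra space between classes
    "<span class=\"priceText wide EU\">\n1.50\n</span>",  // line breaks around the value
];

$matched = [];
foreach ($variants as $variant) {
    // preg_match returns 1 when the pattern matches, 0 when it does not.
    $matched[] = preg_match($pattern, $variant) === 1;
}

var_dump($matched);
```

Only the first variant matches. Each failure can be patched with a flag or a looser pattern, but every patch covers one more case rather than the structure of HTML itself.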

That said...

The best way is to use a real "parser". Fortunately, there are several options in PHP.

I advise you to use DOMDocument and DOMXPath, which are included in PHP by default. Here's an example:

HTML

$html = '
<html>
<head></head>
<body>
    <div id="isOffered">
       <a class="price addBetButton footballBetButton" id="bk_82285689_mk_sel1" href="">
           <span class="priceText wide UK">1.2</span>
           <span class="priceText wide EU">1.50</span>
           <span class="priceText wide US">200</span>
           <span class="priceText wide CH">1.50</span>
           <span class="priceChangeArrow"></span>
           <input class="betCode" type="hidden" value="0]SK@82285689@314222649@NB*1~2*0*-1*0*0]CPN:0" />
           <input class="originalBetCode" type="hidden" value="0]SK@82285689@314222649@NB*1~2*0*-1*0*0]CPN:0" /> 
       </a>
    </div>
</body>
</html>';

PHP code

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// List the spans that are children of div#isOffered > a,
// keeping only the spans that have the class 'priceText'
$nodeList = $xpath->query("//div[@id='isOffered']/a/span[contains(concat(' ', @class, ' '), ' priceText ')]");

foreach ($nodeList as $node) {
    if ($node instanceof \DOMElement) {
        // Read the span's value and convert it to a float
        $value = (float) $node->nodeValue;

        // Change the span's value (reduce it by 20%)
        $node->nodeValue = (string) ($value * 0.8);
        var_dump($node->nodeValue);
    }
}

// Save the changes made to the HTML document
// and store the result in the variable $newHtml
$newHtml = $doc->saveHTML();

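To prevent DOMDocument from choking on HTML documents with errors, enable libxml's internal error handling before loading the HTML, and clear the collected errors afterwards:

libxml_use_internal_errors(true); // before $doc->loadHTML($html)
libxml_clear_errors();            // after $doc->loadHTML($html)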
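As a rough sketch of how the buffered errors can be inspected rather than just discarded: the input below is a made-up fragment with a stray closing tag, and which messages (if any) libxml reports for it depends on the libxml2 version in use.

```php
<?php
// Hypothetical input containing a stray </p>.
$html = '<div id="isOffered"><span class="priceText">1.50</span></p></div>';

// Buffer parse errors instead of emitting PHP warnings; remember the old setting.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$loaded = $doc->loadHTML($html);

$errors = libxml_get_errors();          // array of LibXMLError objects (possibly empty)
libxml_clear_errors();                  // empty the buffer
libxml_use_internal_errors($previous);  // restore the previous setting

foreach ($errors as $error) {
    echo trim($error->message), ' (line ', $error->line, ")\n";
}

// The document is still usable despite the malformed input.
$firstSpan = $doc->getElementsByTagName('span')->item(0);
echo $firstSpan->nodeValue, "\n";
```

Restoring the previous setting keeps the change local to this piece of code, which matters if other parts of the application rely on the default warning behavior.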

by Adir Kuhn • **2,342** points · Answer 2 · 2015-02-26T16:35:27+00:00

I also recommend using a DOM parser, but just for fun, here is a beta version using regex:

<?php

$html = <<<XXX
<div id="isOffered">
   <a class="price addBetButton footballBetButton" id="bk_82285689_mk_sel1" href="">
   <span class="priceText wide UK">1/2</span>
   <span class="priceText wide EU">1.50</span>
   <span class="priceText wide US">-200</span>
   <span class="priceText wide CH">1.50</span>
   <span class="priceChangeArrow"></span>
   <input class="betCode" type="hidden" value="0]SK@82285689@314222649@NB*1~2*0*-1*0*0]CPN:0" />
   <input class="originalBetCode" type="hidden" value="0]SK@82285689@314222649@NB*1~2*0*-1*0*0]CPN:0" /> 
   </a>
</div>
XXX;

$re = "/(span.*pricetext.*>)([\d\/.-]+)/im";

$ret = preg_replace_callback($re, function($matches){
    $matches[2] = ((float)$matches[2]) * .8;
    return $matches[1] . $matches[2];
}, $html);

echo $ret;

https://ideone.com/4kEPf9
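One thing to keep in mind with the callback above: (float) "1/2" evaluates to 1.0, because PHP only reads the leading numeric part of the string, so the fractional value becomes 0.8. Below is a possible variation, assuming fractional values should be left untouched and the original number of decimal places kept. The helper name scalePriceText is hypothetical, and both assumptions go beyond what the question states.

```php
<?php
// Hypothetical helper: scale plain decimal values, leave anything else untouched.
function scalePriceText(string $text, float $factor): string
{
    // Accept only an optional minus sign, digits, and an optional decimal part.
    if (!preg_match('/^-?\d+(?:\.\d+)?\z/', $text)) {
        return $text; // e.g. fractional odds such as "1/2"
    }

    // Keep as many decimal places as the original value had.
    $dot = strpos($text, '.');
    $decimals = $dot === false ? 0 : strlen($text) - $dot - 1;

    return number_format((float) $text * $factor, $decimals, '.', '');
}

$html = '<span class="priceText wide UK">1/2</span>' . "\n"
      . '<span class="priceText wide EU">1.50</span>' . "\n"
      . '<span class="priceText wide US">-200</span>';

// Same pattern as in the answer above, with the helper doing the arithmetic.
$result = preg_replace_callback(
    '/(span.*pricetext.*>)([\d\/.-]+)/i',
    fn ($m) => $m[1] . scalePriceText($m[2], 0.8),
    $html
);

echo $result, "\n";
```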

by JJoao • **5,113** points · Answer 3 · 2015-04-07T14:20:21+00:00

And, by the way, a clandestine and pornographic answer (in Perl :)

With regular expressions:

perl -pe 's/<span.*?priceText.*?>\K(.+?)(?=<)/$1*0.8/e' span.xml

With an XML parser:

#!/usr/bin/perl
use XML::DT;
my $filename = shift or die("Error: usage $0 file.html\n");

print dt($filename, 
            span => sub{$c *= 0.8 if $v{class} =~ /^pricetext/i; toxml },
            -html => 1,
        );

# $c - contents after child processing
# %v - hash of attributes