Problem parsing HTML with a regex

My question is this: I need to extract values from the following HTML excerpt:

HTML:

        <section class="ovw-summary">

                <div class="ovw-summary__balance balance-amounts">
                    <header><h3>Meu dinheiro</h3></header>
                    <div class="box-container mp-box-shadow bg-trama">

                        <dl class="balance-amounts__list available-money">
                            <dt>Disponível</dt>
                            <dd class="price price-large mlb">
                            <span class="price-symbol">R$</span> <span class="price-integer">0</span><span class="price-decimal-mark">,</span><span class="price-decimal">00</span>
                            </dd>
                        </dl>
                        <dl class="balance-amounts__list account-money">
                            <dt>Em conta</dt>
                            <dd class="price">
                            <span class="price-symbol">R$</span> <span class="price-integer">24</span><span class="price-decimal-mark">,</span><span class="price-decimal">99</span> 
                            </dd>
                        </dl>

I wrote the following to read the HTML and return the correct data. Here is my PHP code, which uses regexes:

$SaldoEmConta = '~<dl class="account-money">\s*<dt>Em conta<\/dt>\s*<dd class="ch-price" name="balance_total" value=".*?">R\$ (.*?)<sup>(.*?)<\/sup>\s*<a href=".*?" class="icon-info-balance">\s*<i class="ch-icon-help-sign">\s*<\/i>\s*<\/a>\s*<\/dd>\s*<\/dl>~';
preg_match($SaldoEmConta, $RetornoSaldo, $ArrayConta);

$SaldoDisponivel = '~<dl class="open-detail">\s*<dt class="available-label">Dispon&iacute;vel<\/dt>\s*<dd class="ch-price available-price" name="balance_available" value=".*">R\$ (.*?)<sup>(.*?)<\/sup>\s*<\/dd>~';
preg_match($SaldoEmConta, $RetornoSaldo, $ArrayDisponivel);

echo 'Em conta: R$ ' . $ArrayConta[1].','.$ArrayConta[2]  . ' Disponivel: R$ ' . $ArrayDisponivel[1].','.$ArrayDisponivel[2] .'<hr>';

But for some reason I can’t get those values. Can someone help me correct my regular expression?

  • You’re loading HTML on the server side and want to modify it before sending it to the client, is that it? Can you explain better what you want to do and where this HTML comes from?

  • After fetching the page with cURL, I need to extract these values from the HTML it returns.

  • Processing HTML in PHP with regexes is unreliable... Can’t you get this data in another format, such as JSON?

  • No, that’s not possible. Before they updated the layout of the site I fetch the values from, it was returning the data normally.

2 answers

1

Don’t use a regex; use an HTML parser such as Simple HTML DOM:

http://simplehtmldom.sourceforge.net/

or Ganon:

http://code.google.com/p/ganon/

Both are much easier to work with when handling HTML.

An example with Simple HTML DOM:

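// Load the library first (adjust the path to wherever simple_html_dom.php lives).
require_once 'simple_html_dom.php';

// Parse an HTML string into a DOM object.
$html = str_get_html('<div id="hello">Hello</div><div id="world">World</div>');
$html->find('div', 1)->class = 'bar';               // second <div>: add a class
$html->find('div[id=hello]', 0)->innertext = 'foo'; // <div id="hello">: replace its content
echo $html; // Output: <div id="hello">foo</div><div id="world" class="bar">World</div>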

As a complement, this is always worth reading: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
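
To illustrate the parser approach on the HTML from the question, here is a minimal sketch. It uses PHP’s bundled DOM extension (DOMDocument and DOMXPath) instead of either library linked above, so nothing extra has to be installed. The helper name extractBalance and the trimmed sample markup are assumptions made for illustration; in the real script the HTML would be the cURL response ($RetornoSaldo in the question).

```php
<?php
// Hypothetical sample: a trimmed copy of the markup shown in the question.
$html = <<<'HTML'
<section class="ovw-summary">
  <dl class="balance-amounts__list available-money">
    <dt>Disponivel</dt>
    <dd class="price price-large mlb">
      <span class="price-symbol">R$</span> <span class="price-integer">0</span><span class="price-decimal-mark">,</span><span class="price-decimal">00</span>
    </dd>
  </dl>
  <dl class="balance-amounts__list account-money">
    <dt>Em conta</dt>
    <dd class="price">
      <span class="price-symbol">R$</span> <span class="price-integer">24</span><span class="price-decimal-mark">,</span><span class="price-decimal">99</span>
    </dd>
  </dl>
</section>
HTML;

// Hypothetical helper: returns e.g. "24,99" for the <dl> carrying $listClass,
// or null when the expected spans are not found.
function extractBalance(DOMXPath $xpath, string $listClass): ?string
{
    // Select the <dl> whose class attribute contains $listClass as a whole token.
    $dl = sprintf(
        '//dl[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $listClass
    );
    $integer = $xpath->evaluate("string($dl//span[@class='price-integer'])");
    $decimal = $xpath->evaluate("string($dl//span[@class='price-decimal'])");

    if ($integer === '' || $decimal === '') {
        return null;
    }
    return $integer . ',' . $decimal;
}

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect warnings about HTML5 tags instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

echo 'Em conta: R$ ' . extractBalance($xpath, 'account-money')
   . ' Disponivel: R$ ' . extractBalance($xpath, 'available-money') . '<hr>';
```

Because the lookup depends only on the account-money / available-money classes and the price-* spans, changes to whitespace or attribute order in the page should not break it, unlike a regex that spells out the full markup.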

0

You can use this regular expression, I think it matches what you need:

balance_unavailable['"]\s+value=['"](\d*\.?\d*).*balance_dispute['"]\s+value=['"](\d*\.?\d*)

  • I got an error here: Parse error: syntax error, unexpected ']'
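
The parse error reported above probably comes from pasting the pattern into PHP without string quotes and regex delimiters. Here is a minimal sketch of how it could be wrapped. The nowdoc avoids having to escape the ' and " inside the pattern, `~` serves as the delimiter, and the `s` flag is added on the assumption that the two attributes sit on different lines. The sample input is hypothetical: the names balance_unavailable and balance_dispute come from the pattern itself and do not appear in the HTML shown in the question.

```php
<?php
// The answer's pattern, wrapped in ~ delimiters; the "s" flag lets "." match newlines.
$pattern = <<<'REGEX'
~balance_unavailable['"]\s+value=['"](\d*\.?\d*).*balance_dispute['"]\s+value=['"](\d*\.?\d*)~s
REGEX;

// Hypothetical input, shaped only to exercise the pattern.
$sample = '<dd name="balance_unavailable" value="10.50">x</dd>' . "\n"
        . '<dd name="balance_dispute" value="3.00">y</dd>';

if (preg_match($pattern, $sample, $matches) === 1) {
    echo 'Unavailable: ' . $matches[1] . ' Dispute: ' . $matches[2];
} else {
    echo 'No match';
}
```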
