Regular expressions


Asked 10 years, 6 months ago

Viewed 236 times

3

I am having some difficulty building a regular expression. I am trying to work with this code:

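<?php
// Download the HTML of the page
$url = file_get_contents('http://ciagri.iea.sp.gov.br/precosdiarios/');
// $expressao is the pattern I still need to write
preg_match_all($expressao, $url, $conteudo);
print_r($conteudo); // $conteudo is an array of matches, so echo would only print "Array"
?>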

I need to extract the prices from markup like this:

<tr style="background-color:White;">
    <td style="width:170px;">
        Mandioca para mesa
    </td>
    <td style="width:120px;">
        Mogi Mirim
    </td>
    <td align="right" style="width:70px;">
        11,50
    </td>
    <td align="center" style="width:70px;">
        cx.23 kg
    </td>
    <td style="width:200px;">
        <div id="ctl00_ContentPlaceHolder1_gridRecebidos_ctl95_PanelGridObs">
        </div>
    </td>
</tr>
<tr>
    <td style="width:170px;">
        Mandioca para mesa
    </td>
    <td style="width:120px;">
        Pindamonhangaba
    </td>
    <td align="right" style="width:70px;">
        28,00
    </td>
    <td align="center" style="width:70px;">
        cx.23 kg
    </td>
    <td style="width:200px;">
        <div id="ctl00_ContentPlaceHolder1_gridRecebidos_ctl96_PanelGridObs">
        </div>
    </td>
</tr>
<tr style="background-color:White;">
    <td style="width:170px;">
        Mandioca para mesa
    </td>
    <td style="width:120px;">
        Sorocaba
    </td>
    <td align="right" style="width:70px;">
        8,79
    </td>
    <td align="center" style="width:70px;">
        cx.23 kg
    </td>
    <td style="width:200px;">
        <div id="ctl00_ContentPlaceHolder1_gridRecebidos_ctl97_PanelGridObs">
        </div>
    </td>
</tr>

What would be the best pattern to use to get the price for each city?

Are you trying to pull content from another web page?

– MarceloBoni

2015/01/13 at 17:03
Tip: Do not use regex to parse HTML. Take a look at XPath, YQL and htmlSQL.

– fpg1503

2015/01/13 at 17:09
1

Yes, I want to fetch the price quote of a product that is updated daily. I will take a look at XPath and YQL.

– Rodolfo Oliveira

2015/01/13 at 17:10
1

For PHP there is htmlSQL (https://github.com/hxseven/htmlSQL)

– fpg1503

2015/01/13 at 17:12

2 answers

5

Ideally, use XPath to get these prices. For the page you mentioned, it would look like this:

$dom = new DOMDocument;
$dom->loadHTMLFile("http://ciagri.iea.sp.gov.br/precosdiarios/");

$xpath = new DOMXPath($dom);
// this query selects every TD at position 3 in the first table with the class "tabela_dados"
$nodes = $xpath->query("(//table[@class='tabela_dados'])[1]/tr/td[position()=3]");

foreach ($nodes as $node) {
    echo $node->nodeValue . "\n"; // prints all the prices
}
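To try the query without hitting the live site, here is a minimal sketch that runs it against the rows from the question. The wrapping table and its class "tabela_dados" are assumptions taken from the query above, not from the question's markup, and the query uses //tr instead of /tr so it also matches if the rows sit inside a tbody.

```php
<?php
// Rows copied from the question; the <table class="tabela_dados"> wrapper is assumed.
$html = <<<HTML
<table class="tabela_dados">
<tr style="background-color:White;">
    <td style="width:170px;">Mandioca para mesa</td>
    <td style="width:120px;">Mogi Mirim</td>
    <td align="right" style="width:70px;">
        11,50
    </td>
    <td align="center" style="width:70px;">cx.23 kg</td>
</tr>
<tr>
    <td style="width:170px;">Mandioca para mesa</td>
    <td style="width:120px;">Pindamonhangaba</td>
    <td align="right" style="width:70px;">
        28,00
    </td>
    <td align="center" style="width:70px;">cx.23 kg</td>
</tr>
<tr style="background-color:White;">
    <td style="width:170px;">Mandioca para mesa</td>
    <td style="width:120px;">Sorocaba</td>
    <td align="right" style="width:70px;">
        8,79
    </td>
    <td align="center" style="width:70px;">cx.23 kg</td>
</tr>
</table>
HTML;

$dom = new DOMDocument;
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$query = "(//table[@class='tabela_dados'])[1]//tr/td[position()=3]";

// Collect the third cell of every row, trimming the surrounding whitespace.
$prices = [];
foreach ($xpath->query($query) as $node) {
    $prices[] = trim($node->nodeValue);
}

print_r($prices);
```

The trim() call matters because the price cells in the page contain line breaks and indentation around the number.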

Thank you, I will try it when I get home.

– Rodolfo Oliveira

2015/01/13 at 17:17
There is also the DOM API (which I find simpler).

– gmsantos

2015/01/13 at 17:30
Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in http://ciagri.iea.sp.gov.br/precosdiarios/, line: 3654. Do you know the reason for this error?

– Rodolfo Oliveira

2015/01/14 at 12:02
Add libxml_use_internal_errors(true); at the beginning of the code. Even with the warning, though, the code should work normally.

– André Ribeiro

2015/01/14 at 12:11
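A minimal sketch of that suggestion, assuming a small made-up fragment with an unescaped & in a link (the URL and cell contents are hypothetical). Whether libxml actually reports an error for this input can depend on the libxml version, so the sketch only shows how to collect and clear whatever is reported:

```php
<?php
// Collect libxml parse problems internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical fragment: the unescaped "&" in the href is the kind of input
// that typically produces entity-related parser messages.
$html = '<table><tr><td><a href="x?a=1&b=2">Mandioca</a></td><td>11,50</td></tr></table>';

$dom = new DOMDocument;
$dom->loadHTML($html);

// Any collected problems remain available for inspection.
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), " (line {$error->line})\n";
}
libxml_clear_errors();

// The document was still parsed, so the price cell can be read as usual.
$xpath = new DOMXPath($dom);
$price = $xpath->query('//td[2]')->item(0)->nodeValue;
echo $price, "\n";
```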
Is there any way I can access a certain position of the $nodes variable, as if it were an array?

– Rodolfo Oliveira

2015/01/19 at 12:45
@Rodolfooliveira if this answer solves your original question, you can mark it as accepted. See more on [tour].

– gmsantos

2015/01/19 at 18:27
@Rodolfooliveira You can access it like this: $nodes->item(3); // returns the item at position 3. Don’t forget to mark the answer that solved your problem as accepted :)

– André Ribeiro

2015/01/19 at 18:37
@Andréribeiro I have already marked the answer as accepted. I still do not understand the part I asked about above: would it be $nodes->nodeValue(3);, or is item meant literally? Since I know the position of the value I want, I would not need the foreach.

– Rodolfo Oliveira

2015/01/19 at 20:10
@Rodolfooliveira It would be $nodes->item(3) to get the item at position 3. item is a method.

– André Ribeiro

2015/01/19 at 20:14
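To make the item() discussion concrete, here is a short sketch using a made-up three-row table (the HTML is hypothetical, only the DOM calls matter). Note that item() counts from zero, so item(3) is the fourth node:

```php
<?php
// Hypothetical table with one price per row.
$html = '<table><tr><td>11,50</td></tr><tr><td>28,00</td></tr><tr><td>8,79</td></tr></table>';

$dom = new DOMDocument;
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//td'); // a DOMNodeList, not a plain array

// item() is zero-based: index 1 is the second node.
$second = $nodes->item(1)->nodeValue;

// An out-of-range index returns null instead of raising an error.
$missing = $nodes->item(10);

// To work with a real array, copy the list first.
$asArray = iterator_to_array($nodes);

echo $second, "\n";
echo count($asArray), " nodes\n";
```

Checking $nodes->length before calling item() avoids calling nodeValue on null when the page layout changes.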


by Victor Stafusa • **63,338** points · Answer 1 · 2015-01-13T17:13:41+00:00

I managed to do it with this regex:

<tr[^>]*>\s*<td[^>]*>[^<]*<\/td>\s*<td[^>]*>[^<]*<\/td>\s*<td[^>]*>\s*(\S*)

It is important to capture all the resulting matches, not just the first one.

How does this expression work?

Let's break it into pieces:

1. <tr[^>]*> - Starts with <tr, then uses [^>]* to skip everything up to the next > and consumes that >. In other words, it consumes <tr blablabla>. It also works if there is only <tr>.
2. \s* - Consumes any run of whitespace and line breaks.
3. <td[^>]*>[^<]*<\/td>\s* - Starts with <td, then uses [^>]* to skip up to the > and consumes it. It keeps consuming until it finds the next <, then consumes the </td> and the whitespace and line breaks that follow. That is, it consumes the first <td blabla>blablabla</td>.
4. Same as item 3: it consumes the second <td blabla>blablabla</td>.
5. <td[^>]*>\s* - Consumes the <td blabla> that follows, plus the whitespace and line breaks. Right after that comes the price.
6. (\S*) - Captures all the characters that follow up to the next whitespace character (without consuming that whitespace). That is, it captures the price.

Tested here. To check, put the regex in the first field and g in the second. In the area below, paste the text you want to search (in this case, the HTML).
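For completeness, a minimal sketch of the same expression plugged into preg_match_all, run against two rows copied from the question rather than the live page. The variable names are only illustrative, and the / delimiters are added because PHP requires them:

```php
<?php
// Two rows copied from the question's markup.
$html = <<<HTML
<tr style="background-color:White;">
    <td style="width:170px;">
        Mandioca para mesa
    </td>
    <td style="width:120px;">
        Mogi Mirim
    </td>
    <td align="right" style="width:70px;">
        11,50
    </td>
</tr>
<tr>
    <td style="width:170px;">
        Mandioca para mesa
    </td>
    <td style="width:120px;">
        Pindamonhangaba
    </td>
    <td align="right" style="width:70px;">
        28,00
    </td>
</tr>
HTML;

// Same expression as above, wrapped in / delimiters.
$pattern = '/<tr[^>]*>\s*<td[^>]*>[^<]*<\/td>\s*<td[^>]*>[^<]*<\/td>\s*<td[^>]*>\s*(\S*)/';

// preg_match_all collects every match; index 1 holds the captured group.
preg_match_all($pattern, $html, $matches);
$prices = $matches[1];

print_r($prices);
```

preg_match_all plays the role of the g flag in the online tester: it keeps scanning after the first match, so every row contributes one price.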