Although some time has passed, I am posting a solution to close this question. It is simple if we use a couple of specific libraries.
Libraries
urllib.request: responsible for fetching the HTML page;
re: library for working with regular expressions.
Explanation of Code
To capture the content I used the following excerpt:
html_content = urllib.request.urlopen('http://dolarhoje.com/bitcoin').read().decode('utf-8')
I am assigning the page's HTML content to the variable html_content so it can be consumed later. read() reads the response body as bytes, and decode('utf-8') turns those bytes into text using UTF-8, which prevents accented characters from coming out garbled (this sometimes happens when the wrong encoding is used).
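To illustrate why decode('utf-8') matters, here is a small sketch that needs no network access. The word 'cotação' is just a sample string I chose for illustration; it is not taken from the page.

```python
# Bytes as they would arrive from a server that sends UTF-8.
raw = 'cotação'.encode('utf-8')

# Decoding with the right encoding restores the accented characters.
correct = raw.decode('utf-8')

# Decoding with the wrong encoding (Latin-1 here) garbles them.
garbled = raw.decode('latin-1')

print(correct)  # cotação
print(garbled)  # cotaÃ§Ã£o
```

If a site uses a different encoding, the value passed to decode() would need to match it.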
Analyzing the page, we can see a pattern: each currency symbol is shown inside an HTML tag, for example:
<span class="symbol">฿</span>
<span class="symbol">R$</span>
Thus, we can build a regular expression to capture the occurrences of the symbols, as in the following excerpt:
symbols = re.findall(r'\"symbol\">([\S]+)</', html_content)
re: the library we imported at the beginning of the code;
findall(): a function that returns all the occurrences as a list;
r'...': a raw string, where we put the respective regular expression;
Thus, the regex \"symbol\">([\S]+)</ captures the symbols and returns the following list: ['฿', 'R$']
Likewise, the regex value=\"([.\d,]+)\" captures the values and returns: ['1,00', '13583,42']
html_content is the variable in which we want to find the occurrences, or, as we say when working with regex, the matches.
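The two expressions can be tried offline against a small sample. The snippet below is only a sketch: the span lines come from the page excerpt above, but the input tags (and their id values) are my assumption of what the value markup might look like, since the only thing the regex requires is a value="..." attribute.

```python
import re

# Hypothetical sample markup: the <input> tags and their ids are assumed.
sample_html = '''
<span class="symbol">฿</span>
<input type="text" id="moeda" value="1,00"/>
<span class="symbol">R$</span>
<input type="text" id="nacional" value="13583,42"/>
'''

# Same patterns as in the answer: one capture group per match.
symbols = re.findall(r'\"symbol\">([\S]+)</', sample_html)
values = re.findall(r'value=\"([.\d,]+)\"', sample_html)

print(symbols)  # ['฿', 'R$']
print(values)   # ['1,00', '13583,42']
```

Because each pattern has exactly one group, findall() returns only the captured text, not the whole match.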
Finally, we call print('{0} {1} vale {2} {3}'.format(symbols[0], values[0], symbols[1], values[1])) to display the respective values using format(), thus obtaining the desired result:
฿ 1,00 vale R$ 13583,42
Full script
Below is the complete script:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# starting with the imports
import urllib.request
import re
# capturing the HTML content
html_content = urllib.request.urlopen('http://dolarhoje.com/bitcoin').read().decode('utf-8')
# getting the currency symbols
symbols = re.findall(r'\"symbol\">([\S]+)</', html_content)
# getting the currency values
values = re.findall(r'value=\"([.\d,]+)\"', html_content)
# printing to the console a result similar to: '฿ 1,00 vale R$ 13583,42'
print('{0} {1} vale {2} {3}'.format(symbols[0], values[0], symbols[1], values[1]))
I hope this solution helps other people who may have a similar question.
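One caveat: the script indexes symbols[0], symbols[1], values[0] and values[1] directly, so it raises IndexError if the page markup changes and a regex stops matching. A possible guard is sketched below. The function name format_quote and the choice to return None are my own, hypothetical additions, and the sample markup in the usage lines is assumed.

```python
import re

def format_quote(html_content):
    """Return a line such as '฿ 1,00 vale R$ 13583,42', or None when the
    expected markup is not found (hypothetical helper, not in the script)."""
    symbols = re.findall(r'\"symbol\">([\S]+)</', html_content)
    values = re.findall(r'value=\"([.\d,]+)\"', html_content)
    # Both lists need at least two entries before we index into them.
    if len(symbols) < 2 or len(values) < 2:
        return None
    return '{0} {1} vale {2} {3}'.format(
        symbols[0], values[0], symbols[1], values[1])

# Usage with assumed sample markup (no network access needed).
sample = ('<span class="symbol">฿</span><input value="1,00"/>'
          '<span class="symbol">R$</span><input value="13583,42"/>')
print(format_quote(sample))
print(format_quote('<html></html>'))
```

In the real script, the argument would be the html_content obtained from urlopen().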
From what I’ve seen, the real value is inside a <span class="cotMoeda nacional">. Why don’t you look for it first? Otherwise, I think a regular expression would make it easier: https://tableless.com.br/o-basico-sobre-regular/ – Leonardo Pessoa
Or better yet, take a look at this post about using a proper parser: https://answall.com/a/245947/57474 – Leonardo Pessoa
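For reference, the parser idea could look roughly like the sketch below, which uses html.parser from the standard library. This is not the code from the linked post; the class name SymbolParser is hypothetical, and it only handles the span class="symbol" markup quoted in the answer.

```python
from html.parser import HTMLParser

class SymbolParser(HTMLParser):
    """Collects the text inside <span class="symbol"> tags (hypothetical)."""

    def __init__(self):
        super().__init__()
        self._in_symbol = False
        self.symbols = []

    def handle_starttag(self, tag, attrs):
        # The class attribute may hold several names, so split it first.
        classes = (dict(attrs).get('class') or '').split()
        if tag == 'span' and 'symbol' in classes:
            self._in_symbol = True

    def handle_endtag(self, tag):
        if tag == 'span':
            self._in_symbol = False

    def handle_data(self, data):
        # Keep only non-empty text found inside a symbol span.
        if self._in_symbol and data.strip():
            self.symbols.append(data.strip())

parser = SymbolParser()
parser.feed('<span class="symbol">฿</span> <span class="symbol">R$</span>')
parser.close()
print(parser.symbols)  # ['฿', 'R$']
```

A parser like this depends on the tag structure rather than on the exact characters around it, so it tends to be less brittle than a regex when attributes are reordered.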