Good afternoon, everyone. I need to scrape data from a site. The data sits inside a table, with <tr> and <td> elements, and the information I need is within the td's. Here is the markup:
<table width="95%" class="infraTable" summary="Assuntos">
<tr>
<th class="infraTh">Evento</th>
<th class="infraTh">Data/Hora</th>
<th class="infraTh">Descrição</th>
<th class="infraTh">Usuário</th>
<th class="infraTh">Documentos</th>
</tr>
<tr class="infraTrClara">
<td>64</td>
<td>05/09/2019 15:16:15</td>
<td>Baixa Definitiva</td>
<td>140274</td>
<td> Evento não gerou documento(s)</td>
</tr>
<tr class="infraTrEscura">
<td>63</td>
<td>24/06/2019 21:27:34</td>
<td>Distribuído Retificação ou Suprimento ou Restauração de Registro Civil <BR>Número: 00259707220198272729</td>
<td>TO6615</td>
<td>Evento não gerou documento(s)</td>
</tr>
<tr class="infraTrClara">
<td>62</td>
<td>02/05/2019 14:55:03</td>
<td>Juntada - Outros documentos</td>
<td>356753</td>
<td>Evento não gerou documento(s)</td>
</tr>
However, when I try to look up the td elements (by class or by tag) using document.getElement..., I get a long list of properties that are not in the page source (I'll paste only a few so the post doesn't get too long):
document.getElementsByTagName("td")
HTMLCollection(349)
0: td
1: td
clientWidth: 83
colSpan: 1
contentEditable: "inherit"
dataset: DOMStringMap {}
dir: ""
draggable: false
elementTiming: ""
enterKeyHint: ""
firstChild: text
firstElementChild: null
headers: ""
height: ""
hidden: false
id: ""
innerHTML: "24/06/2019 21:27:34"
innerText: "24/06/2019 21:27:34"
inputMode: ""
isConnected: true
isContentEditable: false
lang: ""
lastChild: text
lastElementChild: null
localName: "td"
namespaceURI: "http://www.w3.org/1999/xhtml"
nextElementSibling: td
nextSibling: text
noWrap: false
nodeName: "TD"
nodeType: 1
nodeValue: null
nonce: ""
offsetHeight: 72
offsetLeft: 49
offsetParent: table.infraTable
offsetTop: 81
offsetWidth: 83
onabort: null
onauxclick: null
onbeforecopy: null
onbeforecut: null
onbeforepaste: null
onblur: null
oncancel: null
oncanplay: null
oncanplaythrough: null
onchange: null
onclick: null
onclose: null
oncontextmenu: null
oncopy: null
oncuechange: null
oncut: null
ondblclick: null
ondrag: null
ondragend: null
ondragenter: null
ondragleave: null
ondragover: null
ondragstart: null
ondrop: null
ondurationchange: null
onemptied: null
onended: null
onerror: null
onfocus: null
onformdata: null
onfullscreenchange: null
onfullscreenerror: null
ongotpointercapture: null
onselect: null
onselectionchange: null
onselectstart: null
onstalled: null
onsubmit: null
onsuspend: null
ontimeupdate: null
ontoggle: null
onvolumechange: null
onwaiting: null
onwebkitfullscreenchange: null
onwebkitfullscreenerror: null
onwheel: null
outerHTML: "<td>24/06/2019 21:27:34</td>"
outerText: "24/06/2019 21:27:34"
ownerDocument: document
parentElement: tr.infraTrEscura
parentNode: tr.infraTrEscura
part: DOMTokenList [value: ""]
prefix: null
2: td
3: td
(...)
The information I want is in these properties: "innerText", "innerHTML", "outerText" or "outerHTML". I can't manage to get at it. The site doesn't use jQuery, which makes it even more complicated. I wanted to do this scraping using only JS, with Node. I could use a hand.
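For the Node-only route mentioned above, one possible sketch works on the raw HTML string instead of the DOM. It assumes the page source has already been downloaded into a string (how it is fetched is left out here), that the table is not nested inside other tables, and that regular expressions are good enough for markup this regular; a real HTML parser would be more robust. The function name extractRows and the sample string are illustrative only, modelled on the markup in the question.

```javascript
// Sketch: pull the text of every <td> out of each <tr> using regular expressions.
// Assumption: `html` is the page source as a string, with no nested tables.
function extractRows(html) {
  const rowPattern = /<tr\b[^>]*>([\s\S]*?)<\/tr>/gi;
  const cellPattern = /<td\b[^>]*>([\s\S]*?)<\/td>/gi;
  const rows = [];
  for (const [, rowHtml] of html.matchAll(rowPattern)) {
    const cells = Array.from(rowHtml.matchAll(cellPattern), ([, cellHtml]) =>
      cellHtml
        .replace(/<br\s*\/?>/gi, " ") // treat <BR> as a space
        .replace(/<[^>]+>/g, "")      // drop any other inline tags
        .replace(/\s+/g, " ")         // collapse runs of whitespace
        .trim()
    );
    // Rows that contain only <th> cells (the header) yield no <td> matches.
    if (cells.length > 0) rows.push(cells);
  }
  return rows;
}

// Hypothetical sample, shortened from the table in the question.
const sample = `
<table class="infraTable">
<tr><th class="infraTh">Evento</th><th class="infraTh">Data/Hora</th></tr>
<tr class="infraTrClara"><td>64</td><td>05/09/2019 15:16:15</td></tr>
<tr class="infraTrEscura"><td>63</td><td>Distribuído <BR>Número: 00259707220198272729</td></tr>
</table>`;

console.log(extractRows(sample));
```

Each inner array is one table row, so rows[0][1] would be the date of the first event. HTML entities are not decoded in this sketch.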
Do you want specific information or a specific row?
– adventistaam
How are you fetching the information?
– adventistaam
Load the jQuery script and make use of it, if you are familiar with the library's functions.
– Ale
@adventistaam I can reach the td's in various ways, such as Array.from(document.getElementsByTagNameNS("http://www.w3.org/1999/xhtml", "td")).
– arthurgehrke
@adventistaam I want the information in this table, as in the markup above: <td>64</td> <td>05/09/2019 15:16:15</td> <td>Baixa Definitiva</td>
– arthurgehrke
@Alexandrec.Caus So I would inject jQuery into the page? How would I do that?
– arthurgehrke
If it helps, the page I'm trying to extract this td data from is https://consultaeproc.tjto.jus.br/eprocV2_prod_1grau/externo_controlador.php?acao=processo_seleciona_publica&acao_origem=processo_consulta_publica&acao_retorno=processo_consulta_publica&num_processo=00227512720148272729&num_chave=&num_chave_documento=&hash=27998d644bc35a42b05905694b8d190b
– arthurgehrke
So you want specific information... With jQuery it really would be easier, and there are plenty of examples.
– adventistaam
Just include the library before your JavaScript code. jQuery is a set of JavaScript functions meant to simplify and speed up development. It is one more chunk of code, but it is minified, so that is fine. It may not be as efficient as plain JavaScript, but it is the fastest way to get done what needs to be done.
– Ale
You can inject it by loading the jQuery file; here is an example of using it with jQuery.
– adventistaam
@Alexandrec.Caus So would I have to extract the page's code in order to insert jQuery? Or is there a way to do this injection, to insert the jQuery library, without extracting the code?
– arthurgehrke
Why can't you get this information at all? If you've found that the data you need is in the property innerText, innerHTML, etc., just use that property to get the information you need, e.g. var dado_que_eu_preciso = document.getElementsByTagName("td")[i].innerText; where i is the position of the td that you need.
– user142154
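Building on the comment above, a minimal sketch of reading innerText from every td, grouped by row, could look like the following. It is meant for the browser console on the page itself. The selector table.infraTable comes from the markup in the question, while the function name readInfraTable and its root parameter are made up for illustration.

```javascript
// Sketch: collect the innerText of every <td>, one array per table row.
// Assumption: `root` is the page's `document` (or an element containing the table).
function readInfraTable(root) {
  const rows = root.querySelectorAll("table.infraTable tr");
  return Array.from(rows)
    .map((row) =>
      Array.from(row.querySelectorAll("td"), (td) => td.innerText.trim())
    )
    .filter((cells) => cells.length > 0); // the header row has only <th> cells
}

// Only runs where a DOM is available, e.g. in the browser console.
if (typeof document !== "undefined") {
  console.table(readInfraTable(document));
}
```

With the table from the question, the first inner array would hold the five cells of event 64, so readInfraTable(document)[0][2] would be the description of that event.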