How do I read properties within a td / tr using only JS?

-1

Good afternoon, everyone. I need to scrape data from a site; the data sits inside a table, in <tr> and <td> elements. The information I need is within the tds. Here is the code:

 <table width="95%" class="infraTable" summary="Assuntos">
    <tr>
       <th class="infraTh">Evento</th>
       <th class="infraTh">Data/Hora</th>
       <th class="infraTh">Descrição</th>
       <th class="infraTh">Usuário</th>
       <th class="infraTh">Documentos</th>
    </tr>
    <tr class="infraTrClara">
        <td>64</td>
        <td>05/09/2019 15:16:15</td>
        <td>Baixa Definitiva</td>
        <td>140274</td>
        <td> Evento não gerou documento(s)</td>
    </tr>
    <tr class="infraTrEscura">
        <td>63</td>
        <td>24/06/2019 21:27:34</td>
        <td>Distribuído   Retificação ou Suprimento ou Restauração de Registro Civil <BR>Número: 00259707220198272729</td>
        <td>TO6615</td>
        <td>Evento não gerou documento(s)</td>
    </tr>
    <tr class="infraTrClara">
        <td>62</td>
        <td>02/05/2019 14:55:03</td>
        <td>Juntada - Outros documentos</td>
        <td>356753</td>
        <td>Evento não gerou documento(s)</td>
    </tr>
 </table>

However, when I try to look up the td class or the td elements using document.getElement..., I get many properties that are not in the HTML source (I'll list only some of them so the text doesn't get too long):

document.getElementsByTagName("td")
HTMLCollection(349)
0: td
1: td
clientWidth: 83
colSpan: 1
contentEditable: "inherit"
dataset: DOMStringMap {}
dir: ""
draggable: false
elementTiming: ""
enterKeyHint: ""
firstChild: text
firstElementChild: null
headers: ""
height: ""
hidden: false
id: ""
innerHTML: "24/06/2019 21:27:34"
innerText: "24/06/2019 21:27:34"
inputMode: ""
isConnected: true
isContentEditable: false
lang: ""
lastChild: text
lastElementChild: null
localName: "td"
namespaceURI: "http://www.w3.org/1999/xhtml"
nextElementSibling: td
nextSibling: text
noWrap: false
nodeName: "TD"
nodeType: 1
nodeValue: null
nonce: ""
offsetHeight: 72
offsetLeft: 49
offsetParent: table.infraTable
offsetTop: 81
offsetWidth: 83
onabort: null
onauxclick: null
onbeforecopy: null
onbeforecut: null
onbeforepaste: null
onblur: null
oncancel: null
oncanplay: null
oncanplaythrough: null
onchange: null
onclick: null
onclose: null
oncontextmenu: null
oncopy: null
oncuechange: null
oncut: null
ondblclick: null
ondrag: null
ondragend: null
ondragenter: null
ondragleave: null
ondragover: null
ondragstart: null
ondrop: null
ondurationchange: null
onemptied: null
onended: null
onerror: null
onfocus: null
onformdata: null
onfullscreenchange: null
onfullscreenerror: null
ongotpointercapture: null
onselect: null
onselectionchange: null
onselectstart: null
onstalled: null
onsubmit: null
onsuspend: null
ontimeupdate: null
ontoggle: null
onvolumechange: null
onwaiting: null
onwebkitfullscreenchange: null
onwebkitfullscreenerror: null
onwheel: null
outerHTML: "<td>24/06/2019 21:27:34</td>"
outerText: "24/06/2019 21:27:34"
ownerDocument: document
parentElement: tr.infraTrEscura
parentNode: tr.infraTrEscura
part: DOMTokenList [value: ""]
prefix: null
2: td
3: td
(...)

The information I want is in these properties: "innerText", "innerHTML", "outerText" or "outerHTML". I can't get at that information at all. The site doesn't use jQuery, which makes it even more complicated. I wanted to do this scraping using only JS, with Node. I need a little help.

  • Do you want to look up specific information or a specific row?

  • How are you fetching the information?

  • Load the jQuery script and use its power, if you are familiar with the library's functions.

  • @adventistaam I can reach the tds in various ways, such as Array.from(document.getElementsByTagNameNS("http://www.w3.org/1999/xhtml", "td")).

  • @adventistaam I want the information in this table, as in the code attached: <td>64</td> <td>05/09/2019 15:16:15</td> <td>Baixa Definitiva</td>

  • @Alexandrec.Caus So I would inject jQuery into the page? How would I do that?

  • If it helps, the page I'm trying to extract this td data from is https://consultaeproc.tjto.jus.br/eprocV2_prod_1grau/externo_controlador.php?acao=processo_seleciona_publica&acao_origem=processo_consulta_publica&acao_retorno=processo_consulta_publica&num_processo=00227512720148272729&num_chave=&num_chave_documento=&hash=27998d644bc35a42b05905694b8d190b

  • So you want specific information... With jQuery it would really be easier, and there are several examples.

  • Just include the library before your JavaScript code. jQuery is a set of JavaScript functions that aims to simplify and speed up development. It is a chunk of code, but minified, so that is fine. It may not be as efficient as using pure JavaScript, but it's the fastest way to do what needs to be done.

  • You can inject it by loading the jQuery file, and here is an example of using it with jQuery.

  • @Alexandrec.Caus So would I have to download the page's code in order to insert jQuery? Or can this injection be done without extracting the code?

  • Why can't you get this information at all? If you've found that the data you need is in the property innerText, innerHTML, etc., just use that property to get it, e.g. var dado_que_eu_preciso = document.getElementsByTagName("td")[i].innerText;, where i is the position of the td that you need.
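
Since the question mentions running this with Node: `document` only exists inside a browser page, so the `document.getElementsByTagName("td")[i].innerText` approach from the comment above applies in the browser console, not in a plain Node script. The following is a minimal sketch for Node without extra libraries, assuming the page HTML has already been downloaded into a string. The function name `extractRows` is hypothetical, and the regular expressions are an assumption that only holds for simple, non-nested tables like the one in the question; a real HTML parser would be more robust.

```javascript
// Sketch only: extract the text of every <td>, grouped by <tr>,
// from an HTML string. Assumes no nested tables and no ">" inside attributes.
function extractRows(html) {
  var rows = [];
  var rowRe = /<tr\b[^>]*>([\s\S]*?)<\/tr>/gi;
  var rowMatch;
  while ((rowMatch = rowRe.exec(html)) !== null) {
    var cellRe = /<td\b[^>]*>([\s\S]*?)<\/td>/gi;
    var cells = [];
    var cellMatch;
    while ((cellMatch = cellRe.exec(rowMatch[1])) !== null) {
      // Replace inner tags such as <BR> with a space, then collapse whitespace.
      cells.push(cellMatch[1].replace(/<[^>]+>/g, ' ').replace(/\s+/g, ' ').trim());
    }
    // Header rows contain only <th> cells, so they produce no entries.
    if (cells.length > 0) rows.push(cells);
  }
  return rows;
}

// Hypothetical sample input, shortened from the table in the question.
var sample =
  '<tr><th class="infraTh">Evento</th><th class="infraTh">Data/Hora</th></tr>' +
  '<tr class="infraTrClara"><td>64</td><td>05/09/2019 15:16:15</td></tr>';
console.log(extractRows(sample)); // logs one row holding "64" and the date string
```

Each inner array is one table row, so `rows[0][1]` would be the Data/Hora cell of the first data row. HTML entities such as `&nbsp;` are not decoded by this sketch.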

2 answers

2

Hi! Take a look at this function. I posted it today as an answer to another question, but I think it applies to what you're looking for.

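function extractTableData() {
  console.log('Extracting data');
  // Select every row of the table; the header row is skipped below.
  var rows = document.querySelectorAll('#idTable1 tr');
  var values = [];
  Array.from(rows).forEach(function (row) {
    var tds = row.querySelectorAll('td');
    if (tds.length < 2) return; // header row holds <th> cells, not <td>
    var obj = {};
    obj.A = tds[0].textContent.trim(); // Evento
    obj.B = tds[1].textContent.trim(); // Data/Hora
    values.push(obj.A);
  });
  // Join the first-column values with commas, without a trailing comma.
  var strResult = values.join(',');
  console.log('Result=' + strResult);
  return strResult;
}
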
<table width="95%" id="idTable1" class="infraTable" summary="Assuntos">
    <tr>
       <th class="infraTh">Evento</th>
       <th class="infraTh">Data/Hora</th>
       <th class="infraTh">Descrição</th>
       <th class="infraTh">Usuário</th>
       <th class="infraTh">Documentos</th>
    </tr>
    <tr class="infraTrClara">
        <td>64</td>
        <td>05/09/2019 15:16:15</td>
        <td>Baixa Definitiva</td>
        <td>140274</td>
        <td> Evento não gerou documento(s)</td>
    </tr>
    <tr class="infraTrEscura">
        <td>63</td>
        <td>24/06/2019 21:27:34</td>
        <td>Distribuído   Retificação ou Suprimento ou Restauração de Registro Civil <BR>Número: 00259707220198272729</td>
        <td>TO6615</td>
        <td>Evento não gerou documento(s)</td>
    </tr>
    <tr class="infraTrClara">
        <td>62</td>
        <td>02/05/2019 14:55:03</td>
        <td>Juntada - Outros documentos</td>
        <td>356753</td>
        <td>Evento não gerou documento(s)</td>
    </tr>
    
</table>    
   <button type="button" onclick="extractTableData()">Extrair</button> 

Similar question that I answered: How to extract values from a column and separate them with a comma

Good luck, and thanks!

  • Thanks, @Valmor. I tried it here but it returned undefined.

  • Note that I changed your table a little: I gave the table an id, id="idTable1".

0

To read the contents of a td, you can use the property innerText:

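// Select the table by its class, then collect every <td> inside it.
var tabela = document.querySelector(".infraTable");
var tds = tabela.getElementsByTagName("td");

// Walk the cells in document order and print the text of each one.
for (var i = 0; i < tds.length; i++) {
    console.log("Content of cell " + i + " is " + tds[i].innerText);
}
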
<table width="95%" class="infraTable" summary="Assuntos">
    <tr>
       <th class="infraTh">Evento</th>
       <th class="infraTh">Data/Hora</th>
       <th class="infraTh">Descrição</th>
       <th class="infraTh">Usuário</th>
       <th class="infraTh">Documentos</th>
    </tr>
    <tr class="infraTrClara">
        <td>64</td>
        <td>05/09/2019 15:16:15</td>
        <td>Baixa Definitiva</td>
        <td>140274</td>
        <td> Evento não gerou documento(s)</td>
    </tr>
    <tr class="infraTrEscura">
        <td>63</td>
        <td>24/06/2019 21:27:34</td>
        <td>Distribuído   Retificação ou Suprimento ou Restauração de Registro Civil <BR>Número: 00259707220198272729</td>
        <td>TO6615</td>
        <td>Evento não gerou documento(s)</td>
    </tr>
    <tr class="infraTrClara">
        <td>62</td>
        <td>02/05/2019 14:55:03</td>
        <td>Juntada - Outros documentos</td>
        <td>356753</td>
        <td>Evento não gerou documento(s)</td>
    </tr>
 </table>

Note that I first selected the table, then got all the tds and went through them one by one with a for loop, using innerText to pick up the content. It is possible to filter by row or by a specific column; just adapt this example.
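
As one possible way to do that filtering, here is a minimal sketch that turns the table into an array of objects, one per row, so that rows and columns can be picked with ordinary array methods. The function name `tableToRecords` and the key names are hypothetical, chosen only for illustration; the sketch relies on the standard `rows` and `cells` collections of table and row elements.

```javascript
// Sketch only: convert a table element into one plain object per data row.
// `keys` names the columns in order; both names here are illustrative.
function tableToRecords(table, keys) {
  var records = [];
  Array.from(table.rows).forEach(function (row) {
    // Keep only <td> cells so that header rows made of <th> are ignored.
    var cells = Array.from(row.cells).filter(function (cell) {
      return cell.tagName === 'TD';
    });
    if (cells.length === 0) return;
    var record = {};
    keys.forEach(function (key, i) {
      record[key] = cells[i] ? cells[i].textContent.trim() : '';
    });
    records.push(record);
  });
  return records;
}

// Browser-only usage; skipped when there is no document or no matching table.
if (typeof document !== 'undefined' && document.querySelector('.infraTable')) {
  var records = tableToRecords(
    document.querySelector('.infraTable'),
    ['evento', 'dataHora', 'descricao', 'usuario', 'documentos']
  );
  // A single column: every Data/Hora value.
  console.log(records.map(function (r) { return r.dataHora; }));
  // A single row: the record whose Evento is "63".
  console.log(records.filter(function (r) { return r.evento === '63'; }));
}
```

`textContent` is used instead of `innerText` here because it does not depend on layout; for the simple cells in this table the two give very similar text.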
