Problems transforming XML into a data.frame in R

Good evening,

I need help with XML; I have little experience handling it in R.

I have the following XML structure.

To reproduce the problem, copy it into a text file and save it with an .xml extension:

<arquivoposicao_4_01>
<fundo>
<titpublico>
 <dtemissao>20080509</dtemissao>
 <dtoperacao>20110614</dtoperacao>
 <dtvencimento>20140907</dtvencimento>
 <qtdisponivel>0</qtdisponivel>
 <qtgarantia>114</qtgarantia>
 <depgar>5</depgar>
 <caracteristica>N</caracteristica>
 <percprovcred>0</percprovcred>
 <classeoperacao>C</classeoperacao>
 <idinternoativo>227549    </idinternoativo>
 <nivelrsc></nivelrsc>
</titpublico>
</fundo>
</arquivoposicao_4_01>

After saving it, I import the data:

library(XML)
dados <- xmlParse(file = choose.files())

Then I try to turn it into a data.frame, but the result is not what I expected:

dados2 <- xmlToDataFrame(dados)

I would like a data frame in which each row corresponds to one XML item, but instead all the values come out run together, which makes them impossible to separate.

Thank you in advance.

1 answer

It is very rare to find an XML file whose structure is already right for turning directly into a data.frame. A more reliable approach is to convert the file into a list and then extract what you need from it. Assuming your XML file is called arquivo.XML, the following code reads it and converts it into a list:

library(XML)  # provides xmlParse() and xmlToList()
xml_arquivo <- xmlParse(file = 'arquivo.XML')
xml_lista <- xmlToList(xml_arquivo)

resulting in the following list:

$fundo
$fundo$titpublico
$fundo$titpublico$isin
[1] "BRSTNCLF1QR4"

$fundo$titpublico$codativo
[1] "210100"

$fundo$titpublico$cusip
[1] "STNCLF1QR"

$fundo$titpublico$dtemissao
[1] "20080509"

$fundo$titpublico$dtoperacao
[1] "20110614"

$fundo$titpublico$dtvencimento
[1] "20140907"

$fundo$titpublico$qtdisponivel
[1] "0"

$fundo$titpublico$qtgarantia
[1] "114"

$fundo$titpublico$depgar
[1] "5"

$fundo$titpublico$pucompra
[1] "4722.758614"

$fundo$titpublico$puvencimento
[1] "1"

$fundo$titpublico$puposicao
[1] "5481.16800311"

$fundo$titpublico$puemissao
[1] "1000"

$fundo$titpublico$principal
[1] "538394.48"

$fundo$titpublico$tributos
[1] "0"

$fundo$titpublico$valorfindisp
[1] "0"

$fundo$titpublico$valorfinemgar
[1] "624853.15"

$fundo$titpublico$coupom
[1] "0"

$fundo$titpublico$indexador
[1] "SEL"

$fundo$titpublico$percindex
[1] "100"

$fundo$titpublico$caracteristica
[1] "N"

$fundo$titpublico$percprovcred
[1] "0"

$fundo$titpublico$classeoperacao
[1] "C"

$fundo$titpublico$idinternoativo
[1] "227549    "

$fundo$titpublico$nivelrsc
NULL
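
Note the last entry: the empty <nivelrsc> element comes back as NULL rather than as an empty string, and unlist() silently drops NULL entries. A minimal sketch of one way to keep such fields, assuming a small hand-built list `titulo` that stands in for part of xml_lista$fundo$titpublico (the name `titulo` and the choice of three fields are only for illustration):

```r
# Hypothetical stand-in for part of xml_lista$fundo$titpublico
titulo <- list(
  classeoperacao = "C",
  idinternoativo = "227549    ",
  nivelrsc       = NULL          # an empty XML element comes back as NULL
)

# unlist() drops the NULL entry, so only two of the three fields remain
campos_restantes <- names(unlist(titulo))

# Replace each NULL with NA so that every field survives later conversions
titulo_na <- lapply(titulo, function(x) if (is.null(x)) NA_character_ else x)
campos_completos <- names(unlist(titulo_na))
```

Replacing NULL with NA_character_ keeps the field as a character value, which matches the other entries in the list.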

And that's it! Say you want to access the security's maturity date. Just do this:

xml_lista$fundo$titpublico$dtvencimento
[1] "20140907"

With the as.Date() function you can go one step further and convert the string into a proper Date object:

as.Date(xml_lista$fundo$titpublico$dtvencimento, format = "%Y%m%d")
[1] "2014-09-07"
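
The same idea extends to the other fields, since everything in the list arrives as a character string. A minimal sketch, assuming a hand-built list `titulo` that stands in for xml_lista$fundo$titpublico and carries only a few of the values shown in the question (the variable names are illustrative, not part of the XML package):

```r
# Hypothetical stand-in for xml_lista$fundo$titpublico, values taken from the question
titulo <- list(
  dtemissao      = "20080509",
  dtoperacao     = "20110614",
  dtvencimento   = "20140907",
  qtgarantia     = "114",
  idinternoativo = "227549    "
)

# Fields whose names start with "dt" hold dates in YYYYMMDD form
campos_data <- grep("^dt", names(titulo), value = TRUE)
datas <- lapply(titulo[campos_data], as.Date, format = "%Y%m%d")

# Quantities are character strings too and need an explicit conversion
qtgarantia <- as.numeric(titulo$qtgarantia)

# trimws() removes the padding around the internal id
id <- trimws(titulo$idinternoativo)
```

Selecting the date fields by the "dt" prefix is an assumption based on the field names in this file; for another layout the field names would have to be listed explicitly.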

Finally, if you want, you can turn the list into a data.frame, with one row per field:

dados = as.data.frame(unlist(xml_lista$fundo$titpublico))
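
If the complete file holds several <titpublico> records and you want one row per record, another option is to hand xmlToDataFrame() the record nodes directly. As far as I can tell, when it is called on the whole document it treats the children of the root (here, <fundo>) as the rows, which would explain why all the values came out run together. The following is only a sketch: the two-record text is a hypothetical, shortened version of the structure in the question, and the second record's values are made up for illustration.

```r
library(XML)

# Hypothetical two-record document; the second record is invented for illustration
texto <- "<arquivoposicao_4_01><fundo>
<titpublico><dtvencimento>20140907</dtvencimento><qtgarantia>114</qtgarantia></titpublico>
<titpublico><dtvencimento>20150307</dtvencimento><qtgarantia>20</qtgarantia></titpublico>
</fundo></arquivoposicao_4_01>"

doc <- xmlParse(texto, asText = TRUE)

# Select every <titpublico> node with XPath; each node becomes one row
nos <- getNodeSet(doc, "//titpublico")
dados <- xmlToDataFrame(nodes = nos, stringsAsFactors = FALSE)

# All columns come back as character, so convert the ones you need
dados$dtvencimento <- as.Date(dados$dtvencimento, format = "%Y%m%d")
dados$qtgarantia <- as.numeric(dados$qtgarantia)
```

With a real file, replace the inline text with xmlParse(file = 'arquivo.XML'). Empty elements such as <nivelrsc> are left out of this sketch on purpose, because I have not checked how xmlToDataFrame() treats them.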

  • Thanks, that helped a lot. I will try to scale this up to the complete file.

  • @Henrique Faria de Oliveira, I have a problem similar to yours, but I’m not getting an answer. https://answall.com/questions/426692/transformar-xml-em-dataframe
