Transform XML into Dataframe

I have an XML file and I'm trying to turn it into a data frame. My XML:

<?xml version="1.0" encoding="ISO-8859-1" ?>


<test:TASS xmlns="http://www.vvv.com/schemas"  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  xsi:schemaLocation="http://www.vvv.com/schemas http://www.vvv.com/schemas/testV2_02_03.xsd"  xmlns:test="http://www.vvv.com/schemas" >
    <test:house>
                <test:billing>
                    <test:proceduresummary>
                        <test:guidenumber>X2030</test:guidenumber>
                            <test:diagnosis>
                                <test:table>ICD-10</test:table>
                                <test:diagnosiscod>J441</test:diagnosiscod>
                                <test:description>CHRONIC OBSTRUCTIVE PULMONARY DISEASE WITH (ACUTE) EXACERBATION</test:description>
                            </test:diagnosis>
                            <test:procedure>
                                <test:procedure>
                                    <test:description>HOSPITAL</test:description>
                                </test:procedure>
                                <test:amount>12</test:amount>
                            </test:procedure>
                    </test:proceduresummary>
                </test:billing>
                    <test:billing>
                    <test:proceduresummary>
                        <test:guidenumber>Y6055</test:guidenumber>
                            <test:diagnosis>
                                <test:table>ICD-10</test:table>
                                <test:diagnosiscod>I21</test:diagnosiscod>
                                <test:description>ACUTE MYOCARDIAL INFARCTION</test:description>
                            </test:diagnosis>
                            <test:procedure>
                                <test:procedure>
                                    <test:description>HOSPITAL</test:description>
                                </test:procedure>
                                <test:amount>8</test:amount>
                            </test:procedure>
                    </test:proceduresummary>
                </test:billing>
                    <test:billing>
                    <test:proceduresummary>
                        <test:guidenumber>Z9088</test:guidenumber>
                            <test:diagnosis>
                                <test:table>ICD-10</test:table>
                                <test:diagnosiscod>F20</test:diagnosiscod>
                                <test:description>SCHIZOPHRENIA</test:description>
                            </test:diagnosis>
                            <test:procedure>
                                <test:procedure>
                                    <test:description>HOSPITAL</test:description>
                                </test:procedure>
                                <test:amount>1</test:amount>
                            </test:procedure>
                    </test:proceduresummary>
                </test:billing>
    </test:house>
</test:TASS>

For every guidenumber element I would like to extract the corresponding diagnosiscod and description, and combine them into one data frame, as shown below:

guidenumber <- c('X2030','Y6055','Z9088')
diagnosiscod <- c('J441','I21','F20')
description <- c('CHRONIC OBSTRUCTIVE PULMONARY DISEASE WITH (ACUTE) EXACERBATION','ACUTE MYOCARDIAL INFARCTION','SCHIZOPHRENIA')
df <- data.frame(guidenumber, diagnosiscod, description)

I tried the code below, but the result comes back empty. I based it on this answer (https://stackoverflow.com/questions/57875654/meteorological-data-from-xml-to-dataframe-in-r):

require(tidyverse)
require(xml2)
setwd("D:/")
myxml<- read_xml("base.xml")
house <- myxml %>% xml_find_all("//house")

How can I solve this problem and turn the XML into a data frame?

1 answer


You can try it this way:

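library(XML)
doc <- xmlParse("base.xml")
# Select every proceduresummary node; the "test:" prefix is declared on the root element
nodes <- getNodeSet(doc, "//test:house//test:billing//test:proceduresummary")
# One row per node; each child element becomes a column of concatenated text
df <- xmlToDataFrame(nodes = nodes, stringsAsFactors = FALSE)
# The procedure column reads e.g. "HOSPITAL12"; dropping "HOSPITAL" leaves the amount
df$amount <- gsub("HOSPITAL", "", df$procedure)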

# > df
#   guidenumber                                                                 diagnosis  procedure amount
#1       X2030 ICD-10J441CHRONIC OBSTRUCTIVE PULMONARY DISEASE WITH (ACUTE) EXACERBATION HOSPITAL12     12
#2       Y6055                                      ICD-10I21ACUTE MYOCARDIAL INFARCTION  HOSPITAL8      8
#3       Z9088                                                    ICD-10F20SCHIZOPHRENIA  HOSPITAL1      1
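
The data frame above keeps diagnosis as one concatenated string, whereas the question asks for diagnosiscod and description as separate columns. The question's own xml2 attempt most likely came back empty because "//house" carries no namespace prefix, while every element in the file is in the "test" namespace. Below is a minimal sketch of one way to get the three columns with xml2, assuming the structure shown in the question. The inline string is a trimmed copy of the question's XML so the example runs on its own, and the names xml_string, ns and summaries are only illustrative.

```r
library(xml2)

# Trimmed copy of the XML from the question, inlined so the sketch is self-contained.
# With the real file, read_xml("base.xml") would take the place of this string.
xml_string <- '<test:TASS xmlns="http://www.vvv.com/schemas" xmlns:test="http://www.vvv.com/schemas">
  <test:house>
    <test:billing>
      <test:proceduresummary>
        <test:guidenumber>X2030</test:guidenumber>
        <test:diagnosis>
          <test:table>ICD-10</test:table>
          <test:diagnosiscod>J441</test:diagnosiscod>
          <test:description>CHRONIC OBSTRUCTIVE PULMONARY DISEASE WITH (ACUTE) EXACERBATION</test:description>
        </test:diagnosis>
        <test:procedure>
          <test:procedure><test:description>HOSPITAL</test:description></test:procedure>
          <test:amount>12</test:amount>
        </test:procedure>
      </test:proceduresummary>
    </test:billing>
    <test:billing>
      <test:proceduresummary>
        <test:guidenumber>Y6055</test:guidenumber>
        <test:diagnosis>
          <test:table>ICD-10</test:table>
          <test:diagnosiscod>I21</test:diagnosiscod>
          <test:description>ACUTE MYOCARDIAL INFARCTION</test:description>
        </test:diagnosis>
        <test:procedure>
          <test:procedure><test:description>HOSPITAL</test:description></test:procedure>
          <test:amount>8</test:amount>
        </test:procedure>
      </test:proceduresummary>
    </test:billing>
    <test:billing>
      <test:proceduresummary>
        <test:guidenumber>Z9088</test:guidenumber>
        <test:diagnosis>
          <test:table>ICD-10</test:table>
          <test:diagnosiscod>F20</test:diagnosiscod>
          <test:description>SCHIZOPHRENIA</test:description>
        </test:diagnosis>
        <test:procedure>
          <test:procedure><test:description>HOSPITAL</test:description></test:procedure>
          <test:amount>1</test:amount>
        </test:procedure>
      </test:proceduresummary>
    </test:billing>
  </test:house>
</test:TASS>'

doc <- read_xml(xml_string)

# Map the "test" prefix to the namespace URI declared on the root element.
# Passing it explicitly avoids depending on how prefixes are auto-detected.
ns <- c(test = "http://www.vvv.com/schemas")

# One node per proceduresummary, i.e. one row of the result
summaries <- xml_find_all(doc, "//test:proceduresummary", ns)

# xml_find_first() returns exactly one match per node (missing if absent),
# so the three columns stay aligned row by row.
df <- data.frame(
  guidenumber  = xml_text(xml_find_first(summaries, "./test:guidenumber", ns)),
  diagnosiscod = xml_text(xml_find_first(summaries, "./test:diagnosis/test:diagnosiscod", ns)),
  description  = xml_text(xml_find_first(summaries, "./test:diagnosis/test:description", ns)),
  stringsAsFactors = FALSE
)

print(df)
```

The relative path through test:diagnosis matters here, because test:description also appears under test:procedure; searching from each proceduresummary without that step could pick up "HOSPITAL" instead.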
