Filter information in an XML using C

Question

Asked 7 years, 7 months ago

I need to extract relevant information from an XML file that has this structure:

<ARTIGO-PUBLICADO SEQUENCIA-PRODUCAO="2">
    <DADOS-BASICOS-DO-ARTIGO IDIOMA="Inglês" DOI="" FLAG-RELEVANCIA="NAO" HOME-PAGE-DO-TRABALHO="" MEIO-DE-DIVULGACAO="IMPRESSO" NATUREZA="COMPLETO" TITULO-DO-ARTIGO-INGLES="" PAIS-DE-PUBLICACAO="Suiça" ANO-DO-ARTIGO="1987" TITULO-DO-ARTIGO="How to get the best out of automated information systems."/>

    <DETALHAMENTO-DO-ARTIGO PAGINA-FINAL="434" PAGINA-INICIAL="432" SERIE="" FASCICULO="4" VOLUME="8" LOCAL-DE-PUBLICACAO="Organização Mundial da Saúde" ISSN="" TITULO-DO-PERIODICO-OU-REVISTA="World Health Forum"/>

    <AUTORES ORDEM-DE-AUTORIA="1" NOME-PARA-CITACAO="SABBATINI, R. M. E." NOME-COMPLETO-DO-AUTOR="Renato Marcos Endrizzi Sabbatini"/>

    <PALAVRAS-CHAVE PALAVRA-CHAVE-6="" PALAVRA-CHAVE-5="" PALAVRA-CHAVE-4="" PALAVRA-CHAVE-3="Avaliação de tecnologias" PALAVRA-CHAVE-2="Sistemas de informação em saúde" PALAVRA-CHAVE-1="Informática Médica"/>
    <AREAS-DO-CONHECIMENTO>
        <AREA-DO-CONHECIMENTO-1 NOME-DA-ESPECIALIDADE="" NOME-DA-SUB-AREA-DO-CONHECIMENTO="Sistemas de Computação" NOME-DA-AREA-DO-CONHECIMENTO="Ciência da Computação" NOME-GRANDE-AREA-DO-CONHECIMENTO="CIENCIAS_EXATAS_E_DA_TERRA"/>
        <AREA-DO-CONHECIMENTO-2 NOME-DA-ESPECIALIDADE="" NOME-DA-SUB-AREA-DO-CONHECIMENTO="" NOME-DA-AREA-DO-CONHECIMENTO="Medicina" NOME-GRANDE-AREA-DO-CONHECIMENTO="CIENCIAS_DA_SAUDE"/>
    </AREAS-DO-CONHECIMENTO>

    <SETORES-DE-ATIVIDADE SETOR-DE-ATIVIDADE-3="" SETOR-DE-ATIVIDADE-2="Informática" SETOR-DE-ATIVIDADE-1="Atividades de Banco de Dados"/>

    <INFORMACOES-ADICIONAIS DESCRICAO-INFORMACOES-ADICIONAIS-INGLES="" DESCRICAO-INFORMACOES-ADICIONAIS=""/>
</ARTIGO-PUBLICADO>

After the extraction, I need to write some of this data (the author name in NOME-COMPLETO-DO-AUTOR, among others) to a .csv file. My initial idea was to convert the XML to plain text, search that text, and process the matches.

Is there a library that facilitates this work?

A related question on SOen mentions two libraries: Expat and libxml.

– Isac

2017/12/02 at 00:41

1 answer

by Vintorisk • **364** points · Answer 1 · 2017-12-03T17:00:03+00:00

You can use the libxml library (http://www.xmlsoft.org/index.html) to extract the information.

For example, to extract the author's name:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <libxml/xmlmemory.h>
#include <libxml/parser.h>

void
getReference (xmlDocPtr doc, xmlNodePtr cur) {

    xmlChar *nome;
    cur = cur->xmlChildrenNode;
    while (cur != NULL) {
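        if ((!xmlStrcmp(cur->name, (const xmlChar *)"AUTORES"))) {
            /* xmlGetProp returns a copy that the caller must release with xmlFree */
            nome = xmlGetProp(cur, (const xmlChar *)"NOME-COMPLETO-DO-AUTOR");
            if (nome != NULL) {
                printf("Nome do autor: %s\n", nome);
                xmlFree(nome);
            }
        }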
        cur = cur->next;
    }
    return;
}


void
parseDoc(char *docname) {

    xmlDocPtr doc;
    xmlNodePtr cur;

    doc = xmlParseFile(docname);

    if (doc == NULL ) {
        fprintf(stderr,"Document not parsed successfully. \n");
        return;
    }

    cur = xmlDocGetRootElement(doc);

    if (cur == NULL) {
        fprintf(stderr,"empty document\n");
        xmlFreeDoc(doc);
        return;
    }

    if (xmlStrcmp(cur->name, (const xmlChar *) "ARTIGO-PUBLICADO")) {
        fprintf(stderr,"document of the wrong type, root node != ARTIGO-PUBLICADO\n");
        xmlFreeDoc(doc);
        return;
    }

    getReference (doc, cur);
    xmlFreeDoc(doc);
    return;
}

int
main(int argc, char **argv) {

    char *docname;

    if (argc <= 1) {
        fprintf(stderr, "Usage: %s docname\n", argv[0]);
        return 1;   /* non-zero exit status signals incorrect usage */
    }

    docname = argv[1];
    parseDoc (docname);

    return 0;       /* zero exit status signals success */
}
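
The code above only prints the name, so here is a minimal sketch of the CSV part of the question. The helper names `csv_escape` and `csv_write_row` are hypothetical and are not part of libxml. The sketch assumes the usual RFC 4180 quoting convention: a field is wrapped in double quotes when it contains a comma, a double quote, or a line break, and embedded double quotes are doubled. This matters for this data because a value such as NOME-PARA-CITACAO="SABBATINI, R. M. E." contains a comma.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy `field` into `out` as a single CSV field.
 * Returns the number of characters written (excluding the terminating
 * NUL), or -1 if `out` is too small to hold the result. */
int csv_escape(const char *field, char *out, size_t outsz)
{
    assert(field != NULL && out != NULL);

    /* Quoting is needed only if the field contains a special character. */
    int quote = strpbrk(field, ",\"\r\n") != NULL;
    size_t n = 0;

    if (quote) {
        if (n + 1 >= outsz) return -1;
        out[n++] = '"';
    }
    for (const char *p = field; *p != '\0'; p++) {
        size_t need = (*p == '"') ? 2 : 1;    /* embedded quotes are doubled */
        if (n + need >= outsz) return -1;     /* keep room for the NUL */
        if (*p == '"') out[n++] = '"';
        out[n++] = *p;
    }
    if (quote) {
        if (n + 1 >= outsz) return -1;
        out[n++] = '"';
    }
    if (n >= outsz) return -1;
    out[n] = '\0';
    return (int)n;
}

/* Hypothetical helper: write `count` fields as one comma-separated line.
 * The 1024-byte buffer is an arbitrary assumption about field length.
 * Returns 0 on success, -1 on error. */
int csv_write_row(FILE *fp, const char *const *fields, size_t count)
{
    char buf[1024];

    for (size_t i = 0; i < count; i++) {
        if (csv_escape(fields[i], buf, sizeof buf) < 0) return -1;
        if (i > 0 && fputc(',', fp) == EOF) return -1;
        if (fputs(buf, fp) == EOF) return -1;
    }
    return fputc('\n', fp) == EOF ? -1 : 0;
}
```

Inside getReference, a value returned by xmlGetProp could be passed to these helpers as `(const char *)nome`, since xmlChar is a typedef for unsigned char. Call xmlFree on the value only after the row has been written.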