C# update XML based on another XML

4

Today I have the following XML structure:

<ROOT>
    <TES IDTES="4780" IDPES="17522" />
    <TES IDTES="6934" IDPES="12343" />
    <TES IDTES="4781" IDPES="17523" />
    <TES IDTES="6935" IDPES="12344" />
</ROOT>

To update this XML I have the following:

<ROOT>
    <TES DEL="S" IDTES="4780" IDPES="17522" />
    <TES DEL="S" IDTES="6934" IDPES="12343" />
    <TES IDTES="7777" IDPES="17523" />
    <TES IDTES="2020" IDPES="12344" />
</ROOT>

This means I have to delete the 2 TES elements identified by their IDTES and add 2 new TES elements, resulting in:

<ROOT>
    <TES IDTES="4781" IDPES="17523" />
    <TES IDTES="6935" IDPES="12344" />
    <TES IDTES="7777" IDPES="17523" />
    <TES IDTES="2020" IDPES="12344" />
</ROOT>

I did some research on diff and merge between XML files in C#, but it didn’t help me much.

How can I do this with LINQ, without looping?

  • Can it be LINQ with some loops?

  • Hello again Gypsy :) I simplified the XML to make it easier to understand. But imagine a 17 MB base file that I have to update with some 3 MB more (lines to delete and add). Loops could get slow. Isn’t there a way to pass something like Xmldebase.Delete.Select[IDTES in [array of ids]]? Am I pushing it?

  • I see no other way than loading these files into memory and processing them.

  • Would a different file format help?

  • I added an answer using XSLT (XslTransform or XslCompiledTransform) as an alternative to LINQ. I believe it can be more efficient (it may consume more memory, but it should be faster).

3 answers

4

An alternative to LINQ is an XSLT transformation, which transforms XML nodes using compiled templates. XSLT transformations also build a tree and load the XML into memory, but nodes are selected with XPath, which tends to be more efficient.

The downside is that XSLT is another language (and it’s not as trivial as it looks at first glance). I will describe what an XSLT solution to your problem would look like (you can run it from C#). If the structure of your real documents is similar to the example you presented, you may be able to use the code without changes.

A brief overview of how XSLT works

The XSLT processor takes a source document (well-formed XML) and an XSL document (XML written in the XSLT language) and produces text output (which can be XML, plain text, an XML fragment, etc.). The XSL document can also read additional sources (files), which are loaded through a function used in XPath expressions (document('path-to-file')). In your case, the file containing the updates would be loaded this way. The processor also accepts data passed as parameters at execution time; this data is received by an <xsl:param> element in the XSL document. You can run the transformation in various ways: there are online services, command-line tools (such as Saxon and Xalan), and APIs for C#, Java, PHP, Ruby, etc.

Solving your problem using C# and XSLT

I’ll call the original file fonte.xml:

<ROOT>
    <TES IDTES="4780" IDPES="17522" />
    <TES IDTES="6934" IDPES="12343" />
    <TES IDTES="4781" IDPES="17523" />
    <TES IDTES="6935" IDPES="12344" />
</ROOT>  

And the file with the updates atualizacao.xml:

<ROOT>
    <TES DEL="S" IDTES="4780" IDPES="17522" />
    <TES DEL="S" IDTES="6934" IDPES="12343" />
    <TES IDTES="7777" IDPES="17523" />
    <TES IDTES="2020" IDPES="12344" />
</ROOT>

The XSLT document, which I’ll call atualiza.xsl, does the transformation you need. If you run an XSLT processor with fonte.xml as the input, atualizacao.xml as the value of the parameter I called arquivo, and atualiza.xsl as the stylesheet, it will generate this result:

<ROOT>
   <TES IDTES="4781" IDPES="17523"/>
   <TES IDTES="6935" IDPES="12344"/>
   <TES IDTES="7777" IDPES="17523"/>
   <TES IDTES="2020" IDPES="12344"/>
</ROOT>

The C# code to run the XSLT transformation is similar to the code below (I haven’t tested it, and I’m not a C# programmer, so there may be some inaccuracies):

        // Requires: using System.IO; using System.Xml; using System.Xml.Xsl;
        XslCompiledTransform transform = new XslCompiledTransform(true); // true enables debugging

        // Pass the name of the update file as the "arquivo" parameter.
        XsltArgumentList par = new XsltArgumentList();
        par.AddParam("arquivo", "", "atualizacao.xml");

        // document() is disabled by default and must be enabled explicitly.
        XsltSettings s = new XsltSettings();
        s.EnableDocumentFunction = true;

        transform.Load("atualiza.xsl", s, new XmlUrlResolver());

        using (StreamWriter stream = new StreamWriter("resultado.xml"))
        {
            transform.Transform("fonte.xml", par, stream);
        }

The XSLT document is listed below:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <xsl:output indent="yes"/>

    <xsl:param name="arquivo">atualizacao.xml</xsl:param>
    <xsl:variable name="doc" select="document($arquivo)" />

    <xsl:template match="ROOT">
        <xsl:copy>
            <xsl:apply-templates select="TES[not($doc/ROOT/TES/@IDTES=@IDTES and $doc/ROOT/TES/@IDPES=@IDPES and $doc/ROOT/TES/@DEL='S')]"/>
            <xsl:apply-templates select="$doc/ROOT/TES[not(@DEL = 'S')]"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="TES">
        <xsl:copy-of select="."/>
    </xsl:template>

</xsl:stylesheet>

The first element within <xsl:stylesheet> is

    <xsl:output indent="yes"/>

which produces indented output. You can remove it if you wish. The following element:

    <xsl:param name="arquivo">atualizacao.xml</xsl:param>

receives the parameter arquivo that you pass from C#. If for some reason you do not pass the parameter, it defaults to atualizacao.xml.

The following element

<xsl:variable name="doc" select="document($arquivo)" />

loads the document and, if it is found, assigns it to the variable doc (which you can use throughout the stylesheet as $doc).

The stylesheet contains two templates (<xsl:template>), which is where the transformations occur. The second template:

<xsl:template match="TES">
    <xsl:copy-of select="."/>
</xsl:template>

simply copies the entire node with its attributes and content. It is only called when a <TES> element is being processed (it places no restriction on where that node is located, whether in the source file or the other one).

The first template matches the ROOT node. This will be the <ROOT> of fonte.xml, and the template is called automatically. The <xsl:copy> element copies this node (producing <ROOT>...</ROOT>). Inside it there are two xsl:apply-templates calls containing XPath expressions, which choose what will be placed inside <ROOT>.

The first XPath:

TES[not($doc/ROOT/TES/@IDTES=@IDTES and $doc/ROOT/TES/@IDPES=@IDPES and $doc/ROOT/TES/@DEL='S')]

is relative to <ROOT> (it refers to the document fonte.xml) and selects all <TES> elements except those whose @IDTES and @IDPES equal the corresponding attributes of a TES in atualizacao.xml ($doc/ROOT/TES) that also has the attribute DEL='S' ($doc/ROOT/TES/@DEL='S'). This way it passes over all the elements and does not copy to the result tree those that must be removed.

The second XPath

$doc/ROOT/TES[not(@DEL = 'S')]

acts only on the document atualizacao.xml ($doc), copying to the result tree only the nodes that do not have the attribute DEL='S'.
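One detail worth checking against real data: in XPath 1.0, comparing a node-set with = is true if any node in the set matches, so the three conditions in the first expression are each tested against all the TES elements of atualizacao.xml rather than against one and the same element. The sketch below is not part of the stylesheet. It restates both readings in C# with LINQ to XML so the difference can be seen; the helper names and the extra row 7777/12343 are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

var update = XElement.Parse(@"<ROOT>
    <TES DEL=""S"" IDTES=""4780"" IDPES=""17522"" />
    <TES DEL=""S"" IDTES=""6934"" IDPES=""12343"" />
    <TES IDTES=""7777"" IDPES=""17523"" />
    <TES IDTES=""2020"" IDPES=""12344"" />
</ROOT>");

// Hypothetical source row: its IDTES matches one update row, its IDPES another.
var row = XElement.Parse(@"<TES IDTES=""7777"" IDPES=""12343"" />");

// Reading 1: each condition may be satisfied by a different update element
// (this mirrors how XPath 1.0 compares node-sets).
bool RemovedIndependently(XElement tes) =>
    update.Elements("TES").Any(u => (string)u.Attribute("IDTES") == (string)tes.Attribute("IDTES"))
    && update.Elements("TES").Any(u => (string)u.Attribute("IDPES") == (string)tes.Attribute("IDPES"))
    && update.Elements("TES").Any(u => (string)u.Attribute("DEL") == "S");

// Reading 2: a single update element must satisfy all three conditions.
bool RemovedBySameElement(XElement tes) =>
    update.Elements("TES").Any(u =>
        (string)u.Attribute("DEL") == "S"
        && (string)u.Attribute("IDTES") == (string)tes.Attribute("IDTES")
        && (string)u.Attribute("IDPES") == (string)tes.Attribute("IDPES"));

Console.WriteLine(RemovedIndependently(row));   // True
Console.WriteLine(RemovedBySameElement(row));   // False
```

For the four rows in the question both readings give the same result, so the difference only shows up when IDTES or IDPES values repeat across rows.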

  • XSLT is a great alternative, and it is also easy to test in modern browsers.

  • It’s ALIVE!!! For the example given, this was the solution. I still have to test it with some large files I have. Thanks.

  • I considered iuri’s answer more correct because of performance. More details in my answer.

2


Using LINQ with XDocument:

XDocument doc1 = XDocument.Parse(@"
<ROOT>
    <TES IDTES=""4780"" IDPES=""17522"" />
    <TES IDTES=""6934"" IDPES=""12343"" />
    <TES IDTES=""4781"" IDPES=""17523"" />
    <TES IDTES=""6935"" IDPES=""12344"" />
</ROOT>");

XDocument doc2 = XDocument.Parse(@"
<ROOT>
    <TES DEL=""S"" IDTES=""4780"" IDPES=""17522"" />
    <TES DEL=""S"" IDTES=""6934"" IDPES=""12343"" />
    <TES IDTES=""7777"" IDPES=""17523"" />
    <TES IDTES=""2020"" IDPES=""12344"" />
</ROOT>");

In this example I am using literal strings to create the objects; in practice you would open the XML files using Load():

XDocument doc1 = XDocument.Load("file.xml");

The idea would be to merge the 2 files while converting to a simpler object list:

var list = doc1.Element("ROOT").Elements().Select(m => new { 
        IDTES = (string)m.Attribute("IDTES"), 
        IDPES = (string)m.Attribute("IDPES"), 
        DEL = (string)m.Attribute("DEL") ?? "N" } // coalesce to "N" when the attribute is missing (null)
    ).Union(doc2.Element("ROOT").Elements().Select(m => new { 
        IDTES = (string)m.Attribute("IDTES"), 
        IDPES = (string)m.Attribute("IDPES"), 
        DEL = (string)m.Attribute("DEL") ?? "N" }
    )
);

Filter this list with Where to get the lines to exclude and apply Except to produce the desired result:

var toDel = list.Where(m => m.DEL == "S").Select(m => new { m.IDTES, m.IDPES });
var result = list.Select(m => new { m.IDTES, m.IDPES }).Except(toDel);
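
Union and Except work here without a custom comparer because C# anonymous types get compiler-generated Equals and GetHashCode methods based on their property values. A minimal sketch of that behavior (the variable names are illustrative only):

```csharp
using System;
using System.Linq;

var a = new { IDTES = "4780", IDPES = "17522" };
var b = new { IDTES = "4780", IDPES = "17522" };

// Two separate instances, but Equals compares the property values.
Console.WriteLine(a.Equals(b));                     // True
Console.WriteLine(object.ReferenceEquals(a, b));    // False

var all = new[] { a, new { IDTES = "4781", IDPES = "17523" } };
var toDel = new[] { b };

// Except uses the default equality comparer, so the row equal to b is dropped.
var kept = all.Except(toDel).ToList();
Console.WriteLine(kept.Count);                      // 1
Console.WriteLine(kept[0].IDTES);                   // 4781
```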

Then just generate a new XDocument from the result:

var doc3 = new XDocument(new XElement("ROOT",
           from r in result
           select new XElement("TES",
               new XAttribute("IDTES", r.IDTES),
               new XAttribute("IDPES", r.IDPES)
           )
      )
);

And write it to disk with Save():

doc3.Save("file.xml");
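
If rebuilding every element is not needed, a variation on the same idea is to collect the keys to delete in a HashSet and remove the matching elements from the base document in place, which is close to the "IDTES in [array of ids]" filter asked about in the comments on the question. This is only a sketch under assumptions: the (IDTES, IDPES) pair is taken as the key, the documents are inline strings rather than files, and names such as baseDoc, updateDoc, Key and toDelete are mine, not from the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

var baseDoc = XDocument.Parse(@"<ROOT>
    <TES IDTES=""4780"" IDPES=""17522"" />
    <TES IDTES=""6934"" IDPES=""12343"" />
    <TES IDTES=""4781"" IDPES=""17523"" />
    <TES IDTES=""6935"" IDPES=""12344"" />
</ROOT>");

var updateDoc = XDocument.Parse(@"<ROOT>
    <TES DEL=""S"" IDTES=""4780"" IDPES=""17522"" />
    <TES DEL=""S"" IDTES=""6934"" IDPES=""12343"" />
    <TES IDTES=""7777"" IDPES=""17523"" />
    <TES IDTES=""2020"" IDPES=""12344"" />
</ROOT>");

// Key used to match rows between the two documents (assumed: IDTES + IDPES).
static (string, string) Key(XElement e) =>
    ((string)e.Attribute("IDTES"), (string)e.Attribute("IDPES"));

var updates = updateDoc.Root.Elements("TES").ToList();

// Keys of the rows flagged with DEL="S"; a HashSet keeps each lookup cheap.
var toDelete = new HashSet<(string, string)>(
    updates.Where(e => (string)e.Attribute("DEL") == "S").Select(e => Key(e)));

// Remove() is the LINQ to XML extension that detaches every node in the sequence.
baseDoc.Root.Elements("TES").Where(e => toDelete.Contains(Key(e))).Remove();

// Append the rows that are not deletions (Add clones elements that already have a parent).
baseDoc.Root.Add(updates.Where(e => (string)e.Attribute("DEL") != "S"));

Console.WriteLine(baseDoc);
```

With the sample data this keeps 4781 and 6935 from the base document and appends 7777 and 2020, the same four rows as the expected result in the question.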

  • What would this example look like if I wanted to load the XML from a file rather than from a string? Thanks.

  • XDocument doc1 = XDocument.Load("file.xml");

1

After a few tests, I got the following results:

For a 53 MB base XML and a 45 KB update XML

  • The XslCompiledTransform solution takes 5 min. to generate the new file
  • The XDocument solution takes 13 seconds to generate the new file

For a 45 KB base XML and a 53 MB update XML

  • The XslCompiledTransform solution takes 16 min. to generate the new file
  • The XDocument solution takes 13 seconds to generate the new file

For two 53 MB XML files

  • The XslCompiledTransform solution ran for more than 1 hour and was canceled
  • The XDocument solution takes 20 seconds to generate the new file

For this reason I changed the accepted answer to the one by IURI, since in my case the project proved viable thanks to this solution.

  • This benchmark is interesting. Did you also measure memory consumption?

  • It would be interesting to write up this comparison as an article, including environment data (platform, memory, etc.). You could post it on a blog, a portal, or InfoQ.

  • With both XML files at 53 MB, memory and CPU usage were similar. When the processing time is shorter (~10 sec.), XslCompiledTransform uses almost 2x LESS than XDocument. In my case, the most important thing is time, not resources. You could even write logic that, depending on the size of the XML and how long the process has been running, opts for either speed or resource usage.
