Index a standard Google Shopping XML feed in Sphinx Search

How can I index a standard Google Shopping XML feed in Sphinx Search?

XML:

<?xml version="1.0"?>
<rss version="2.0" 
xmlns:g="http://base.google.com/ns/1.0">
<channel>
<title>The name of your data feed</title>
<link>http://www.example.com</link>
<description>A description of your content</description>
<item>
<title>Red wool sweater</title>
<link>http://www.example.com/item1-info-page.html</link>
<description>Comfortable and soft, this sweater will keep you warm on cold winter nights.</description>
<g:image_link>http://www.example.com/imagem1.jpg</g:image_link>
<g:price>25</g:price>
<g:condition>new</g:condition>
<g:id>1a</g:id>
</item>
<!-- ... -->

The Sphinx index will use the xmlpipe2 data source.

Do I need to convert the feed to the standard xmlpipe2 document format before indexing it?

The xmlpipe2 document format:

<?xml version="1.0" encoding="utf-8"?>
<sphinx:docset>

<sphinx:schema>
<sphinx:field name="subject"/>
<sphinx:field name="content"/>
<sphinx:attr name="published" type="timestamp"/>
<sphinx:attr name="author_id" type="int" bits="16" default="1"/>
</sphinx:schema>

<sphinx:document id="1234">
<content>this is the main content <![CDATA[[and this <cdata> entry
must be handled properly by xml parser lib]]></content>
<published>1012325463</published>
<subject>note how field/attr tags can be
in <b class="red">randomized</b> order</subject>
<misc>some undeclared element</misc>
</sphinx:document>

<!-- ... -->

1 answer


Use Pipe2 to convert the feed:

./pipe2.phar convert:google data/google-shopping-sample.xml

Then use the same command as the xmlpipe_command in the Sphinx configuration:

source xmlSource
{
    type = xmlpipe2
    xmlpipe_command = /usr/local/bin/pipe2 convert:google /tmp/google-shopping-sample.xml
}
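
If Pipe2 is not an option, the conversion itself is small enough to script by hand. Below is a minimal sketch in Python, not Pipe2's actual implementation. The schema is my own assumption: title and description become full-text fields, while link, price and product_id become attributes. Since g:id in the sample ("1a") is not numeric and Sphinx needs unique non-zero integer document ids, the sketch simply numbers the items and keeps g:id as a string attribute. The names feed_to_xmlpipe2 and main are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Namespace bound to the g: prefix in Google Shopping feeds.
G_NS = "http://base.google.com/ns/1.0"


def _text(item, tag):
    """Return the stripped, XML-escaped text of a child element ('' if absent)."""
    return escape((item.findtext(tag) or "").strip())


def feed_to_xmlpipe2(feed_xml):
    """Convert a Google Shopping RSS feed (str or bytes) to an xmlpipe2 docset string."""
    root = ET.fromstring(feed_xml)
    out = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
        # Assumed schema: adjust fields and attributes to your own index.
        '<sphinx:field name="title"/>',
        '<sphinx:field name="description"/>',
        '<sphinx:attr name="link" type="string"/>',
        '<sphinx:attr name="price" type="float"/>',
        '<sphinx:attr name="product_id" type="string"/>',
        "</sphinx:schema>",
    ]
    # Number the items sequentially because g:id may be non-numeric.
    for doc_id, item in enumerate(root.iter("item"), start=1):
        # Assumption: a price may carry a currency suffix ("25.00 BRL");
        # keep only the first token so it fits a float attribute.
        price_tokens = _text(item, f"{{{G_NS}}}price").split()
        price = price_tokens[0] if price_tokens else "0"
        out.append(f'<sphinx:document id="{doc_id}">')
        out.append(f"<title>{_text(item, 'title')}</title>")
        out.append(f"<description>{_text(item, 'description')}</description>")
        out.append(f"<link>{_text(item, 'link')}</link>")
        out.append(f"<price>{price}</price>")
        out.append(f"<product_id>{_text(item, f'{{{G_NS}}}id')}</product_id>")
        out.append("</sphinx:document>")
    out.append("</sphinx:docset>")
    return "\n".join(out)


def main(path):
    """Read the feed at `path` and write the xmlpipe2 stream to stdout as UTF-8."""
    with open(path, "rb") as fh:
        result = feed_to_xmlpipe2(fh.read())
    sys.stdout.buffer.write(result.encode("utf-8"))
```

To use it from Sphinx, save it as a script that calls main(sys.argv[1]) and point xmlpipe_command at that script with the feed path as its argument. Note that the sequential ids change if the order of items in the feed changes; if you need stable ids, derive them from g:id in some deterministic way instead.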
