How to create a Parser in Python?

For a file-handling project, I need to convert from "CFG" to "XML", and I believe Python has no ready-made support for either of these two formats.

A colleague did the conversion manually so that we could study the XML. Here are the respective CFG and XML structures:

CFG: http://pastebin.com/404dp2Ny

XML: http://pastebin.com/zfwFpqkT

I'm more or less "intermediate" in Python, but I'm stuck on HOW to do this. I read the source of ConfigParser and saw that it uses the re module, but how would I use that module myself?

Anyway, do I have to write a parser for both formats? If you can't give me code (of course you can't), can you at least tell me whether the following idea makes sense?


I was thinking about doing this:

Identify the data types in the CFG: what is a Section and what is a Datum.
Create a kind of temporary dictionary with the data that was read.
Then rewrite the data into an XML file according to those types, building an XML tree.

  • ConfigParser will already extract the data. One question: must the tags have exactly the same names as the keys in the CFG? Or do you have a specific XML format?

  • @Guilhermenascimento Configparser does not understand this CFG. And XML is only slightly different from CFG. Some data is not required in XML.

1 answer

ConfigParser can probably be adapted to read the type of CFG you are using. However, writing a parser is a great way to learn and practice Python. In my case, that is how I started learning Python.

How to create a parser in Python?

It is possible to do this in pure Python, without relying on any module. Basically it is an exercise in string manipulation. There are certainly more efficient ways to do the same, such as using the re module mentioned above (which handles regex, regular expressions).
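For comparison, here is a minimal sketch of how the re module could classify lines. The patterns and the classify function are hypothetical; they assume the format described in this answer ("# comment", "group:" and "param=value") and have not been checked against the full sample file.

```python
import re

# Assumed patterns for the three kinds of line described in this answer.
COMMENT_RE = re.compile(r'^\s*(#.*)?$')  # blank line or "# comment"
GROUP_RE = re.compile(r'^\s*(?P<name>[^=]+?)\s*:\s*$')  # "group:"
PARAM_RE = re.compile(r'^\s*(?P<key>[^=]+?)\s*=\s*(?P<value>.*?)\s*$')  # "key=value"


def classify(line):
    """Return a tuple describing what kind of line this is (hypothetical helper)."""
    if COMMENT_RE.match(line):
        return ('skip',)
    m = GROUP_RE.match(line)
    if m:
        return ('group', m.group('name'))
    m = PARAM_RE.match(line)
    if m:
        return ('param', m.group('key'), m.group('value'))
    # Anything else, such as an unmarked comment, is reported as unknown.
    return ('unknown', line.strip())


for sample in ['# a comment', 'info9:', 'color=#282828', 'Game Label Image']:
    print(classify(sample))
```

The regex version does the same job as plain string methods; its main advantage is that whitespace handling and the key/value split live in one pattern.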

Continuing with pure Python, the general steps are:

  1. Identify the file format

    • What separates the parameter from the value?
    • How are comments marked?
    • How are groups marked?
    • Are there special cases?
  2. Define a data structure to store the information in (example)

    • Create a dictionary where each key is a parameter, which holds a value
    • How many levels does it take? Do I need a dictionary of dictionaries?
    • Is order important? Before Python 3.7, dictionaries do not preserve insertion order; in that case you can use an OrderedDict from the collections module.
  3. Write file or convert

    • Do you need to write the file back?
    • In what format? Same or different? (in this case XML)

Let’s do it.

1. Identifying the file format

Looking at the sample file, we can identify the following:

  • comments start with #
  • group names end with :
  • parameters and values are separated by =
  • there are parameters without a group, which appear first
  • we assume that all the following parameters belong to some group
  • Note: on line 59 of the sample there is a comment that is not marked as one: Game Label Image; I added a # to it by hand, but such cases can also be handled in the code

2. Defining a data structure to store the information

Since the parameters are separated into groups, in this example I chose a dictionary of dictionaries. Example:

cfg = {
    'grupo 1': {
        'param1': 1,
        'param2': 'abc'
    },
    'grupo 2': {
        'paramA': 'foo',
        'paramB': 123
    }
}

One of the groups will be 'geral' (general), which holds the parameters that appear first, outside any group.
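As a small illustration of how this structure is filled and read, here is a sketch. The store helper and the sample keys are hypothetical; only the 'geral' group name comes from the answer.

```python
def store(cfg, group, param, value):
    """Store value under cfg[group][param], creating the group if needed.

    Hypothetical helper, not part of the original answer.
    """
    cfg.setdefault(group, {})[param] = value


cfg = {'geral': {}}
store(cfg, 'geral', 'version', '1')     # parameter seen before any group
store(cfg, 'grupo 1', 'param1', 'abc')  # parameter inside a group

print(cfg['grupo 1']['param1'])                # prints: abc
print(cfg['geral'].get('missing', 'default'))  # prints: default
```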


3. Write file or convert

I will not cover this part in my answer, since the goal is to convert to XML, which goes beyond the main question (how to create the parser). I recommend reading the following topic: XML to/from a Python Dictionary.
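As a starting point only, here is a minimal sketch that turns a dictionary of dictionaries into an XML tree with the standard xml.etree.ElementTree module. The tag names config, group and param and the dict_to_xml function are assumptions made for illustration; the target XML layout from the question is not reproduced here.

```python
import xml.etree.ElementTree as ET


def dict_to_xml(cfg, root_tag='config'):
    """Build an XML tree from a dict of dicts (hypothetical layout)."""
    root = ET.Element(root_tag)
    for group, params in cfg.items():
        # Names go into attributes, so group names with spaces stay valid XML.
        group_el = ET.SubElement(root, 'group', name=group)
        for param, value in params.items():
            param_el = ET.SubElement(group_el, 'param', name=param)
            param_el.text = str(value)
    return root


cfg = {'info9': {'color': '#282828', 'x': '290'}}
root = dict_to_xml(cfg)
# encoding='unicode' makes tostring return text instead of bytes (Python 3).
print(ET.tostring(root, encoding='unicode'))
```

Storing the names in attributes rather than using them as tag names is a design choice: a key such as 'grupo 1' contains a space and could not be used as an element name.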


Code

Below I present a simple implementation that reads a file according to the identified requirements. For more complex development, it is advisable to implement the functionality in a class.

# coding=utf-8

# cfg file
caminho = '/Users/julio/Downloads/cfg.txt'
# initialise the data structure
parametros = {}
# create the general group for parameters without a group
parametros['geral'] = {}
# marker that tracks the current group
grupo = 'geral'

with open(caminho, 'r') as cfg:
    for linha in cfg:
        # strip whitespace at the start and end
        linha = linha.strip()
        # the line is a comment or is empty: skip to the next one
        if not linha or linha.startswith('#'):
            continue
        # create a group
        if linha.endswith(':'):
            grupo = linha.split(':')[0]
            parametros[grupo] = {}
        # store a parameter; split only on the first '=' so values may contain '='
        elif '=' in linha:
            param, valor = linha.split('=', 1)
            parametros[grupo][param] = valor
        # any other line (for example an unmarked comment) is ignored

print(parametros)

Excerpt from the output:

{'info9': {'color': '#282828', 'attribute': 'Notes', 'width': '250',
           'y': '148', 'x': '290', 'aligned': '0', 'type': 'AttributeText',
           'display': '1'},
 'info8': {'color': '#282828', 'attribute': '#Size', 'y': '148', 'x': '36',
           'aligned': '0', 'type': 'AttributeText', 'display': '1'},
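For the class-based approach mentioned earlier, one possible shape is sketched below. The CfgParser class, its method names and the sample text are hypothetical; the parsing rules are the same as in the loop above.

```python
import io


class CfgParser(object):
    """Hypothetical class wrapping the parsing loop from this answer."""

    def __init__(self, default_group='geral'):
        self.default_group = default_group
        self.parametros = {default_group: {}}

    def parse(self, lines):
        """Parse an iterable of lines (an open file works) and return the dict."""
        grupo = self.default_group
        for linha in lines:
            linha = linha.strip()
            if not linha or linha.startswith('#'):
                continue  # blank line or comment
            if linha.endswith(':'):
                grupo = linha[:-1]
                self.parametros.setdefault(grupo, {})
            elif '=' in linha:
                param, valor = linha.split('=', 1)
                self.parametros[grupo][param] = valor
        return self.parametros

    def get(self, grupo, param, default=None):
        """Look up a value, returning default if the group or key is missing."""
        return self.parametros.get(grupo, {}).get(param, default)


# Made-up sample text standing in for the real cfg file.
sample = io.StringIO(u"version=1\n# comment\ninfo9:\ncolor=#282828\nx=290\n")
parser = CfgParser()
parser.parse(sample)
print(parser.get('info9', 'color'))  # prints: #282828
```

Accepting any iterable of lines instead of a path keeps the class easy to test without touching the file system.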
  • Seen like this it looks simple, thank you very much... :D

  • @Breno I hope it helped. Yes, it is simple; the complexity comes with the size of the problem and the type of requirements. :) If this answers the question asked, consider marking it as the accepted answer (see how in the tour).
