Identify zero sequences in a csv file using python

Asked

Viewed 162 times

0

good morning!

I’m having trouble identifying sequences of zero in a csv file, using Pyhton.

Below is an example of my csv file:

Index       |  Ponto de Medição  |     Data    | Hora  |ZeroApply
  0         |        User1       |  31/12/2018 |  20   |   1
  1         |        User1       |  31/12/2018 |  21   |   0
  2         |        User1       |  31/12/2018 |  22   |   0
  3         |        User1       |  31/12/2018 |  23   |   0
  4         |        User1       |  31/12/2018 |  24   |   0
  5         |        User2       |  01/01/2019 |  01   |   1
  6         |        User2       |  01/01/2019 |  02   |   0
  7         |        User2       |  01/01/2019 |  03   |   0
  8         |        User2       |  01/01/2019 |  04   |   0
  9         |        User3       |  01/01/2019 |  05   |   1

I need to generate an alarm whenever I identify in the "Zeroapply" column a sequence of zero >= 3., and indicate the user who has this sequence of zero, as the example below, insert a "Check" in the last record of zero of the sequence.

Index       |  Ponto de Medição  |     Data    | Hora  |ZeroApply |Análise
  0         |        User1       |  31/12/2018 |  20   |   1      |
  1         |        User1       |  31/12/2018 |  21   |   0      |
  2         |        User1       |  31/12/2018 |  22   |   0      |
  3         |        User1       |  31/12/2018 |  23   |   0      |
  4         |        User1       |  31/12/2018 |  24   |   0      |Verificar
  5         |        User2       |  01/01/2019 |  01   |   1      |
  6         |        User2       |  01/01/2019 |  02   |   0      |
  7         |        User2       |  01/01/2019 |  03   |   0      |
  8         |        User2       |  01/01/2019 |  04   |   0      |Verificar
  9         |        User3       |  01/01/2019 |  05   |   1      |

Can someone help me in how to do this?

Thanks in advance!

  • You can put the csv link?

  • Hello Felipe. The problem is clear. But, what have you tried to do? What is your difficulty? You have a better chance of getting help if you explain your difficulty and don’t expect someone to just do it for you.

2 answers

1

The first thing to keep in mind is that: a CSV file is not "modifiable" - your program will have to generate another CSV file, with the same template, and the most entered data.

In your example it is easy to understand why you cannot modify the file: the data lines have 60 characters - which correspond to 60 bytes in the file (may correspond to a little more bytes depending on the encoding and the presence of special characters, but it is not the case now). When adding a fourth column, the row size is 75 characters/bytes - it turns out that in the original file, these 15 extra bytes are already in the next line, So this next line, including the line markers, would be over-written by the extra data. Two lines would merge into one, part of the data would be lost, etc...

On the other hand, creating a new file is very smooth and efficient - you can use CSV files peacefully in operations with up to hundreds of thousands of records, as long as they are not done frequently (e.g. once per request in a web application). Of course from a certain point it is better to keep everything in database.

What is not so direct, from Python, is to generate the CSV file as "cute" as those of the examples, with the data centered on the columns, etc... but it is possible also (a tip in this case is to use the library Rows, and generate "text tables" and not in CSV).

This constraint on formatting is easily manageable as well, if it is part of the requirements - simply format each field with the string functions when writing the file.

Reading and writing the CSV file as direct text is very easy in Python - but it is interesting to note the module csv that can handle "corner-cases" automatically: if the separator character appears inside a text cell, or if some cell contains the character ", normally used to delimit text, or even the word wrap character \n.

The algorithm itself, to detect the zeros then is trivial - just keep a counter, zeroed at each occurrence of a non-zero value at the desired position, and incremented at each new line.

As it is easier to understand, I will give example of a version opening the files directly, without the module csv python. But in production it is interesting to look at the same, especially if at any point in the future the archives may have some of the "pathogenic" cases I quoted in their lines.

from pathlib import Path

def main(filepath):
    inp_path = Path(filepath)
    out_path = Path(filepath + "_new")

    with inp_path.open() as inp_file, out_path.open("wt") as out_file:
        headers = next(inp_file).strip().split('|')
        new_headers = headers[:] + 'Análise'
        # O seu arquivo de exemplo tem os títulos das colunas todos bagunçados: 
        # em alguns casos é centrado com um espaço de folga, a data
        # é centrada com vários espaços de folga, algumas colunas são alinhadas
        # a esquerda, outras centradas, e outras com alinhadas a esquerda com 
        # espaços.  Como não é possível colocar isso no código sem as coisas
        # específicas de cada coluna, pode valer a pena simplesmetne 
        # escrever o cabeçalho todo como uma string fixa. Isso é bem pouco elegante,
        # mas seria:
        # out_file.write("Index       |  Ponto de Medição  |     Data    | Hora  |ZeroApply |Análise\n")
        # melhor é abrir mão da formatação maluca para cada coluna, e usar, por exemplo,
        # nomes centralizados em 12 espaços (o que permite manter a formatação da data como está)
        out_file.write("|".join(header.strip().center(12) for header in headers) + "\n")
        zero_sequence_counter = 0
        for line in inp_file:
            fields = line.split("|")
            zero_apply = int(fields[-1].strip())
            verify_field = ""
            if zero_apply == 0:
                zero_sequence_counter += 1
                if zero_sequence_counter >= 3:
                    verify_field = "Verificar"
            else:
                zero_sequence_counter = 0
            fields.append(verify_field)
            out_file.write("|".join(field.strip().center(12) for field in fields) + \n)

    # opcional: remover o arquivo original e renomear o novo arquivo:
    inp_path.unlink()
    out_path.rename(inp_path)


main("nome_do_arquivo.csv")

One detail of this example is that I make extensive use of the class pathlib.Path for all file-related operations. As it is a relatively recent feature in the language, almost every example you find of Python will manipulate arches without using it - using string manipulation functions for the filename, and the open built-in to open a file.

Other than that, what you have there is the use of methods .split, .center, strip and .join of strings, and + to concatenate strings - and int to make use of the column as a numerical value - but nor would this conversion be necessary. The algorithm itself, as described above, is quite simple.

-1

you can create an empty dictionary or already with all users, and all pointing to the zero value

when you iterate the file and if the value is zero, vc will add 1 to the corresponding user in the dictionary, when it reaches 3 you send the warning, or vc Zera the value in the dictionary if the sequence is broken

Browser other questions tagged

You are not signed in. Login or sign up in order to post.