Extract two values from a text file

I need to extract two numbers from a text file and store them in two variables (X and Y). I’m trying to use the csv module, but I can’t get it to work.

with open('circulo.plt','r') as csvfile:
    plots = csv.reader(csvfile, delimiter=' ')

Here is an example of the file contents:

PW0.350,5;
PW0.350,6;
PW0.350,7;
PW0.350,8;
LT;
SP1;
PU-2158 393;
PD-1358 393;
PD-1358 1193;
PD-2158 1193;
PD-2158 393;
SP0;

The only values that interest me are those that start with "PU-" and "PD-".

  • Please show us your code. What have you tried, and what happened?

1 answer

csv.reader returns a reader object that you iterate over to get the rows of the file. Each row comes back as a list of its fields. Since you used delimiter=' ', the fields are separated by spaces. Then just check whether the first element of the list starts with "PU-" or "PD-", using the startswith method:

import csv
with open('circulo.plt','r') as csvfile:
    plots = csv.reader(csvfile, delimiter=' ')
    for linha in plots:
        if linha[0].startswith('PU-') or linha[0].startswith('PD-'):
            print(linha)

In this case, I am printing the entire row. Since each row is returned as a list, the output is:

['PU-2158', '393;']
['PD-1358', '393;']
['PD-1358', '1193;']
['PD-2158', '1193;']
['PD-2158', '393;']

Then you decide whether to use linha[0] to take only the first element (the strings starting with "PU-" or "PD-"), or linha[1] to take the second value ("393;", "1193;", etc).
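
Since the question asks for the two numbers in variables, here is a minimal sketch of one way to convert the fields. It rests on assumptions that are not confirmed by the question: that the hyphen after "PU"/"PD" is a minus sign, and that the second field is always an integer followed by ";". The names sample, xs and ys are only illustrative, and io.StringIO stands in for the real file so that the example runs on its own.

```python
import csv
import io

# Sample rows copied from the question; StringIO replaces the real file here.
sample = io.StringIO(
    "LT;\n"
    "SP1;\n"
    "PU-2158 393;\n"
    "PD-1358 393;\n"
    "PD-1358 1193;\n"
    "SP0;\n"
)

xs, ys = [], []
for linha in csv.reader(sample, delimiter=' '):
    # Keep only non-empty rows whose first field starts with PU- or PD-.
    if linha and (linha[0].startswith('PU-') or linha[0].startswith('PD-')):
        # Assumption: everything after "PU"/"PD" is a signed integer, e.g. "-2158".
        xs.append(int(linha[0][2:]))
        # Assumption: the second field is an integer with a trailing ";".
        ys.append(int(linha[1].rstrip(';')))

print(xs)  # [-2158, -1358, -1358]
print(ys)  # [393, 393, 1193]
```

If the hyphen is just a separator rather than a sign, slice with linha[0][3:] instead.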


But maybe you don’t even need the csv module, because you can read the lines of the file directly:

with open('/tmp/arq.txt','r') as arquivo:
    for linha in arquivo:
        if linha.startswith('PU-') or linha.startswith('PD-'):
            print(linha.split())

The difference is that now each row is a string containing the entire row, so I use split to separate the string by spaces and return a list. The output is the same as in the previous example.

In the same way as the previous solution, you can choose whether to take the first or second element of the list returned by split.
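
As a side note, str.startswith also accepts a tuple of prefixes, so the two checks can be merged into one call. The sketch below uses a small in-memory list (linhas, a name chosen only for this illustration) instead of the file:

```python
# A few lines copied from the question, including the trailing line breaks
# that you get when iterating over a file.
linhas = ["PW0.350,5;\n", "PU-2158 393;\n", "PD-1358 393;\n", "SP0;\n"]

# One startswith call with a tuple covers both prefixes.
selecionadas = [
    linha.split()
    for linha in linhas
    if linha.startswith(('PU-', 'PD-'))
]

print(selecionadas)  # [['PU-2158', '393;'], ['PD-1358', '393;']]
```

Note that split() with no arguments also discards the line break at the end of each line.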


If you want, you can also use regular expressions, through the re module:

import re

r = re.compile(r'^P[UD]-')
with open('/tmp/arq.txt','r') as arquivo:
    for linha in arquivo:
        if r.match(linha):
            print(linha.split())

For each line in the file, the code checks whether it matches the regex ^P[UD]-.

The anchor ^ means "start of the string", so I guarantee that the pattern is only tested at the beginning of the line.

Next comes an uppercase letter P, followed by a character class (the square brackets). In this case, [UD] means "the letter U or the letter D". Then we have the hyphen.

Therefore, the regex tests whether the line starts with PU- or PD-.
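
To see the behaviour described above, this small sketch runs the same regex against a few strings. The strings in amostras are illustrative, and the last one (with a leading space) is made up to show that the match must begin at the very start:

```python
import re

r = re.compile(r'^P[UD]-')

amostras = ['PU-2158 393;', 'PD-1358 393;', 'PW0.350,5;', 'SP1;', ' PU-1 2;']

# Map each sample to True/False depending on whether the regex matches.
resultado = {s: bool(r.match(s)) for s in amostras}

for s, ok in resultado.items():
    print(repr(s), ok)
# 'PU-2158 393;' True
# 'PD-1358 393;' True
# 'PW0.350,5;' False
# 'SP1;' False
# ' PU-1 2;' False
```

Since match only tries the pattern at the beginning of the string, the ^ is redundant here, but it keeps the intent explicit.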


Another option, which tests the regex and captures the values at the same time, is:

import re

r = re.compile(r'^(P[UD]-\S+) (\S+)')
with open('/tmp/arq.txt','r') as arquivo:
    for linha in arquivo:
        m = r.match(linha)
        if m:
            print(m.group(1), m.group(2))

Now I’m using parentheses to form capture groups, which let me retrieve the parts of the string that the regex matched. I also use the shortcut \S, meaning "any character that is not whitespace" (see the documentation for all the characters that count as whitespace; this includes TAB and line breaks, for example).

Then I use group(1) and group(2) to get the values captured by each pair of parentheses. The first group is the string that starts with "PD-" or "PU-", and the second group is the string that comes next on the same line. The output in this case is two separate strings instead of a list:

PU-2158 393;
PD-1358 393;
PD-1358 1193;
PD-2158 1193;
PD-2158 393;
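
If what you need in the end are numbers rather than strings, a variation is to capture only the digits and convert them. This is a sketch under assumptions that are not confirmed by the question: the hyphen is treated as an optional minus sign, both values are integers, and the line ends with ";". Because the sign is optional, this pattern would also accept a line like "PU2158 393;". The names linhas and pontos are only illustrative.

```python
import re

# Assumption: each value is an optionally negative integer, and ";" ends the command.
r = re.compile(r'^P[UD](-?\d+) (-?\d+);')

linhas = ["SP1;", "PU-2158 393;", "PD-1358 1193;", "SP0;"]

pontos = []
for linha in linhas:
    m = r.match(linha)
    if m:
        # group(1) and group(2) hold the digits (with sign) as strings.
        x, y = int(m.group(1)), int(m.group(2))
        pontos.append((x, y))

print(pontos)  # [(-2158, 393), (-1358, 1193)]
```
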
  • Excellent explanation. I was trying to use regex but got lost in the middle.
