Txt Manipulation - Separating blocks based on a pattern

I have a .txt file with some information. The file follows this pattern, where the first character of each line identifies its type:

1 - beginning of a block

2 - information

3 - description of the preceding type 2 line

For example:

190845 3890580235203895 0329045832854880328 58908349058340534859 hjdfhgjdfhg dgfdgdf  
22343 34234234 324234 324234234 234234 342324989856475959596    
3SHDSHFUHDSFUHSDUFHSHDFUDSFDSTTJKKHGHJMNMNBN
22343 34234234 324234 324234234 234234 342324989856475959596    
3SHDSHFUHDSFUHSDUFHSHDFUDSFDSTTJKKHGHJMNMNBN
22343 34234234 324234 324234234 234234 342324989856475959596    
3SHDSHFUHDSFUHSDUFHSHDFUDSFDSTTJKKHGHJMNMNBN
120845 3890580235203895 0329045832854880328 58908349058340534859 hjdfhgjdfhg dgfdgdf  
22343 34234234 324234 324234234 234234 342324989856475959596    
3SHDSHFUHDSFUHSDUFHSHDFUDSFDSTTJKKHGHJMNMNBN

What I need is to put each block into a separate variable, knowing that a block starts at a line beginning with 1 and ends right before the next line beginning with 1. For the example above, that would be:

a = '''190845 3890580235203895 0329045832854880328 58908349058340534859 hjdfhgjdfhg dgfdgdf
22343 34234234 324234 324234234 234234 342324989856475959596
3SHDSHFUHDSFUHSDUFHSHDFUDSFDSTTJKKHGHJMNMNBN
22343 34234234 324234 324234234 234234 342324989856475959596
3SHDSHFUHDSFUHSDUFHSHDFUDSFDSTTJKKHGHJMNMNBN
22343 34234234 324234 324234234 234234 342324989856475959596
3SHDSHFUHDSFUHSDUFHSHDFUDSFDSTTJKKHGHJMNMNBN'''

b = '''120845 3890580235203895 0329045832854880328 58908349058340534859 hjdfhgjdfhg dgfdgdf
22343 34234234 324234 324234234 234234 342324989856475959596
3SHDSHFUHDSFUHSDUFHSHDFUDSFDSTTJKKHGHJMNMNBN'''

I tried combining a while loop with readline() and startswith(), but I couldn't get it to work.
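For reference, the while + readline() + startswith() idea can work if the lines are collected into a list that is closed off whenever a new line starting with 1 appears. This is a minimal sketch, assuming the file always follows the pattern above; `split_blocks` is a hypothetical helper name, and `io.StringIO` only stands in for the real file.

```python
import io


def split_blocks(f, marker="1"):
    """Group the lines of a file-like object into blocks.

    A new block starts at every line whose first character is `marker`;
    a `marker` that appears later in a line is ignored.
    """
    blocks = []
    current = []  # lines of the block being built
    line = f.readline()
    while line:  # readline() returns '' only at end of file
        if line.startswith(marker) and current:
            # Close the previous block before starting a new one
            blocks.append("".join(current).rstrip("\n"))
            current = []
        current.append(line)
        line = f.readline()
    if current:  # the last block has no following marker to close it
        blocks.append("".join(current).rstrip("\n"))
    return blocks


if __name__ == "__main__":
    sample = "190845 a 1\n22343 b\n3SHD\n120845 c\n22343 d\n3SHD\n"
    for block in split_blocks(io.StringIO(sample)):
        print("---->", block)
```

With a real file, pass the open file object instead, e.g. `with open("data.txt") as f: blocks = split_blocks(f)` (`data.txt` is a placeholder name). When there are exactly two blocks, `a, b = blocks` gives the two variables from the example; for a variable number of blocks, keeping the list is simpler.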

  • Can the value 1 appear anywhere on a line, inside the information, or will a 1 always represent the beginning of a block?

  • Hi Anderson! Long time no see! So, a 1 can appear anywhere in the file, but the 1 that marks the beginning of a block will always be the first character of the line.

  • I had tried line.startswith("1"), but handling the whole block that way became impractical.

  • So what did you do? I think a regular expression could give a better solution, but I can't think of one right now.

  • @Andersoncarloswoss your tip helped a lot!!

1 answer

You can use the following regular expression:

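^1 - where ^ marks the beginning of a line and is followed by the character you are looking for, in this case 1. For ^ to match at the start of every line rather than only at the start of the text, the expression needs the re.MULTILINE flag. In the example, I split the text with this regular expression and then prepend 1 to each item again, because split removes the matched character.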
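The effect of re.MULTILINE on ^1, and why a bracketed pattern such as `[(^1)]` is not the same thing, can be checked on a small made-up string in which a 1 also appears inside a line. The expected results in the comments come from Python's standard `re` module.

```python
import re

text = "190845 x\n22343 1y\n120845 z"

# Without re.MULTILINE, ^ only matches at the very start of the text
print(re.split(r"^1", text))
# ['', '90845 x\n22343 1y\n120845 z']

# With re.MULTILINE, ^ matches at the start of every line, so the 1
# inside "22343 1y" is not treated as a block boundary
print(re.split(r"^1", text, flags=re.MULTILINE))
# ['', '90845 x\n22343 1y\n', '20845 z']

# Square brackets build a character class: [(^1)] matches any single
# "(", "^", "1" or ")" anywhere in the text
print(re.split(r"[(^1)]", text))
# ['', '90845 x\n22343 ', 'y\n', '20845 z']
```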

To build regular expressions, I like to use Rubular.

Code:

import re

FINDER = "1"
# http://rubular.com/r/Sx8PL2qdR8
# ^ anchors the match to the start of a line; re.MULTILINE makes it apply
# to every line instead of only the start of the text
REGEX = re.compile('^' + re.escape(FINDER), re.MULTILINE)

if __name__ == '__main__':
    # The lines are not indented: leading spaces would keep ^1 from matching
    content = """190845 3890580235203895 0329045832854880328 58908349058340534859 hjdfhgjdfhg dgfdgdf
22343 34234234 324234 324234234 234234 342324989856475959596
3SHDSHFUHDSFUHSDUFHSHDFUDSFDSTTJKKHGHJMNMNBN
22343 34234234 324234 324234234 234234 342324989856475959596
3SHDSHFUHDSFUHSDUFHSHDFUDSFDSTTJKKHGHJMNMNBN
22343 34234234 324234 324234234 234234 342324989856475959596
3SHDSHFUHDSFUHSDUFHSHDFUDSFDSTTJKKHGHJMNMNBN
120845 3890580235203895 0329045832854880328 58908349058340534859 hjdfhgjdfhg dgfdgdf
22343 34234234 324234 324234234 234234 342324989856475959596
3SHDSHFUHDSFUHSDUFHSHDFUDSFDSTTJKKHGHJMNMNBN"""

    splitted = REGEX.split(content)

    # Remove the first item if it is empty (the text starts with a block)
    if splitted[0] == '':
        splitted = splitted[1:]

    # Rebuild each block by putting back the initial character removed by split
    result = []
    for split in splitted:
        result.append(FINDER + split.rstrip('\n'))

    # Display the result
    print(len(result))
    for r in result:
        print("----> " + r)
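A variant of the same idea, assuming Python 3.7 or newer (the first version where `re.split` splits on empty matches): splitting on a zero-width lookahead keeps the 1 at the start of each block, so nothing has to be concatenated back. `split_text_blocks` is a hypothetical helper name, not part of the answer above.

```python
import re


def split_text_blocks(text, marker="1"):
    """Split `text` into blocks that begin with `marker` at the start of a line.

    The lookahead (?=...) matches an empty position, so the marker stays at
    the beginning of each block instead of being consumed by the split.
    """
    pattern = re.compile(r"^(?=" + re.escape(marker) + ")", re.MULTILINE)
    # Drop the empty item produced by the match at position 0, and strip
    # the trailing newline each block keeps
    return [part.rstrip("\n") for part in pattern.split(text) if part.strip()]


if __name__ == "__main__":
    sample = "190845 x\n22343 1y\n3SHD\n120845 z\n22343 w\n3SHD\n"
    a, b = split_text_blocks(sample)
    print(a)
    print(b)
```

Any text before the first line starting with 1 would come back as an extra first item, and `a, b = ...` only works when there are exactly two blocks; for a variable number of blocks, keep the returned list.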
