How to split a string on more than one delimiter

I have the following text file, saved as texto.txt:

Vamos testar nesse arquivo, aqui.  
Temos que pedir para, que separe "todos os caracteres"!  
Fazer, a contagem de letras?  
Fazer a contagem de cada letra, que aparece aqui.  
Fazer a contagem de linhas.  
Esse texto contem 6 linhas.

The text contains several symbols just to test the string splitting.

I wrote the following code to read the file:

with open('texto.txt', encoding='utf8') as arquivo:
    letras = arquivo.read()
    lista = letras.split(' ') 
    print(lista)

read() reads the entire file and stores it as a string in letras.
When I print lista, the output is:

['Temos', 'que', 'pedir', 'para,', 'que', 'separe', '"todos', 'os', 'caracteres"!\nFazer,', 'a','contagem', 'de', 'letras?\nFazer', 'a', 'contagem', 'de', 'cada', 'letra,', 'que', 'aparece','aqui.\nFazer', 'a', 'contagem', 'de', 'linhas.\nEsse', 'texto', 'contem', '6', 'linhas.']

Note that the text is only split where there is a space.
How do I make it also split on , . ! ? as well as on spaces and on \n?

2 answers

1


Hello, how are you?

To solve your problem we will use features of Python itself: the built-in re module, functions, and loops, so that the solution follows a clear design.

The code looks like this:

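import re

def lerArquivo(caminho):
    # Read the whole file and return its contents as a single string
    with open(caminho, encoding='utf8') as arquivo:
        dictionaryInput = arquivo.read()
        return dictionaryInput

def separar(lista):
    # The capturing group makes re.split keep each delimiter in the result
    splitResult = re.split(r'([^a-zA-Z0-9])', lista)
    finalListResult = []
    for x in splitResult:
        # Discard newlines, empty strings, and single spaces
        if x != '\n' and x != '' and x != ' ':
            finalListResult.append(x)
    return finalListResult


# Relative path that works on any operating system
caminho = 'texto.txt'
lerCaminho = lerArquivo(caminho)
print(separar(lerCaminho))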
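
One detail worth noting is the parentheses in the pattern: with a capturing group, re.split also returns the delimiters, so the separar function above still yields punctuation marks as separate items. The following is only a sketch of that difference, assuming a hard-coded sample string instead of texto.txt; the helper name separar_sem_pontuacao is hypothetical and not part of the answer's code.

```python
import re

# Sample text, hard-coded here for illustration (not read from texto.txt).
texto = 'Fazer, a contagem de letras?\nFazer a contagem de linhas.'

# Capturing group: the delimiters themselves are returned by re.split.
# The filter mirrors the one used in separar.
com_grupo = [t for t in re.split(r'([^a-zA-Z0-9])', texto)
             if t not in ('', ' ', '\n')]

def separar_sem_pontuacao(s):
    """Hypothetical variant: split on runs of non-word characters and drop them."""
    # \W+ matches one or more characters that are not letters, digits, or "_".
    # No capturing group, so the delimiters are not returned.
    return [t for t in re.split(r'\W+', s) if t]

sem_grupo = separar_sem_pontuacao(texto)

print(com_grupo)
print(sem_grupo)
```

In Python 3, \W is Unicode-aware for str patterns, so accented letters such as "é" stay inside the words, whereas [^a-zA-Z0-9] would split on them.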

If you have any questions about the code or its functions, just ask.

0

If there is no need to preserve these "special" characters/sequences, you can simply replace each of these known sequences with a space. After that, split(' ') alone is enough to break the text as desired.
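
A minimal sketch of this replace-then-split idea, assuming a hard-coded sample string and an assumed list of separators (the name separadores is illustrative). One caveat: where two separators are adjacent, for example a comma followed by a space, split(' ') would leave empty strings in the list, so the sketch calls split() with no argument, which splits on any run of whitespace.

```python
# Sample text, hard-coded here for illustration (not read from texto.txt).
texto = 'Fazer, a contagem de letras?\nFazer a contagem de linhas.'

# Characters to treat as separators (an assumed list; extend as needed).
separadores = ',.!?"\n'

# Replace every separator with a space.
for sep in separadores:
    texto = texto.replace(sep, ' ')

# split() with no argument splits on any run of whitespace,
# so consecutive spaces do not produce empty strings.
palavras = texto.split()

print(palavras)
```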
