I have a text file of this kind:
Olá podes dizer-me quando o 1 passa aqui? ele passa, quando passar o Carlos Alberto.
In Python, I need to remove special characters such as punctuation and numbers, convert uppercase letters to lowercase, replace accented characters with their unaccented equivalents, and split the text into individual characters. Something like this:
o, l, a, p, o, d, e, s, d, i, z, e, r, m, e, q, u, a, n, d, o, o, p, a, s, s, a, a, q, u, i, e, l, e, p, a, s, s, a, q, u, a, n, d, o, p, a, s, s, a, r, o, c, a, r, l, o, s, a, l, b, e, r, t, o
Is there a way to do all of this with split, or with the re module?
This is what I have so far:
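import re
import unicodedata

# convert to lowercase
data = data.lower()
# remove the numbers
data = re.sub(r'\d+', '', data)
# decompose accented characters and drop the combining marks
nfkd = unicodedata.normalize('NFKD', data)
dataNova = ''.join(c for c in nfkd if not unicodedata.combining(c))
# remove everything that is not a letter a-z (punctuation, spaces)
dataNovaNova = re.sub(r'[^a-z]', '', dataNova)
# split into a list of individual characters
lista = list(dataNovaNova)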
where data is a string.
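To make the goal concrete, here is a minimal sketch of the steps described above, wrapped in one function. The name `to_letters` and the shortened sample sentence are only for illustration, and it assumes that nothing except the ASCII letters a-z should survive:

```python
import re
import unicodedata


def to_letters(text):
    """Return the unaccented lowercase ASCII letters of text as a list."""
    # Lowercase first so that only a-z needs to be kept at the end.
    text = text.lower()
    # NFKD splits an accented character into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, leaving the bare base letters.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Remove everything that is not a-z: digits, punctuation, spaces.
    letters = re.sub(r"[^a-z]", "", stripped)
    # A string is iterable, so list() yields one element per character.
    return list(letters)


sample = "Olá podes dizer-me quando o 1 passa aqui?"
result = to_letters(sample)
print(", ".join(result))
```

No split is needed in this sketch: list() already breaks a string into single characters, so the regular expression only has to delete the unwanted ones.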