How to remove spaces from a string in Python without also removing line breaks?

Viewed 1,145 times

4

I need to remove unnecessary spaces from strings without removing the line breaks. The command below removes the extra spaces, but it also removes the \n line break. Does anyone know how to solve this?

" ".join("   minha  \n string do python    ".split())

Console result:

'minha string do python'

Result I want:

'minha \n string do python'

3 answers

5


If you want to replace two or more spaces with just one, an alternative is to use a regular expression (regex), through the re module:

import re

s = "   minha  \n string do python    "
sem_espacos_a_mais = re.sub(' {2,}', ' ', s).strip(' ')
print(repr(sem_espacos_a_mais)) # 'minha \n string do python'

In this case, the regex contains a space (note the space just after the ' and before the {). The quantifier {2,} means "two or more occurrences", i.e., we are looking for two or more consecutive spaces, which are replaced by a single space.

But this does not remove the spaces at the beginning and end of the string, so I use strip(' ') to remove them.
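A small sketch of why strip receives an explicit ' ' here, assuming a string that ends with a line break (the sample string t is made up for illustration): called without arguments, strip() removes every kind of whitespace at the ends, including \n.

```python
import re

# Hypothetical sample: it ends with a line break that we want to keep.
t = "  a  b  \n"

# Collapse runs of two or more spaces into a single space.
collapsed = re.sub(' {2,}', ' ', t)

only_spaces = collapsed.strip(' ')   # strips spaces only
all_whitespace = collapsed.strip()   # strips spaces, tabs and line breaks

print(repr(only_spaces))     # 'a b \n'
print(repr(all_whitespace))  # 'a b'
```

Note that strip(' ') stops at the first character that is not a space, so the space before the final \n is kept.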


With split, you can pass a space as the parameter, so that it does not eliminate line breaks. The problem is that you then get several empty strings as well:

print(", ".join(map(repr, s.split(' '))))
# '', '', '', 'minha', '', '\n', 'string', 'do', 'python', '', '', '', ''

But then just use filter to delete empty strings:

sem_espacos_a_mais = " ".join(filter(lambda x: len(x) > 0, s.split(' ')))

Or simply:

sem_espacos_a_mais = " ".join(filter(lambda x: x, s.split(' ')))

The above option works because empty strings are considered false, and filter keeps only the elements for which the lambda returns a truthy value (in this case, the non-empty strings). You can also pass None in place of the lambda, as shown in another answer, because in that case filter assumes the "identity function" (which is basically the lambda above, returning the element itself).

The result is the same as the previous solution.
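To make the equivalence concrete, here is a minimal sketch that runs the three filter variants side by side, plus a generator expression as a fourth spelling (the variable names are chosen just for this illustration):

```python
s = "   minha  \n string do python    "
partes = s.split(' ')  # splitting on ' ' keeps '\n' but yields empty strings

com_len = " ".join(filter(lambda x: len(x) > 0, partes))
com_lambda = " ".join(filter(lambda x: x, partes))
com_none = " ".join(filter(None, partes))
com_gerador = " ".join(p for p in partes if p)  # same idea without filter

print(repr(com_none))  # 'minha \n string do python'
print(com_len == com_lambda == com_none == com_gerador)  # True
```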


It’s also possible to do everything in a single regex, but I think it gets too complicated to be worth it:

sem_espacos_a_mais = re.sub('^ *([^ ])|(?<!^)( ) +|([^ ]) *$', r'\1\2\3', s)

It uses alternation (the | character, meaning "or"), with 3 different options:

  • ^ *([^ ]): the anchor ^, which indicates the start of the string, followed by zero or more spaces ( *), followed by a character that is not a space ([^ ]), or
  • (?<!^)( ) +: a space (( )) followed by one or more spaces ( +), provided that it is not at the beginning of the string ((?<!^) is a negative lookbehind, asserting that the current position is not the start of the string), or
  • ([^ ]) *$: a character that is not a space, followed by zero or more spaces, and the end of the string ($)

Note that some parts are in parentheses, because these form capture groups, which I can reference later. In this case, the replacement string (the second parameter passed to sub) indicates that I will use \1\2\3: \1 is the first group (the first pair of parentheses), which in this case is the character that is not a space, right after the spaces at the beginning of the string. \2 is the second group, which is the space that is not at the beginning of the string, and \3 is the third group, which is the character that is not a space, before the spaces at the end of the string.

So I preserve these characters and eliminate the remaining spaces (if one of these groups is not captured, it is replaced by an empty string in Python 3.5 and later, so it does not interfere with the other substitutions). The result is the same as the previous code, but as I said, it is a little more complicated, and it is probably better to use one of the first two options (the simpler regex + strip, or split + filter).
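If you do want the single-regex version, one way to make it easier to read is re.VERBOSE, which allows comments inside the pattern. The sketch below is a rewrite of the same three alternatives, not a different technique. In verbose mode, unescaped spaces in the pattern are ignored, so each literal space is written as [ ]. It assumes Python 3.5 or later, where unmatched groups in the replacement become empty strings.

```python
import re

# Same three alternatives as above, written with re.VERBOSE.
# Literal spaces must be spelled [ ] because verbose mode ignores bare spaces.
padrao = re.compile(r"""
    ^[ ]*([^ ])          # spaces at the start; keep the first non-space (group 1)
  | (?<!^)([ ])[ ]+      # two or more spaces, not at the start; keep one (group 2)
  | ([^ ])[ ]*$          # spaces at the end; keep the last non-space (group 3)
""", re.VERBOSE)

s = "   minha  \n string do python    "
sem_espacos_a_mais = padrao.sub(r'\1\2\3', s)
print(repr(sem_espacos_a_mais))  # 'minha \n string do python'
```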

  • 1

    Perfect! Thank you very much!

2

An alternative is to split the sentence on spaces (\x20) with str.split(), filter the empty strings out of the resulting list with the built-in function filter(), and merge the result with str.join():

s = "   minha  \nstring do python    "

f = lambda s: " ".join(filter(None, s.split("\x20")))

print(f(s))

Test in Repl.it: https://repl.it/repls/OpenEcstaticWatch

2

An alternative solution would be:

  1. Break the string into lines;

  2. Remove whitespace from each line individually;

  3. Merge lines into a single string again.

For example:

entrada = '  testando \n  minha   \n   string     do      python    '

# Split the string into a list of lines
linhas = entrada.split('\n')

# Remove the extra spaces from each line in the list
linhas = [' '.join(i.split()) for i in linhas]

# Rebuild the string from the list of lines
saida = '\n'.join(linhas)

print(saida)

Or even:

entrada = ' testando \n  minha   \n   string     do      python    '
saida = '\n'.join([' '.join(i.split()) for i in entrada.split('\n')])
print(saida)

Output:

testando
minha
string do python
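Note that this per-line approach also drops the spaces next to each line break, so for the string in the question it returns 'minha\nstring do python' rather than 'minha \n string do python'. A short sketch comparing the two behaviors (the function names por_linha and so_espacos are made up for this illustration):

```python
import re

def por_linha(texto):
    # Per-line approach: split on '\n', normalize each line, then rejoin.
    return '\n'.join(' '.join(linha.split()) for linha in texto.split('\n'))

def so_espacos(texto):
    # Space-only approach: collapse runs of spaces and never touch '\n'.
    return re.sub(' {2,}', ' ', texto).strip(' ')

s = "   minha  \n string do python    "
print(repr(por_linha(s)))   # 'minha\nstring do python'
print(repr(so_espacos(s)))  # 'minha \n string do python'
```

Which one is preferable depends on whether the spaces around the line break should survive.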
