If you want to replace two or more spaces with just one, an alternative is to use regular expressions (regex), through the re module:
import re
s = "   minha  \n string do python    "
sem_espacos_a_mais = re.sub(' {2,}', ' ', s).strip(' ')
print(repr(sem_espacos_a_mais)) # 'minha \n string do python'
In this case, the regex contains a space (note the space right after the opening ' and before the {). The quantifier {2,} then means "two or more occurrences", i.e., we look for two or more consecutive spaces, which are replaced by a single space.
But this alone still leaves one space at the beginning and at the end of the string, so I use strip(' ') to remove them.
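To illustrate why strip receives ' ' as its argument, here is a minimal sketch. The sample string t is hypothetical (it is not from the question); it ends with a line break followed by spaces, to show that strip() without arguments would remove the line break too:

```python
import re

# Hypothetical sample: ends with a line break followed by spaces.
t = "  a   b \n  "

# Every run of two or more spaces becomes a single space.
collapsed = re.sub(' {2,}', ' ', t)

only_spaces_stripped = collapsed.strip(' ')  # removes spaces only
all_whitespace_stripped = collapsed.strip()  # removes any whitespace, including '\n'

print(repr(collapsed))                # ' a b \n '
print(repr(only_spaces_stripped))     # 'a b \n'
print(repr(all_whitespace_stripped))  # 'a b'
```

If line breaks at the ends of the string should be kept, strip(' ') seems to be the safer choice.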
With split, you can pass a space as the parameter, so that it does not eliminate line breaks. The problem is that you then also get several empty strings:
print(", ".join(map(repr, s.split(' '))))
# '', '', '', 'minha', '', '\n', 'string', 'do', 'python', '', '', '', ''
But then just use filter to discard the empty strings:
sem_espacos_a_mais = " ".join(filter(lambda x: len(x) > 0, s.split(' ')))
Or simply:
sem_espacos_a_mais = " ".join(filter(lambda x: x, s.split(' ')))
The option above works because empty strings are considered false, and filter only keeps the elements for which the lambda returns a true value (that is, in this case, the strings that are not empty). You can also pass None in place of the lambda, as another answer indicates, because in that case filter assumes the "identity function" (which is basically the lambda above, which returns the element itself).
The result is the same as the previous solution.
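As a quick check, the three filter variants mentioned above can be compared side by side. This is only a sketch: the sample string is an assumption, written with the runs of spaces implied by the split output shown earlier, and the variable names are illustrative.

```python
# Assumed sample: 3 leading spaces, 2 spaces before '\n', 4 trailing spaces.
s = "   minha  \n string do python    "
parts = s.split(' ')

with_len = " ".join(filter(lambda x: len(x) > 0, parts))  # explicit length test
with_identity = " ".join(filter(lambda x: x, parts))      # relies on truthiness
with_none = " ".join(filter(None, parts))                 # None means "keep truthy items"

print(repr(with_none))  # 'minha \n string do python'
print(with_len == with_identity == with_none)  # True
```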
It’s also possible to do everything in a single regex, but I think it gets too complicated to be worth it:
sem_espacos_a_mais = re.sub('^ *([^ ])|(?<!^)( ) +|([^ ]) *$', r'\1\2\3', s)
It uses alternation (the | character, meaning "or"), with 3 different options:
^ *([^ ]): the anchor ^, which indicates the start of the string, followed by zero or more spaces ( *), followed by a character that is not a space ([^ ]), or
(?<!^)( ) +: a space (( )) followed by one or more spaces ( +), provided that it is not at the beginning of the string ((?<!^) is a negative lookbehind that checks that what comes before is not ^), or
([^ ]) *$: a character that is not a space, followed by zero or more spaces and the end of the string ($)
Note that some parts are in parentheses, because this forms capture groups, which I can reference later. In this case, the replacement string (the second parameter passed to sub) indicates that I will use \1\2\3: \1 is the first group (the first pair of parentheses), which in this case is the non-space character right after the spaces at the beginning of the string. \2 is the second group, which is the space that is not at the beginning of the string, and \3 is the third group, which is the non-space character before the spaces at the end of the string.
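To see which alternative matches where, the sketch below lists every match of this regex together with its three groups. It uses re.finditer instead of sub purely for inspection, and the sample string is again an assumption (the same runs of spaces implied by the split output shown earlier):

```python
import re

pattern = re.compile('^ *([^ ])|(?<!^)( ) +|([^ ]) *$')

# Assumed sample: 3 leading spaces, 2 spaces before '\n', 4 trailing spaces.
s = "   minha  \n string do python    "

# For each match: the full matched text and the tuple (group 1, group 2, group 3).
matches = [(m.group(0), m.groups()) for m in pattern.finditer(s)]
for text, groups in matches:
    print(repr(text), groups)

# '   m' ('m', None, None)    <- first alternative (start of the string)
# '  ' (None, ' ', None)      <- second alternative (run of spaces in the middle)
# 'n    ' (None, None, 'n')   <- third alternative (end of the string)
```

Groups that did not take part in a match show up as None here. In re.sub, references to such groups are replaced by an empty string as of Python 3.5, which is what the \1\2\3 replacement relies on.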
So I preserve these characters and eliminate the remaining spaces (if one of these groups is not captured, it is empty, so it does not interfere with the other substitutions). The result is the same as that of the previous code but, as I said, it is a little more complicated, so it is probably better to use one of the first two options (simpler regex + strip, or split + filter).
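 
For comparison, the three approaches can be wrapped in small functions and run on the same input. This is a sketch only: the function names are hypothetical, and the sample string assumes the runs of spaces implied by the split output shown earlier.

```python
import re

def collapse_regex_strip(text):
    # Simple regex, then strip only spaces from both ends.
    return re.sub(' {2,}', ' ', text).strip(' ')

def collapse_split_filter(text):
    # Split on single spaces, drop the empty strings, join again.
    return " ".join(filter(None, text.split(' ')))

def collapse_single_regex(text):
    # Everything in one regex, using three capture groups.
    return re.sub('^ *([^ ])|(?<!^)( ) +|([^ ]) *$', r'\1\2\3', text)

# Assumed sample: 3 leading spaces, 2 spaces before '\n', 4 trailing spaces.
s = "   minha  \n string do python    "

results = [f(s) for f in (collapse_regex_strip,
                          collapse_split_filter,
                          collapse_single_regex)]
for r in results:
    print(repr(r))  # 'minha \n string do python' each time
```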
Perfect! Thank you very much!
– Alexandre FG