Removing unnecessary question marks from a string

Viewed 621 times

-3

I have strings from which I would like to remove unnecessary question marks, and to which I would like to add a question mark if it is missing. This only needs to apply at the end of the string. Here are some examples and the expected results:

Possible incorrect strings:

  • How do you go to the bathroom????
  • Can I invest today? How much is the minimum amount
  • Can’t I do that? 'Cause????

Expected/correct result:

  • How do you go to the bathroom?
  • Can I invest today? How much is the minimum amount?
  • Can’t I do that? 'Cause?

I have started on the code: it already checks whether there is a question mark at the end of the string and adds one if there is none. I check the last 3 characters to handle cases like: I am alive?!

if "?" not in title[-3:]:
    title += "?"
  • I already check whether the last 3 characters contain a question mark and, if not, add one at the end of the string. The rest is not done yet.

  • By the way, will they always be questions? For example: "Can’t I do it today??? Okay." is not a valid entry in this case?

  • They will always be questions.

2 answers

4


You can do it like this:

import re

print(re.sub('\\?+', '?', "Olá????"))

The \?+ matches one or more consecutive question marks, and the whole run is replaced by a single '?'.

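Note: The \\ in front of the ? is there to escape the ? in the regex. In a normal Python string literal, \\ produces a single backslash, so the pattern the regex engine sees is \?+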
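As a small check of the note above, a raw string avoids the doubled backslash; the two spellings below should give the same pattern. This is only an illustration, and the variable name `collapsed` is made up for it.

```python
import re

# In a normal string literal, '\\?+' is the three characters \ ? +
# A raw string spells the same pattern without doubling the backslash.
assert '\\?+' == r'\?+'

collapsed = re.sub(r'\?+', '?', 'Olá????')
print(collapsed)  # Olá?
```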

You may want to adapt it to several punctuation marks, for example:

import re

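def remove_pontuacao_em_sequencia(text):
    # collapses a run of ?, !, . or , into one mark (the last one in the run);
    # the parameter is named text so it does not shadow the built-in str
    return re.sub('([?!.,])+', '\\1', text)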

print( remove_pontuacao_em_sequencia('Olá???') )
print( remove_pontuacao_em_sequencia('Olá!!!') )
print( remove_pontuacao_em_sequencia('Olá...') )
print( remove_pontuacao_em_sequencia('Olá,,,') )
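One thing worth checking with this pattern: because the group itself is repeated, a run of mixed punctuation counts as one sequence and is replaced by its last character. If only repeats of the same character should be collapsed, a backreference inside the pattern is one option. The helper name `collapse_repeats` is hypothetical; this is a sketch, not the code from the answer.

```python
import re

def collapse_repeats(text):
    # ([?!.,])\1+ matches a punctuation mark followed by one or more
    # copies of that same mark, and keeps a single copy.
    return re.sub(r'([?!.,])\1+', r'\1', text)

# Repeated group: the whole mixed run is replaced by its last mark.
mixed_run = re.sub(r'([?!.,])+', r'\1', 'Olá?!')
# Backreference: each repeated mark is collapsed on its own.
same_only = collapse_repeats('Olá??!!')

print(mixed_run)  # Olá!
print(same_only)  # Olá?!
```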
  • 1

    Thanks, man, that helped. I adapted your answer to my case. Cheers

  • 2

    Could the downvoter explain the downvote? I would like to know where I can improve the answer, or whether it was simply posted at an inconvenient time. Don't worry, downvoter, I am not the type to hold grudges and I will not downvote you in return. I don't downvote even people who have attacked and insulted me on the site, much less someone who disagrees with my answers, so feel free to speak up.

0

To also add the question mark at the end of a string that does not have one, you can use the regular expression:

(\?+|(?<!\?)$)

This matches any sequence of one or more question marks, or the end of the string when it is not already preceded by a question mark, so each match can be replaced by a single question mark.

import re

# the examples from the question, in the original Portuguese
tests = [
    ('Como faz para ir ao banheiro????', 'Como faz para ir ao banheiro?'),
    ('Posso investir hoje? Quanto é o valor mínimo', 'Posso investir hoje? Quanto é o valor mínimo?'),
    ('Não posso fazer isso? Por que????', 'Não posso fazer isso? Por que?')
]

for test in tests:
    # (?<!\?) keeps $ from matching again right after a replaced run of '?';
    # on Python 3.7+ a bare $ would match there too and produce '??'
    result = re.sub(r'(\?+|(?<!\?)$)', '?', test[0])
    assert result == test[1]

If your string happens to end in a punctuation mark other than a question mark, the result will look somewhat strange:

print(re.sub(r'(\?+|(?<!\?)$)', '?', 'teste!'))  # teste!?

If you want to eliminate that trailing punctuation, just add it next to the $ in the regular expression:

(\\?+|[{string.punctuation}]+$)

Here string.punctuation comes from the string module. In this case, the input teste!!!,..;! would become teste?.

import re
import string

tests = [
    ('Como faz para ir ao banheiro????', 'Como faz para ir ao banheiro?'),
    ('Posso investir hoje? Quanto é o valor mínimo!..,,', 'Posso investir hoje? Quanto é o valor mínimo?'),
    ('Não posso fazer isso? Por que????', 'Não posso fazer isso? Por que?'),
    ('teste!!!,..;!', 'teste?')
]

for test in tests:
    result = re.sub(f'(\\?+|[{string.punctuation}]+$)', '?', test[0])
    assert result == test[1]
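
Since every title is a question in the asker's case, an alternative is to split the work into two steps: collapse runs of question marks anywhere, then replace whatever punctuation trails the string with a single question mark. The names `normalize_question` and `TRAILING_PUNCT` are made up for this sketch, and it assumes that trailing punctuation of any kind (including cases like "I am alive?!") should be dropped.

```python
import re
import string

# re.escape makes every punctuation character safe inside [...]
TRAILING_PUNCT = re.compile(f'[{re.escape(string.punctuation)}]+$')

def normalize_question(title):
    # Step 1: strip trailing whitespace and collapse runs such as '????'
    title = re.sub(r'\?+', '?', title.rstrip())
    # Step 2: drop trailing punctuation (if any) and end with exactly one '?'
    return TRAILING_PUNCT.sub('', title) + '?'

print(normalize_question('How do you go to the bathroom????'))
print(normalize_question('Can I invest today? How much is the minimum amount'))
print(normalize_question('I am alive?!'))
```

Unlike the single-pattern version, this also adds the question mark when the string has no trailing punctuation at all.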
