Separate a string into blocks according to similarity

Asked

Viewed 112 times

0

I have the following challenge proposed:

input: "aaaaabbbbccccccaaaaa"

output : "5a4b6c7a"

And I made the following code to solve it:

def conta_numero_letras(string):
    armazena = ''
    for letra in string:
        if letra not in armazena:
            armazena += str(string.count(letra)) + letra
        else:
            pass
    return armazena

print(conta_numero_letras("aaaaabbbbccccccaaaaaaa"))

And my exit is being: 12a4b6c How do I scan by blocks the string, to get the output amount of each block individually?

2 answers

1


Diovana, I made the following code to solve the problem, in it has some comments explaining how the code works:

def conta_letras(string):
  #Pego o tamanho da string para verificar string vazia e também para o loop
  tamanho = len(string)

  #String vazia, retorna vazio...
  if tamanho == 0:
    return ""

  contagem = "" #String que contém a contagem de letras, string de retorno da função
  letra_atual = string[0] #Primeira letra da string
  quantidade = 0 #Contagem de letras

  for i in range(tamanho):
    #Se a letra for a mesma da anterior, aumenta a contagem
    if letra_atual == string[i]:
      quantidade += 1
    else:
      #Se a letra mudou, adiciona a contagem e a letra a string de retorno
      contagem += str(quantidade) + letra_atual
      #Reinicia a letra atual e a contagem
      letra_atual = string[i]
      quantidade = 1

    #Caso seja a última letra do loop, finalizo a contagem
    if i == (tamanho - 1):
      contagem += str(++quantidade) + letra_atual

  return contagem

print(conta_letras("aaaaabbbbccccccaaaaaaa"))
print(conta_letras("abcdefgh"))
print(conta_letras("zzzzaaaabbbbccccbbbbaaaacccczzzz"))
print(conta_letras("xptoaaabbb___zzxzz"))
  • 2

    Your answer is basically correct - but with a "no Pythonico" accent (getting to the point of having code that is simply not Python and will not fucnionar as you wanted - str(++quantidade) does not add "1" in "quantity" - it is not a syntax error because it twice applies the "positive number" operator + variable, but its value does not change). But principalmnte in Python is not used for by indices, except when really necessary - since the for natural language is a "for each" that already provides the data sequence element directly.

  • 1

    Opa, my intuition was to make the code really very explicit and with a syntax that can be applied in other languages, because it seems to be the case of someone who is starting to program, so the comments etc, thanks! :)

  • I’m starting, you got it! Your explanation was super didactic, the little problems in the syntax that can cause mistakes are easy to change. Thank you so much for your help! :)

1

The problem with your code is that it counts all occurrences of a given letter in the string, regardless of its position (the value returned by the method count) - and does not take into account separate blocks with the same letter.

For the record, this algorithm is called a "run length encoding" (RLE), and is a compression algorithm still used in many basic file types that we use on a day-to-day basis (e.g., GIF images, and Postscript images).

Good - the solution is to use more state variables - to go through the letter of the original string, to be able to look "backwards" and see which is the last letter - and take an action if it is equal to the current, and another case is different.

Another thing to keep in mind is that in Python, strings are immutable objects. So, although in this case, the performance is not relevant (anyway this code will run in less than 10ms for strings smaller than 30 pages of text), it is better that the mutable code data is built in a "mutable" structureas a list, and after everything is collected, "frozen" in an output string. For this "freeze" the key is the ". Join" method of the strings, which use a list as input.

Your job can stay that way:

def rle(input_str):
   count = -1
   previous = None
   result = []
   for char in input_str:
       count += 1
       if char != previous and previous != None:
            result.append((count, previous))
            count = 0
       previous = char
   if previous:
      result.append((count + 1, previous))

   output = ''.join((str(pair[0]) + pair[1]) for pair in result)
   return output

  • jsbueno, your code didn’t work when the input string is for example abcdefg, it counts only the first letter and the rest get zero.

  • A possible fix would be before the append, check that Count is actually set to zero, for example: #Previous code Else: if Count == 0: Count += 1 result.append((Count, Previous)) Count = 0

  • Thank you for the answer! I’m still starting to program, so it becomes a bit complicated to understand some concepts used in the algorithm.

  • @daniel - made corrections - was in a hurry - the important thing is to add 1 every time, the sum was a character "delayed".

Browser other questions tagged

You are not signed in. Login or sign up in order to post.