Count how many elements are duplicated in a string

Asked

Viewed 5,507 times

12

The purpose is to account for the number of elements that are repeated in a string and not just check for duplicates or count how many times these elements appear.

For example:

"aabbca1m" // 2 (a,b)
"aABCbc33zzzzzqw" // 5 (a,b,c,3,z)

I was able to assemble a code, but it counts how many times repeated characters appear on each loop.

function duplicateCount(text) {
    text = text.toLocaleLowerCase().split("").sort()
    let text2 = text

    let count = 0
    for (let index = 0; index < text.length; index++) {
        for (let chave = index + 1; chave < text2.length; chave++) {
            if (text[index] == text2[chave]) {
                count++
            }
        }
    }

    return count
}

console.log(duplicateCount("aabBcdeaa")) //deveria resultar em 2, porém resulta em 7

  • you want to take the most repeats and show the value?

  • @Virgilionovic no, I want to take the amount of elements that are repeated. Equal I put in the two examples.

  • it’s very confusing the examples! can be clearer?

  • 'Cause here it would have to be 2: aabBcdeaa it’s weird that

  • @Virgilionovic take a good look at this string: "aabBcdeaa", the only elements that are repeated here are the 'a' and the 'b', that is, 2 repeated elements. Perhaps it will be clearer if I edit the question and explain that it is independent to be in upper or lower case.

  • It is not clear because at the end of this text there are two more aa!?

  • Buddy, it’s just any string, you know? It could be "home", for example. The result would be 1, in this case. Since the only character that repeats is 'a'. In the "aabBcdeaa" example, the only elements that repeat are 'b' and 'a', totaling 2 characters that are duplicated inside the string. I can’t be clearer than that.

Show 2 more comments

3 answers

11


You can create an object that stores all the letters of the string, followed by the number of occurrences of the character. Then, just return the number of properties of that object whose value, that is, the number of character appearances, is greater than 1.

Something like that:

function duplicateCount(string) {
  const charMap = {};
  
  // Mapeamos cada caractere da string ao nosso mapa.
  for (const char of string.toLowerCase()) {
    if (!charMap[char]) {
      charMap[char] = 1;
    } else {
      charMap[char] = charMap[char] + 1;
    }
  }
  
  // Obtemos todos os valores do nosso mapa e os filtramos,
  // de modo a manter somente aqueles números maiores que 1.
  const repeatedValues = Object.values(charMap)
    .filter((count) => count > 1);
    
  return repeatedValues.length;
}

console.log(duplicateCount('aabbca1m')); // 2
console.log(duplicateCount('aABCbc33zzzzzqw')); // 5

It is important to clarify that Object.values returns a array with the object values. For example:

const values = Object.values({
  a: 2,
  b: 1,
  c: 2,
  d: 3
});

console.log(values);

Other approaches:

You can reduce the if/else from the above excerpt, if you prefer:

function duplicateCount(string) {
  const charMap = {};

  for (const char of string.toLowerCase()) {
/*5*/   charMap[char] = (charMap[char] || 0) + 1;
  }

  return Object.values(charMap).filter((count) => count > 1).length;
}

console.log(duplicateCount('aabbca1m')); // 2
console.log(duplicateCount('aABCbc33zzzzzqw')); // 5

If you got confused by the expression (charMap[char] || 0) + 1 in line 5, it’s basically a game of expressions:

  • Case charMap[char] already defined, it will be a number. Then, we will add 1 at its present value.
  • Case charMap[char] does not exist, we will add 1 à 0. In that case, 0 was the value we defined as a type fallback through the logical operator OR (||).

You can also use the Array.prototype.reduce to do the same thing. I just think that the following code, in some cases, may not be the most ideal, especially if you are working with other people who might not have such a large Javascript domain. Anyway:

function duplicateCount(string) {
  const map = [...string.toLowerCase()].reduce(
    (map, char) => ({
      ...map,
      [char]: (map[char] || 0) + 1
    }),
    {}
  );

  return Object.values(map).filter((count) => count > 1).length;
}

console.log(duplicateCount('aabbca1m')); // 2
console.log(duplicateCount('aABCbc33zzzzzqw')); // 5

For a deeper understanding:

  • I found your solution valid, the only reason I won’t give it up is because I hope to find a more streamlined answer.

  • @zangs, what do you classify as "lean"? I can edit the answer to add other solutions, if you try to define me what that term means to you.

  • Luiz, your answer was very satisfactory, just to be clear. I wanted to see if someone could get a more "short" and functional answer (if that’s possible) you know?

  • 1

    I edited the answer, adding two more "short" possibilities. Now dry? :D

  • 3

    @zangs The first solution, in my opinion, is already very lean and with a very clear and readable code. As you can see, the solution with reduce, although minor, it is not necessarily "better" (going from the opinion of each one, I particularly think it gets a little more difficult to understand and not worth this complication if the only goal is to make the code shorter). Finally, smaller code is not necessarily "better"

  • @hkotsubo totally agree with you, so much so that I made it clear that his response was satisfactory, it was more a question of exploring other answers.

  • @Luizfelipe excellent guy! Grateful for the time and feedback :)

  • It’s a joke to realize that you put in that answer without understanding what the guy wants and on top of everything else you come here to say that the answer is simply satisfactory, Remember that those who ask do not know the answer and how can evaluate if there is a shorter answer being the best? @Luizfelipe you made excellent solutions and did not deserve certain comments.

Show 3 more comments

4

Use the regular expression /(.)(?=.*\1)/gi to find the repeating characters.

function duplicateCount(text) {
  let repetidos = [...text.match(/(.)(?=.*\1)/gi)].map(x => x.toLowerCase());
  return (new Set(repetidos)).size;
}

console.log(duplicateCount("aabBcdeaa"));                       // 2 => a,b
console.log(duplicateCount("aABCbc33zzzzzqw"));                 // 5 = > a,b,c,3,z
console.log(duplicateCount("faaaAbCbc33zzzzzqwbb3c3c3c3c3cf")); // 6 = > f,a,b,c,3,z

The logic is simple, the regular expression /(.)(?=.*\1)/gi traverse the string and only establish a capture group if the character has repetitions in front of it(look-Ahead).

In the expression (.) the parentheses () means that you must establish a capture group for . which corresponds to any character that is not line return.

Already the expression (?=.*\1) is the look-Ahead, means that the previous expression (.) will only be captured if the condition .*\1 be attended to. .* is zero or more characters that are not line return and \1 an occurrence of the catch group intended to establish.

Obs: In regular expression /(.)(?=.*\1)/gi the correspondence is always obtained, the group (.) and is established for further comparison and correspondence is accepted or not depending on the Matching observed in the look-Ahead (?=.*\1) .

At the end the regular expression has two flags gi where g means that capturing inside the string is global or is not for the first occurrence and i that makes capturing case insensitive by not distinguishing capital from minuscules.

After forming the array with all the repeated characters they are converted to lower case, this array is converted to an object Set which only allows unique values to be stored implying that the size that Set is the number of repeated characters.

  • 1

    It is worth remembering that \w only consider letters, numbers and the character _. Although the question does not make clear what should be considered (she only gave examples with letters, but at the same time speaks of "repeated elements", which may imply which is any character). Although it wouldn’t be hard to adapt the regex to accept anything... And I don’t think it even needs the + after the \1, because if there is a single occurrence of \1, already indicates that it repeats itself.

  • @hkotsubo, I built the expression in what I understood in what understood question but could be done with anything other than space and newline or anything other than newline. You really don’t need the + after \1, i started from a simpler expression whose intention was to consume a whole chain of repetitions, but I will correct because really just find a repetition to satisfy the condition.

  • Oh yes, it is that the question - as I understand it - does not make clear which characters should actually be considered, which gives room for different interpretations. That doesn’t mean your answer is wrong...

  • 1

    @hkotsubo Made switched capture group, you are right, really it is more complete use . instead of \w

  • 1

    Blz, I had already voted before and I keep my vote :-) Just one detail (because I’m pedantic): "the previous expression (.) will only be captured if the condition .*\1 be attended to." - actually the group (.) will always be captured if you find a character (it does not depend on the Lookahead to be captured). After the group is captured, the condition .*\1 is checked, and if it is not met, the regex fails. And it makes sense, there is no way Lookahead know what is \1 if he has not been captured before.

  • This correspondence is always obtained, the group is established for further comparison and the capture is accepted or not depending on the correspondence observed in the look-Ahead.

Show 1 more comment

3

After adjusting the order of the characters with .split() + .sort() + .join() could make a while that will rotate from 0 up to the size of the string, picking up distinct characters. Inside the while you use a regular expression that will select, at each loop of the loop, the repeated characters and count with .length. If it’s bigger than 1 means that the character is repeated, and then the variable that controls the while will be increased by the size .length than the regular expression captured, otherwise it will be incremented normally with ++.

For example, the string aabBcdeaa will turn aaaabbcde. On the first loop, regex will capture the 4 "a". Since 4 is greater than 1, it will increment the count on the variable count with ++ and will also increment the variable i with 4 (string size captured by regex). Then the i (that initially had the value 0) will become worth 4 (0+4) and the next round of while will now take the first character "b", and will do the same thing. A regex will take "bb", which has 2 of length, and will add ++ in the variable count and add +2 in the variable i.

On the next turn you will start from the "c" character, and regex will return the size 1, i.e., the "c" is unique, thus the variable count will not be affected and the variable i will be incremented only with ++. That’s all I’m explaining you’ll see in the code below.

A regex text[i]+"+" takes the respective character iterated by the loop while in the string and group if there is more than 1 (sign of +). For example, on the first round of while, being the string aaaabbcde, will return an array with the 4 "a":

["aaaa"]

In doing text.match(re)[0].length will return the size of the string contained in the single item of the returned array (index [0]), that is 4.

Take the example:

function duplicateCount(text) {
   text = text.toLowerCase().split('').sort().join('');
   let count = i = 0;
   while(i < text.length){
      // escapar possíveis caracteres especias da regex
      let str = text[i].replace(/[\.*+?^${}()|[\]]/g, '\\$&');
      let len = text.match(str + "+")[0].length;
      len > 1 ? (count++, i += len) : i++;
   }
   return count;
}

console.log(duplicateCount("aabBcdeaa")); // 2 -> a, b
console.log(duplicateCount("casa")) // 1 -> a
console.log(duplicateCount("c_asabfgbf_")) // 4 -> _, a, b, f
console.log(duplicateCount("13451090836")) // 3 -> 0, 1, 3
console.log(duplicateCount("aABCbc33zzzzzqw((")) // 6 -> 3, a, b, c, z, (
console.log(duplicateCount("-,-\^\^((")) // 3 -> -, ^, (

Notice that on the last console.log the backslash is repeated 2 times (duplicateCount("-,-\^\^((")) but it is not accounted for by being an escape character.


Another option using .indexOf() and .lastIndexOf():

Like .indexOf() returns the position of the first character found and .lastIndexOf() the latter, if there is the same character more than once, the values will be different, but if the character is not repeated, the two methods will return the same value.

function duplicateCount(text) {
   text = text.toLowerCase().split('').sort().join('');
   let count = i = 0;
   while(i < text.length){
      let p1 = text.indexOf(text[i]);
      let p2 = text.lastIndexOf(text[i]);
      var len = text.substr(p1,p2-p1+1).length;
      len != 1 ? (count++, i += len) : i++;
   }
   return count;
}

console.log(duplicateCount("aabBcdeaa")); // 2 -> a, b
console.log(duplicateCount("casa")) // 1 -> a
console.log(duplicateCount("c_asabfgbf_")) // 4 -> _, a, b, f
console.log(duplicateCount("13451090836")) // 3 -> 0, 1, 3
console.log(duplicateCount("aABCbc33zzzzzqw((")) // 6 -> 3, a, b, c, z, (
console.log(duplicateCount("-,-\^\^((")) // 3 -> -, ^, (

  • 1

    Just remembering that if the string has some metacharacter of regex, it should be escaped with \, otherwise it will generate an invalid regex and give error. Ex: duplicateCount('aab((('), the parentheses will generate the string '(+', and this is an invalid regex , because the parentheses should be written as \( (see). Although the question does not make it very clear if it is only letters (it speaks in "repeated elements", which may imply that it is anything, although it only has letters in the examples). Anyway, my pedantry speaking louder again...

  • 1

    Well observed. I made a character escape and added another option without regex.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.