Count how many elements are duplicated in a string

Question

Count how many elements are duplicated in a string

Asked 5 years, 4 months ago

Viewed 5,507 times

12

The purpose is to account for the number of elements that are repeated in a string and not just check for duplicates or count how many times these elements appear.

For example:

"aabbca1m" // 2 (a,b)
"aABCbc33zzzzzqw" // 5 (a,b,c,3,z)

I was able to assemble a code, but it counts how many times repeated characters appear on each loop.

function duplicateCount(text) {
    text = text.toLocaleLowerCase().split("").sort()
    let text2 = text

    let count = 0
    for (let index = 0; index < text.length; index++) {
        for (let chave = index + 1; chave < text2.length; chave++) {
            if (text[index] == text2[chave]) {
                count++
            }
        }
    }

    return count
}

console.log(duplicateCount("aabBcdeaa")) //deveria resultar em 2, porém resulta em 7

you want to take the most repeats and show the value?

– novic

2020/02/29 at 23:11
@Virgilionovic no, I want to take the amount of elements that are repeated. Equal I put in the two examples.

– zangs

2020/02/29 at 23:12
it’s very confusing the examples! can be clearer?

– novic

2020/02/29 at 23:14
'Cause here it would have to be 2: aabBcdeaa it’s weird that

– novic

2020/02/29 at 23:16
@Virgilionovic take a good look at this string: "aabBcdeaa", the only elements that are repeated here are the 'a' and the 'b', that is, 2 repeated elements. Perhaps it will be clearer if I edit the question and explain that it is independent to be in upper or lower case.

– zangs

2020/02/29 at 23:18
It is not clear because at the end of this text there are two more aa!?

– novic

2020/02/29 at 23:22
Buddy, it’s just any string, you know? It could be "home", for example. The result would be 1, in this case. Since the only character that repeats is 'a'. In the "aabBcdeaa" example, the only elements that repeat are 'b' and 'a', totaling 2 characters that are duplicated inside the string. I can’t be clearer than that.

– zangs

2020/02/29 at 23:27

Show 2 more comments

3 answers

11

You can create an object that stores all the letters of the string, followed by the number of occurrences of the character. Then, just return the number of properties of that object whose value, that is, the number of character appearances, is greater than 1.

Something like that:

function duplicateCount(string) {
  const charMap = {};
  
  // Mapeamos cada caractere da string ao nosso mapa.
  for (const char of string.toLowerCase()) {
    if (!charMap[char]) {
      charMap[char] = 1;
    } else {
      charMap[char] = charMap[char] + 1;
    }
  }
  
  // Obtemos todos os valores do nosso mapa e os filtramos,
  // de modo a manter somente aqueles números maiores que 1.
  const repeatedValues = Object.values(charMap)
    .filter((count) => count > 1);
    
  return repeatedValues.length;
}

console.log(duplicateCount('aabbca1m')); // 2
console.log(duplicateCount('aABCbc33zzzzzqw')); // 5

It is important to clarify that Object.values returns a array with the object values. For example:

const values = Object.values({
  a: 2,
  b: 1,
  c: 2,
  d: 3
});

console.log(values);

Other approaches:

You can reduce the if/else from the above excerpt, if you prefer:

function duplicateCount(string) {
  const charMap = {};

  for (const char of string.toLowerCase()) {
/*5*/   charMap[char] = (charMap[char] || 0) + 1;
  }

  return Object.values(charMap).filter((count) => count > 1).length;
}

console.log(duplicateCount('aabbca1m')); // 2
console.log(duplicateCount('aABCbc33zzzzzqw')); // 5

If you got confused by the expression (charMap[char] || 0) + 1 in line 5, it’s basically a game of expressions:

Case charMap[char] already defined, it will be a number. Then, we will add 1 at its present value.
Case charMap[char] does not exist, we will add 1 à 0. In that case, 0 was the value we defined as a type fallback through the logical operator OR (||).

You can also use the Array.prototype.reduce to do the same thing. I just think that the following code, in some cases, may not be the most ideal, especially if you are working with other people who might not have such a large Javascript domain. Anyway:

function duplicateCount(string) {
  const map = [...string.toLowerCase()].reduce(
    (map, char) => ({
      ...map,
      [char]: (map[char] || 0) + 1
    }),
    {}
  );

  return Object.values(map).filter((count) => count > 1).length;
}

console.log(duplicateCount('aabbca1m')); // 2
console.log(duplicateCount('aABCbc33zzzzzqw')); // 5

For a deeper understanding:

I found your solution valid, the only reason I won’t give it up is because I hope to find a more streamlined answer.

– zangs

2020/03/01 at 00:00
@zangs, what do you classify as "lean"? I can edit the answer to add other solutions, if you try to define me what that term means to you.

– Luiz Felipe

2020/03/01 at 00:06
Luiz, your answer was very satisfactory, just to be clear. I wanted to see if someone could get a more "short" and functional answer (if that’s possible) you know?

– zangs

2020/03/01 at 00:26
1

I edited the answer, adding two more "short" possibilities. Now dry? :D

– Luiz Felipe

2020/03/01 at 00:29
3

@zangs The first solution, in my opinion, is already very lean and with a very clear and readable code. As you can see, the solution with reduce, although minor, it is not necessarily "better" (going from the opinion of each one, I particularly think it gets a little more difficult to understand and not worth this complication if the only goal is to make the code shorter). Finally, smaller code is not necessarily "better"

– hkotsubo

2020/03/01 at 00:39
@hkotsubo totally agree with you, so much so that I made it clear that his response was satisfactory, it was more a question of exploring other answers.

– zangs

2020/03/01 at 00:49
@Luizfelipe excellent guy! Grateful for the time and feedback :)

– zangs

2020/03/01 at 00:50
It’s a joke to realize that you put in that answer without understanding what the guy wants and on top of everything else you come here to say that the answer is simply satisfactory, Remember that those who ask do not know the answer and how can evaluate if there is a shorter answer being the best? @Luizfelipe you made excellent solutions and did not deserve certain comments.

– novic

2020/03/01 at 13:00

Show 3 more comments

Browser other questions tagged javascript array string loop

You are not signed in. Login or sign up in order to post.

by Augusto Vasques • **15,321** points · Answer 1 · 2020-03-02T08:39:59+00:00

Use the regular expression /(.)(?=.*\1)/gi to find the repeating characters.

function duplicateCount(text) {
  let repetidos = [...text.match(/(.)(?=.*\1)/gi)].map(x => x.toLowerCase());
  return (new Set(repetidos)).size;
}

console.log(duplicateCount("aabBcdeaa"));                       // 2 => a,b
console.log(duplicateCount("aABCbc33zzzzzqw"));                 // 5 = > a,b,c,3,z
console.log(duplicateCount("faaaAbCbc33zzzzzqwbb3c3c3c3c3cf")); // 6 = > f,a,b,c,3,z

The logic is simple, the regular expression /(.)(?=.*\1)/gi traverse the string and only establish a capture group if the character has repetitions in front of it(look-Ahead).

In the expression (.) the parentheses () means that you must establish a capture group for . which corresponds to any character that is not line return.

Already the expression (?=.*\1) is the look-Ahead, means that the previous expression (.) will only be captured if the condition .*\1 be attended to. .* is zero or more characters that are not line return and \1 an occurrence of the catch group intended to establish.

Obs: In regular expression /(.)(?=.*\1)/gi the correspondence is always obtained, the group (.) and is established for further comparison and correspondence is accepted or not depending on the Matching observed in the look-Ahead (?=.*\1) .

At the end the regular expression has two flags gi where g means that capturing inside the string is global or is not for the first occurrence and i that makes capturing case insensitive by not distinguishing capital from minuscules.

After forming the array with all the repeated characters they are converted to lower case, this array is converted to an object Set which only allows unique values to be stored implying that the size that Set is the number of repeated characters.

by Sam • **79,597** points · Answer 2 · 2020-03-01T00:50:17+00:00

After adjusting the order of the characters with .split() + .sort() + .join() could make a while that will rotate from 0 up to the size of the string, picking up distinct characters. Inside the while you use a regular expression that will select, at each loop of the loop, the repeated characters and count with .length. If it’s bigger than 1 means that the character is repeated, and then the variable that controls the while will be increased by the size .length than the regular expression captured, otherwise it will be incremented normally with ++.

For example, the string aabBcdeaa will turn aaaabbcde. On the first loop, regex will capture the 4 "a". Since 4 is greater than 1, it will increment the count on the variable count with ++ and will also increment the variable i with 4 (string size captured by regex). Then the i (that initially had the value 0) will become worth 4 (0+4) and the next round of while will now take the first character "b", and will do the same thing. A regex will take "bb", which has 2 of length, and will add ++ in the variable count and add +2 in the variable i.

On the next turn you will start from the "c" character, and regex will return the size 1, i.e., the "c" is unique, thus the variable count will not be affected and the variable i will be incremented only with ++. That’s all I’m explaining you’ll see in the code below.

A regex text[i]+"+" takes the respective character iterated by the loop while in the string and group if there is more than 1 (sign of +). For example, on the first round of while, being the string aaaabbcde, will return an array with the 4 "a":

["aaaa"]

In doing text.match(re)[0].length will return the size of the string contained in the single item of the returned array (index [0]), that is 4.

Take the example:

function duplicateCount(text) {
   text = text.toLowerCase().split('').sort().join('');
   let count = i = 0;
   while(i < text.length){
      // escapar possíveis caracteres especias da regex
      let str = text[i].replace(/[\.*+?^${}()|[\]]/g, '\\$&');
      let len = text.match(str + "+")[0].length;
      len > 1 ? (count++, i += len) : i++;
   }
   return count;
}

console.log(duplicateCount("aabBcdeaa")); // 2 -> a, b
console.log(duplicateCount("casa")) // 1 -> a
console.log(duplicateCount("c_asabfgbf_")) // 4 -> _, a, b, f
console.log(duplicateCount("13451090836")) // 3 -> 0, 1, 3
console.log(duplicateCount("aABCbc33zzzzzqw((")) // 6 -> 3, a, b, c, z, (
console.log(duplicateCount("-,-\^\^((")) // 3 -> -, ^, (

Notice that on the last console.log the backslash is repeated 2 times (duplicateCount("-,-\^\^((")) but it is not accounted for by being an escape character.

Another option using .indexOf() and .lastIndexOf():

Like .indexOf() returns the position of the first character found and .lastIndexOf() the latter, if there is the same character more than once, the values will be different, but if the character is not repeated, the two methods will return the same value.

function duplicateCount(text) {
   text = text.toLowerCase().split('').sort().join('');
   let count = i = 0;
   while(i < text.length){
      let p1 = text.indexOf(text[i]);
      let p2 = text.lastIndexOf(text[i]);
      var len = text.substr(p1,p2-p1+1).length;
      len != 1 ? (count++, i += len) : i++;
   }
   return count;
}

console.log(duplicateCount("aabBcdeaa")); // 2 -> a, b
console.log(duplicateCount("casa")) // 1 -> a
console.log(duplicateCount("c_asabfgbf_")) // 4 -> _, a, b, f
console.log(duplicateCount("13451090836")) // 3 -> 0, 1, 3
console.log(duplicateCount("aABCbc33zzzzzqw((")) // 6 -> 3, a, b, c, z, (
console.log(duplicateCount("-,-\^\^((")) // 3 -> -, ^, (