Sort list by string resemblance

I have a list of strings:

AAA
BBB
CCC
ABB
ABC
ACC
ACD

The user will type what they are looking for, and I would like the most similar items to move to the top of the list. Examples:

String: A

Result:

AAA
ABB
ABC
ACC
ACD
BBB
CCC

String: AB

Result:

ABB
ABC
AAA
ACC
ACD
BBB
CCC

String: C

Result:

CCC
AAA
ABB
ABC
ACC
ACD
BBB

String: AC

Result:

ACC
ACD
AAA
ABB
ABC
CCC
BBB

String: B

Result:

BBB
AAA
ABB
ABC
ACC
ACD
CCC

Edit:

Building on @Mario's solution, which worked perfectly:

lista.OrderByDescending(x => (x.StartsWith(padrao))).ThenByDescending(x => (x.Contains(padrao)));

This gave me an even better result than I expected.
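A minimal sketch of the same idea with one more level in the composite key: items that start with the pattern first, then items that merely contain it, and finally an alphabetical tie-break inside each group (the alphabetical ordering suggested in the answer's comments). The class name `SimilarSort`, its `Sort` method and the use of ordinal comparison are assumptions for illustration, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SimilarSort {
    // Hypothetical helper: ranks items by a composite key.
    // 1) items that start with the pattern, 2) items that only contain it,
    // 3) everything else; ties inside each group are broken alphabetically (ordinal).
    public static List<string> Sort(IEnumerable<string> items, string pattern) {
        return items
            .OrderByDescending(x => x.StartsWith(pattern, StringComparison.Ordinal)) // true sorts before false
            .ThenByDescending(x => x.Contains(pattern))
            .ThenBy(x => x, StringComparer.Ordinal)
            .ToList();
    }
}

public class Program {
    public static void Main() {
        var lista = new List<string> { "AAA", "BBB", "CCC", "ABB", "ABC", "ACC", "ACD" };
        foreach (var padrao in new[] { "A", "AB", "C" }) {
            Console.WriteLine(padrao + ": " + string.Join(", ", SimilarSort.Sort(lista, padrao)));
        }
    }
}
```

For `A` and `AB` this reproduces the results listed in the question. For `C` it yields CCC, ABC, ACC, ACD, AAA, BBB: the `Contains` step moves ABC, ACC and ACD ahead of AAA and BBB, which is the "even better" behavior described in the edit rather than the question's original example.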

  • Sort or filter?

  • Sort; no value will be removed from the list.

  • Define similarity...

  • It would be equality at the beginning of the string, the closest thing being StartsWith.

  • With the letter B, what would the order be?

  • I changed the question with other examples. Thank you.

  • I tried to think of an alternative using quicksort and failed miserably... The strings are not compared with each other in an obvious way, but against an element outside the list.


1 answer



What seems to have been defined as similar is whether the substring occurs in each element of the list. So just sort the elements that contain it first, using OrderByDescending() applied to Contains(). This groups every element that contains the text pattern first, followed by those that don't; since LINQ's OrderBy()/OrderByDescending() is a stable sort, each group keeps the original list order.

using System;
using System.Collections.Generic;
using System.Linq;

public class Program {
    public static void Main() {
        var lista = new List<string> { "AAA", "BBB", "CCC", "ABB", "ABC", "ACC", "ACD" };
        Semelhante(lista, "A");
        Semelhante(lista, "B");
        Semelhante(lista, "C");
        Semelhante(lista, "AB");
        Semelhante(lista, "AC");
    }
    public static void Semelhante(List<string> lista, string padrao) {
        foreach (var item in lista.OrderByDescending(x => (x.Contains(padrao)))) {
            Console.WriteLine(item);
        }
        Console.WriteLine();
    }
}

See it working on ideone and on .NET Fiddle. I also put it on GitHub for future reference.

The latest edit to the question has an even better option, but only the OP knew that was what was needed.


Previously I interpreted "resemblance" differently. I'm leaving that version here in case it helps someone else.

I think it can be improved, and it is not fully tested. Hack alert for using only LINQ :P

OrderBy() expects a function that returns the key used to sort each element. So I pass it the number of occurrences of the substring in that element as the key; after all, the more occurrences, the closer the match. I used Count() over the string's suffixes to count the occurrences.

The requirement for "similar" may not be quite that, but the question doesn't make it clear. The result is as expected.

I don't know whether AAB should rank above ABC because it has 2 Bs, or because B comes before C (mine ranks it that way).

using System;
using System.Collections.Generic;
using System.Linq;

public class Program {
    public static void Main() {
        var lista = new List<string> { "AAA", "BBB", "CCC", "ABB", "ABC", "ACC", "ACD" };
        var padrao = "A";
        // key: how many suffixes of x start with padrao, i.e. the number of (overlapping) occurrences
        foreach (var item in lista.OrderByDescending(x => x.Select((c, i) => x.Substring(i)).Count(sub => sub.StartsWith(padrao)))) {
            Console.WriteLine(item);
        }
        Console.WriteLine();
        padrao = "AB";
        foreach (var item in lista.OrderByDescending(x => x.Select((c, i) => x.Substring(i)).Count(sub => sub.StartsWith(padrao)))) {
            Console.WriteLine(item);
        }
    }
}

See it working on ideone and on .NET Fiddle. I also put it on GitHub for future reference.
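A minimal sketch of the same occurrence-count key that avoids allocating one substring per position, plus an alphabetical tie-break for elements with the same count. The names `OccurrenceSort` and `CountOccurrences`, the ordinal comparison, and the choice that an empty pattern counts as zero occurrences (the original LINQ key would count every position instead) are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OccurrenceSort {
    // Hypothetical helper: counts (possibly overlapping) occurrences of pattern in text,
    // the same quantity as counting the suffixes that start with the pattern.
    public static int CountOccurrences(string text, string pattern) {
        if (pattern.Length == 0) return 0; // assumption: an empty pattern matches nothing
        int count = 0;
        int index = text.IndexOf(pattern, StringComparison.Ordinal);
        while (index >= 0) {
            count++;
            // restart one character later so overlapping matches are counted too
            index = text.IndexOf(pattern, index + 1, StringComparison.Ordinal);
        }
        return count;
    }

    // More occurrences first; equal counts are ordered alphabetically (ordinal).
    public static List<string> Sort(IEnumerable<string> items, string pattern) {
        return items
            .OrderByDescending(x => CountOccurrences(x, pattern))
            .ThenBy(x => x, StringComparer.Ordinal)
            .ToList();
    }
}

public class Program {
    public static void Main() {
        var lista = new List<string> { "AAA", "BBB", "CCC", "ABB", "ABC", "ACC", "ACD" };
        Console.WriteLine(string.Join(", ", OccurrenceSort.Sort(lista, "A")));
        Console.WriteLine(string.Join(", ", OccurrenceSort.Sort(lista, "B")));
    }
}
```

With `B` this gives BBB, ABB, ABC, AAA, ACC, ACD, CCC. When two elements have the same count, as AAB and ABC do for `B`, the `ThenBy` step makes the alphabetical order the explicit tie-break instead of relying on the original list order.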

  • Thank you very much, I'm doing some tests and will give feedback soon.

  • AAB would be better than ABC if the padrao were A, by alphabetical order. The number of times the character appears in the string does not matter.

  • So how would ABC and ACA rank?

  • I changed the question with other examples. Thank you.

  • And where would ACA fit when searching for A? This is starting to look too simple.

  • ACA would have no match... it would be in alphabetical order.

  • I changed it to this, though I still need to check it: https://dotnetfiddle.net/B65MB3. Beyond that, it is difficult to define the requirement. You may need to add an alphabetical ordering (a composite key) for the items that are similar, or for those that are not.

  • Exactly that, thank you very much!

  • I just wouldn't call it resemblance ;)
