Hypothesis tests with C# - Merge (sum) of values to find result

Question

Hypothesis tests with C# - Merge (sum) of values to find result

Asked 7 years, 11 months ago

Viewed 1,153 times

4

Hello, everybody.

I need to solve the following problem using C# (ASP . NET MVC):

I have a table in the database that contains ID and Value (decimal) as below:

The result I wish to find by realizing the combination of values is 1000.00.

I mean, I’d have the following:
Hypothesis 1: Sum the id values 1, 2, 3, 4 for 1000,00.
Hypothesis 2: Sum the id values 1, 4, 5 totaling 1000,00.
Hypothesis 3: Sum the id values 2, 3, 5 totaling 1000,00.

As a final result, I will display the 3 hypotheses for the user to choose the one they want.

I was able to solve this problem using the Solver, in Excel.

Can someone give a light to achieve this result or has faced this kind of problem?

well excel Solver uses some algorithms like: Simplex LP, Generalized Reduced Gradation (GRG) Nonlinear, and Evolutionary. There are several methods to attack this problem but n I know if it is possible to arrive at an optimal algorithm for this problem because it seems to me very be an NP-complete

– Um Programador

2017/09/21 at 15:05

2 answers

3

You can use recursion for this.

For this example, we will make a for of all the bank’s valuables. In each loop, we try to verify if there is a possibility, that is, we will look for all possible possibilities to verify if any fits to get the requested value.

For each loop we will check the numbers below the current, ie if the i = 2, we will compare only the values 100 and 200. At each loop the value of i will be increased, thus ensuring that in the latter all values are compared.

If we start the first loop with the value of 100 (id 1). Subtract this value from the total and verify the difference. If it is the value, we return it. If you still have a "rest", we will go after the next amounts to buy the amount.

For each "recursion level", the value sought will be the rest to complete the amount, ie, if you started with 1000 and subtracted 200, you need to fetch the 800 missing somewhere. If you don’t have a number that meets this premise, we will go to the "next level" and so on.

See below how it would look an adaptation of this response to your example:

var listaValores = new List<Dados>()
{
    new Dados { Id = 1, Valor = 100.00M},
    new Dados { Id = 2, Valor = 200.00M},
    new Dados { Id = 3, Valor = 300.00M},
    new Dados { Id = 4, Valor = 400.00M},
    new Dados { Id = 5, Valor = 500.00M}
};

public static IEnumerable<string> OterCombinacoes(List<Dados> dados, decimal sum, string values)
{
    for (int i = 0; i < dados.Count; i++)
    {
        decimal resto = sum - dados[i].Valor;
        string vals = dados[i].Id + "," + values;
        if (resto == 0)
        {
            yield return vals;
        }
        else
        {
            List<Dados> dadosPossiveis = dados.Take(i).Where(n => n.Valor <= sum).ToList();
            if (dadosPossiveis.Count > 0)
            {
                foreach (string s in OterCombinacoes(dadosPossiveis, resto, vals))
                {
                    yield return s;
                }
            }
        }
    }
}

The originalValue i added only to facilitate the persistence of the searched value (1000). For your case, it is not necessary.

See working on Dotnetfiddle.

References:

Reading tips:

1

I remembered a solution similar to this, for SQL: https://answall.com/a/193833/64969 ; same concept of recursive search in set =D ; in my case, there was a maximum limitation on the size of the subset, so this helped to decrease the levels of recursion

– Jefferson Quesado

2017/09/21 at 17:48

Browser other questions tagged c# asp.net-mvc asp.net-mvc-5

You are not signed in. Login or sign up in order to post.

by Miguel Angelo • **28,526** points · Answer 1 · 2017-09-22T00:10:01+00:00

If it is possible to sort the elements it is possible to optimize the search of combinations.

I will assume that it is possible, and that the elements are ordered in ascending order, as shown in the figure.

The elements larger than 500 are orange and the smallest blue. In the combination, there can only be 1 orange element (for the sum to be < 1000).

Optimized searches

There are several algorithms optimized for searching in an ordered set. An easy one is the binary search. However, if the distribution of values within the set is known, we can improve this by kicking a position where it is more likely to find the element by making a regression.

Regression: to find the value 250, we could kick to position 5 making a linear regression, as set out below:

(val-min)/(max-min)*tam = 5,26
val = 250  // valor buscado
min = 50   // menor valor
max = 1000 // maior valor
tam = 25   // contagem de elementos

And what we found is 200 there in that position... even because 250 is not an element of the list, but look how it was close.

If the distribution were different, it would be enough to make an appropriate regression to find the most likely place. Any optimization will depend on the data set, there is a solution to all problems.

Binary search: has order of log(n), where n is the amount of elements in the set. Note that for each element, finding a smaller one that it means searching only among the elements that are before it.

When binary search finds nothing, it has the advantage of already pointing to the nearest value.

Binary search on wikipedia

The . Net already provides native implementations, this can be an advantage:

Array.Binarysearch(T[], T)

var array = new[] { 1, 3, 5, 7, 11, };
var idx1 = Array.BinarySearch(array, 4); // retorna -3 ... ~(-3) = 2 é a posição onde deveria estar
var idx2 = Array.BinarySearch(array, 7); // retorna 3 ... achou!

List.Binarysearch(T)

Making the combinations

We can use recursion to find the combinations, but as the other answer already gives this solution, I will show a shape with stacks. Also, the stack size can be predefined: it is the amount of the smallest elements that we have to add up to almost pass 1000.

^{largest sum below 1000 = 7 minor, giving 930...
the stack will have 7 elements then}

We start by sweeping the list from the highest value to the lowest value. I’m going to write a step-by-step operation to give you a sense of what the final algorithm will look like.

Element 1000: We put the first value in the stack and check the sum. The first is 1000, and it can be returned right away... we remove it and add the next.

Element 980: You can already draw face that will not form any combination, because there is a value 20. The binary search would point to the index 0, and linear regression would return -0,7, I mean, it’s off the list.

Element 900: Let’s put 900 in the stack. We now have to fetch a value equal to 100. Using the regression I will get 1,3. At position 1, I think 60. Next time I find 100... I add 100 to the stack, and return the combination. I remove 100 from the stack.

Only there might be a combination with 3 elements. I’m going down 1 position, the one with the 60 and add it to the stack. 40 to go 100, but look, the lowest value is 50, so there will be no combinations with 60, nor with anything smaller than it. I take it from the pile.

Take the 900 from the pile.

Element 880: Let’s put the 880 in the pile. It’s 120 to 1000. A quick search already indicates the 120. After this combination, we have to try the 100, I put it in the pile. It’s 20... that’s it. I try the next one, 60, 880 is 60 to 1000. I think 50, I put in the pile too. Now it’s 10, but there’s no 10... I’m taking everything out of the pile.

I can already see what’s going on...

Program

So let’s go to the implementation.

This is a console program, which uses the data from the graphics above, and finds the combinations that add up to 1000, using stacks. It is possible to choose which search algorithm to use... the program will ask.

After running it shows the combinations, and shows the time it took. Here on my computer I didn’t notice much difference between using one search method or another, for a small random data set (up to 100 elements).

When I increased to 230 elements however the search with regression became faster:

regression search:

Elementos no array: 238
Tempo total: 25892,236ms
Combinações: 20206396

binary search

Elementos no array: 238
Tempo total: 31417,774ms
Combinações: 20205573

Code

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

namespace CombArraySoma
{
    class Program
    {
        static void Main(string[] args)
        {
            while (true)
            {
                Console.ForegroundColor = ConsoleColor.Blue;
                Console.WriteLine("R = usar busca com regressão");
                Console.WriteLine("B = usar busca binária");
                Console.WriteLine("Outra tecla = sair do programa");
                var key = Console.ReadKey();
                Console.WriteLine();
                var tipoBusca = key.KeyChar == 'r' ? (Busca)BuscaComRegressao
                    : key.KeyChar == 'b' ? (Busca)BuscaBinaria :
                        null;

                if (tipoBusca == null) return;

                var array = new[]
                {
                    50, 60, 100, 120, 180, 200, 220, 320, 360, 420, 460, 480, 500, 520, 580, 640, 660, 700, 780, 800, 860,
                    880, 900, 980, 1000
                };
                //var rand = new Random();
                //array = Enumerable.Range(0, 1000).Select(x => (int)(rand.NextDouble() * 1000) + 1)
                //    .Where(x => x >= 50 && x <= 1000)
                //    .Select(x => x - (x % 10))
                //    .Distinct()
                //    .OrderBy(x => x)
                //    .ToArray();

                Console.ForegroundColor = ConsoleColor.Yellow;
                Console.WriteLine($"Elementos no array: {array.Length}");
                Console.WriteLine($"[{string.Join(",", array)}]");

                const int objetivo = 1000;

                int soma = 0;

                // a pilha pode ser pré-alocada
                int tampilha;
                for (tampilha = 0; tampilha < array.Length; tampilha++)
                {
                    soma += array[tampilha];
                    if (soma > objetivo)
                        break;
                }

                var pilha = new Stack<int>(tampilha);

                var startTime = Stopwatch.GetTimestamp();

                var combos = new List<int[]>(array.Length * array.Length);
                soma = 0;
                var idx = array.Length - 1;
                pilha.Push(idx);
                soma += array[idx];
                while (pilha.Count > 0)
                {
                    if (soma == objetivo)
                    {
                        combos.Add(pilha.Reverse().ToArray());
                        while (pilha.Count > 0)
                        {
                            idx = pilha.Pop();
                            soma -= array[idx];
                            idx--;
                            if (idx >= 0)
                            {
                                pilha.Push(idx);
                                soma += array[idx];
                                break;
                            }
                        }
                    }
                    else if (soma < objetivo)
                    {
                        var top = pilha.Peek() - 1;
                        idx = tipoBusca(array, objetivo - soma, 0, top);
                        if (idx < 0) idx = ~idx - 1;
                        if (idx >= 0 && idx <= top && array[idx] + soma <= objetivo)
                        {
                            pilha.Push(idx);
                            soma += array[idx];
                        }
                        else
                        {
                            while (pilha.Count > 0)
                            {
                                idx = pilha.Pop();
                                soma -= array[idx];
                                idx--;
                                if (idx >= 0)
                                {
                                    pilha.Push(idx);
                                    soma += array[idx];
                                    break;
                                }
                            }
                        }
                    }
                }

                var endTime = Stopwatch.GetTimestamp();

                foreach (var combo in combos.Take(20))
                {
                    for (int it = 0; it < combo.Length; it++)
                    {
                        Console.ForegroundColor = ConsoleColor.DarkGray;
                        if (it > 0)
                            Console.Write($" + ");
                        Console.ForegroundColor = ConsoleColor.DarkGray;
                        Console.Write($" [");
                        Console.ForegroundColor = ConsoleColor.Yellow;
                        Console.Write(combo[it]);
                        Console.ForegroundColor = ConsoleColor.DarkGray;
                        Console.Write($"]");
                        Console.ForegroundColor = ConsoleColor.White;
                        Console.Write(array[combo[it]]);
                    }
                    Console.ForegroundColor = ConsoleColor.DarkGray;
                    Console.Write($" = ");
                    Console.ForegroundColor = ConsoleColor.Green;
                    Console.Write($"{combo.Select(i => array[i]).Sum()}");
                    Console.ForegroundColor = ConsoleColor.DarkGray;
                    Console.WriteLine();
                }

                if (combos.Skip(20).Any())
                {
                    Console.ForegroundColor = ConsoleColor.Yellow;
                    Console.WriteLine($"...");
                }

                Console.ForegroundColor = ConsoleColor.Yellow;
                Console.WriteLine($"Tempo total: {(endTime - startTime) * 1000.0 / Stopwatch.Frequency:F3}ms");

                Console.ForegroundColor = ConsoleColor.Yellow;
                Console.WriteLine($"Combinações: {combos.Count}");
            }
        }

        private static int BuscaBinaria(int[] array, int valor, int a, int b)
        {
            if (b - a + 1 < 0) return ~a;
            return Array.BinarySearch(array, a, b - a + 1, valor);
        }

        public static int BuscaComRegressao(int[] array, int valor, int a, int b)
        {
            if (b < a) return ~a;

            var x = a == b ? a : (valor - array[a]) * (b - a) / (array[b] - array[a]);
            if (x > b) return ~(b + 1);
            if (x < a) return ~a;

            // a regressão se beneficia da localidade de cache do processador
            // com as varreduras sequenciais abaixo
            while (array[x] > valor)
            {
                if (x == a) return ~a;
                x--;
            }
            while (array[x] < valor)
            {
                x++;
                if (x > b) return ~(b + 1);
            }
            return array[x] == valor ? x : ~x;
        }

        delegate int Busca(int[] array, int valor, int a, int b);
    }
}

The output is as follows:

Other optimizations

Algorithms like this can benefit from some other optimizations like memoization and processor cache locale.

Memoisation: is to store in a dictionary the intermediate results that can be requested more than once. For example, if the algorithm needs to know several times which combinations generate 200, then it is interesting to put these combinations in a dictionary of 200 combinations. So during the process of finding the combinations that give 1000, when we want the value 200 we already know what they are.

The efficiency of this depends a lot on the data set you are working with, and in addition can end up consuming a lot of memory.

Cache location: the regression search uses this property, because after finding the most likely position, it has to sweep the elements around that position to find the exact position. It turns out that when the processor loads a memory position the surrounding elements are loaded together, which makes the algorithm very efficient even if it doesn’t hit 100%.

The same cannot be said of binary search for example, because at the beginning of the search accesses memory positions that can be very distant from each other. Despite this, binary search can be considered an algorithm "cache oblivious", i.e., it subdivides the problem so that eventually the problem dataset will fit in the cache, regardless of the cache hierarchy being used.

The best thing to do is to check your data, and use the most suitable algorithm... preferably by doing some performance tests.

P.S.

Code I made to generate the bar graph for this answer

/*/

var vals = [
  [1000, 980, 900, 880, 860, 800, 780, 700, 660, 640, 580, 520, 500, 480, 460, 420, 360, 320, 220, 200, 180, 120, 100, 60, 50],
  [1000, 900, 840, 820, 800, 760, 700, 660, 640, 600, 560, 540, 520, 500, 480, 460, 440, 400, 300, 200, 180, 140, 80, 60, 50],
  [1000, 860, 800, 780, 740, 720, 640, 600, 560, 540, 500, 480, 440, 420, 400, 360, 340, 320, 300, 280, 240, 180, 120, 60, 50]
  ]

vals.map(createCanvas)
/*/
var h = 1000
var ch = 100
for(var xx=0;xx<10000;xx++) {
  vals = randomArray(25, h)
  vals.sort((x,y)=>y-x)
  vals[0] = 1000
  vals[vals.length-1] = 50
  vals[5]=800
  vals[10]=500
  vals.sort((x,y)=>y-x)
  //console.log(xx)
  var it = 1
  for (; it < vals.length; it++)
    if (vals[it] == vals[it-1]) break
  if (it >= vals.length && vals[vals.length-1]>=50) break
}
createCanvas(vals)
//console.log(vals)
/**/

function createCanvas(vals) {
  // var myCanvas = document.getElementById("myCanvas")
  var myCanvas = document.createElement("canvas")
  document.body.appendChild(myCanvas)
  
  myCanvas.width = 550
  myCanvas.height = 170
  myCanvas.style.width = myCanvas.width+"px"
  myCanvas.style.height = myCanvas.height+"px"
  
  var h = 1000
  var ch = 100
  //myCanvas.width = 550
  //myCanvas.height = ch + 100

  var ctx = myCanvas.getContext("2d")
  //ctx.translate(10, 10);

  vals.sort((x,y)=>x-y)
  var w = Math.floor(myCanvas.width / vals.length)
  for (var it = 0; it < vals.length; it++) {
    var v = vals[it]
    var r = v > h/2 ? 1.0 : 0.0;
    var g = v > h/2 ? mapp(v,h/2,h,0.9,0.3,2) : mapp(v,0,h/2,0.1,0.8,2);
    var b = v > h/2 ? 0.0 : mapp(v,0,h/2,0.3,1,0.3);
    var barH = ch*vals[it]/h
    drawBar(ctx, it*w+1, ch - barH, w - 2, barH, rgb(r,g,b))
    drawText(ctx, it, v, it*w + 3, ch+10, -Math.PI*80/180)
  }

  drawLine(ctx, 0,ch,550,ch,"#000000");
}

function drawLine(ctx, startX, startY, endX, endY,color){
    ctx.save();
    ctx.strokeStyle = color;
    ctx.lineWidth = 1;
    ctx.beginPath();
    ctx.moveTo(startX, startY+0.5);
    ctx.lineTo(endX, endY+0.5);
    ctx.stroke();
    ctx.restore();
}

function drawBar(ctx, upperLeftCornerX, upperLeftCornerY, width, height,color){
    ctx.save();
    ctx.fillStyle=color;
    ctx.fillRect(upperLeftCornerX,upperLeftCornerY,width,height);
    ctx.restore();
}

function randomArray(length, max) {
    return Array.apply(null, Array(length)).map(function() {
        var r = Math.random()//Math.pow(Math.random()/2,Math.random())
        r += 0.01
        r = Math.floor(mapp(r,0,1,0,max,1))
        r = Math.floor(r/20)*20
        return r;
    });
}

function rgb(r,g,b) { return '#'+[r,g,b].map(function(d){return (d>=1?255:d<0?0:Math.floor(d*256)).toString(16).padStart(2, '0')}).join("")
}

function mapp(x,a,b,c,d,p) { return Math.pow((x-a)/(b-a),p)*(d-c)+c }

function drawText(ctx, it, v, x, y, angle) {
  var text = "["+it+"] "+v.toString()
  ctx.save();
  ctx.translate(x, y);
  ctx.rotate(-angle);
  ctx.textAlign = "left";
  ctx.font = ctx.font.replace(/\d+px/, "13px");
  var idx = "["+it+"]: "
  var sz = ctx.measureText(idx);
  ctx.fillStyle = 'gray';
  ctx.fillText(idx, 0, 0);
  ctx.fillStyle = v > 500 ? 'red' : 'blue';
  ctx.fillText(v.toString(), sz.width, 0);
  ctx.restore();
}

Inspiration for the chart I made, and in addition they also have something about a similar problem - packaging problem (wikipedia)