How to merge two multidimensional arrays, disregarding repeated values?

3

I have two multidimensional arrays. One comes from a .json file; the other comes from a POST via Ajax.

My idea is to generate a new array by merging these two multidimensional arrays, without repeating data.

The arrays arriving by POST via Ajax carry information about elements of a page. The 'tempo' field is unique; the 'vis' field is either 0 or 1 (not viewed or viewed).

This is the array coming from the .json file:

{"AdministradorGabrielOliveira":
    [
        {
            "tempo":"2017-08-24T04:57:13-03:00",
            "vis":1
        },
        {
            "tempo":"2017-08-24T04:57:13-03:04",
            "vis":0
        },
        {
            "tempo":"2017-08-24T04:57:13-03:05",
            "vis":0
        }
    ]
}

And this below is the array coming from POST via Ajax:

{"AdministradorGabrielOliveira":
    [
        {
            "tempo":"2017-08-24T04:57:13-03:00",
            "vis":0
        },
        {
            "tempo":"2017-08-24T04:57:13-03:01",
            "vis":0
        }
    ]
}

In the end, merging the two arrays should give me another one with this content:

{"AdministradorGabrielOliveira":
    [
        {
            "tempo":"2017-08-24T04:57:13-03:00",
            "vis":1
        },
        {
            "tempo":"2017-08-24T04:57:13-03:04",
            "vis":0
        },
        {
            "tempo":"2017-08-24T04:57:13-03:05",
            "vis":0
        },
        {
            "tempo":"2017-08-24T04:57:13-03:01",
            "vis":0
        }
    ]
}

I tried to iterate over the arrays with two nested foreach loops, but without success. Can someone give me a tip?

[EDITED]: I followed your tip and changed the structure so that the user holds the values 'time' and 'vis', with 'time' as the id.

  • 1

    Is there a reason you don't use the user value as the element key? One key per user and, inside the user element, one key per tempo holding the value of vis. That would make your job a lot easier.

  • Following @Claydersonferreira's suggestion would also make it easier to do things like take all of a user's times and process them. With what you have right now you would have to iterate over the entire collection.

  • What is the criterion for two entries being equal? Must all three values be identical?

  • From the structure and the OP's description, I assume they are equal if user and tempo are the same, since vis only indicates whether or not the item has been viewed.

  • In fact I had thought about it, but I did not put it into practice because the code I had developed was working with the initial model, and I didn't want to change it and lose focus. I hadn't realized this could make things so much easier... :/ haha, good! If I understand what you are proposing, the array structure should look like this, shouldn't it? {"user": [ { "time":"2017-08-24T04:57:13-03:00", "vis":0 }, { "time":"2017-08-24T04:57:13-03:01", "vis":0 } ] }

  • I even know how to merge these JSON structures, but I don't know how to do it without repeating the data.

  • Your problem is similar to this one: https://answall.com/q/231110/64969; I will try to post, as soon as possible, an adaptation of the solution presented in the linked question.

  • Oops! I'll take a proper look at the tips when I get home. But for now, thank you very much!

  • And yes, I'm going to use 'user' as the main array and 'time' as the id, so I only have to search the collection I really need. Thanks for the tip, everyone.


2 answers

4

The solution from @Wictorchaves is sufficient for this question given the size of the data. The solution I propose below is overkill, aimed at inputs some orders of magnitude larger than what is being used in this case.

I recently answered a question that needed performance when computing the intersection of two sets in C#. The asker used a quadratic algorithm, O(n * m), to determine the intersection. In my answer, I gave a theoretical preamble showing that it is possible to solve the problem in linearithmic time (approximately O(n log n)).

That question was about computing the intersection, whereas this one is about computing the union. Here, a proper mathematical union of sets is being performed, hence the behavior of eliminating duplicates.

In set union, the following happens:

{a, b, c} union {b, c, d} yields {a, b, c, d}

To perform the union this way, identical values must also be removed.

If the sets are unordered, we need to compare every value of one set against those of the other. Hence quadratic time, O(n * m).

However, if the sets are ordered, identifying identical terms takes linear time, O(n + m). In my answer, I suggested an algorithm for identifying the elements of the intersection. I will present a slightly modified version of it.
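For plain scalar values, the set union above can be illustrated with a minimal PHP sketch. It relies on `array_unique`, which by default compares values as strings, so it is only meant for scalars; the arrays of objects in the question need a comparison of their own. The variable names are just for illustration.

```php
<?php
// Union of two small sets of scalars: concatenate, then drop duplicates.
$a = ['a', 'b', 'c'];
$b = ['b', 'c', 'd'];

// array_unique keeps the first occurrence of each value (and its key),
// so array_values is used to reindex the result from 0.
$uniao = array_values(array_unique(array_merge($a, $b)));

print_r($uniao);
```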

  • A, an ordered set of elements of the type E
  • B, an ordered set of elements of the type E
  • cmp : E,E -> S, a function which, given two elements of E, defines the relationship between them; it returns - if the first element is less than the second, + if the first element is greater than the second, or 0 if the elements are identical (S is the set of signs: {-, 0, +})

    input:
      A, ordered set of elements of type E
      B, ordered set of elements of type E
      cmp, sign function that compares two elements of E
    output:
      C, set of elements of type E resulting from the union of A and B

    begin
      i <- 0 # index for iterating over A
      j <- 0 # index for iterating over B
      C <- []
      last_added <- null

      while i < A.size || j < B.size:
        if j >= B.size:
          # B is exhausted: the remaining elements of A are the candidates
          candidate <- A[i]
          i <- i + 1
        else, if i >= A.size:
          # A is exhausted: the remaining elements of B are the candidates
          candidate <- B[j]
          j <- j + 1
        else:
          s <- cmp(A[i], B[j])
          if s == '0':
            # elements are equal, take one of them as the candidate
            candidate <- A[i]
            i <- i + 1
            j <- j + 1
          else, if s == '-':
            # A[i] < B[j], so the next comparison is A[i + 1] with B[j]; A[i] is now the candidate
            candidate <- A[i]
            i <- i + 1
          else: # remaining case, s == '+'
            # A[i] > B[j], so the next comparison is A[i] with B[j + 1]; B[j] is now the candidate
            candidate <- B[j]
            j <- j + 1
        # insert the candidate into C only if it is the first element added
        # or it is distinct from the last element added
        if last_added == null || cmp(candidate, last_added) != '0':
          last_added <- candidate
          C.push(candidate)
      # both A and B have been fully traversed
      return C
    end
    

All in all, the search is guaranteed to take O(n + m) operations. That is linear time, which makes your problem tractable.

On the other hand, for this linear-time algorithm to work, we need to pre-process the data sets A and B, as noted by @Isac. As I explained in the C# answer, it is possible to define a total ordering on any set of composite elements, provided that each of the keys of these elements admits a total ordering.

So, by writing the function cmp that establishes a total ordering between the elements of your set, you can sort the elements in O(n log n) and then apply the algorithm above to get the union.

To turn JSON into PHP objects, you can use the function json_decode. In this case, after decoding the JSON, just sort the array inside the decoded object, $json_decodificado->AdministradorGabrielOliveira, using the total-ordering function cmp.
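The pseudocode above could be sketched in PHP roughly as follows. This is only an illustration under some assumptions: the function name `uniaoOrdenada` is hypothetical, both inputs are assumed to be already sorted according to `$cmp`, and `$cmp` is assumed to return a negative, zero or positive number (like the callbacks accepted by `usort`). The sketch also keeps going through whichever list still has elements after the other one runs out.

```php
<?php
// Hypothetical helper: union of two lists already sorted according to $cmp.
function uniaoOrdenada(array $a, array $b, callable $cmp): array
{
    $i = 0;            // index into $a
    $j = 0;            // index into $b
    $c = [];           // resulting union
    $na = count($a);
    $nb = count($b);

    while ($i < $na || $j < $nb) {
        if ($j >= $nb) {
            $candidato = $a[$i++];      // $b exhausted: take from $a
        } elseif ($i >= $na) {
            $candidato = $b[$j++];      // $a exhausted: take from $b
        } else {
            $s = $cmp($a[$i], $b[$j]);
            if ($s < 0) {
                $candidato = $a[$i++];
            } elseif ($s > 0) {
                $candidato = $b[$j++];
            } else {
                $candidato = $a[$i++];  // equal: keep the copy from $a
                $j++;                   // and skip the copy from $b
            }
        }
        // Add the candidate only if it differs from the last element added.
        $n = count($c);
        if ($n === 0 || $cmp($candidato, $c[$n - 1]) != 0) {
            $c[] = $candidato;
        }
    }
    return $c;
}

$uniao = uniaoOrdenada(['a', 'b', 'c'], ['b', 'c', 'd'], 'strcmp');
print_r($uniao);
```

When two elements compare as equal, this sketch keeps the one from the first list; that is a design choice, so pass first the list whose values should win.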
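A minimal sketch of that decoding and sorting step might look like this. It assumes entries are ordered by their "tempo" string alone, compared with `strcmp`; the sample payload just mimics the shape of the .json file in the question, and `$itens` is an illustrative variable name.

```php
<?php
// Sample payload in the same shape as the question's .json file.
$json = '{"AdministradorGabrielOliveira":['
      . '{"tempo":"2017-08-24T04:57:13-03:05","vis":0},'
      . '{"tempo":"2017-08-24T04:57:13-03:00","vis":1},'
      . '{"tempo":"2017-08-24T04:57:13-03:04","vis":0}]}';

$json_decodificado = json_decode($json);

// Assumed total ordering: compare entries by their "tempo" string only.
$cmp = function ($x, $y) {
    return strcmp($x->tempo, $y->tempo);
};

// Copy the list out of the decoded object and sort it in place.
$itens = $json_decodificado->AdministradorGabrielOliveira;
usort($itens, $cmp);

foreach ($itens as $item) {
    echo $item->tempo, ' vis=', $item->vis, PHP_EOL;
}
```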

  • While answering a question about complexity, I realized I had made a small mistake in the union operation: it was missing the step of going through the remaining elements of A or B once the while loop ends because B or A, respectively, has been exhausted.

  • 1

    Wow! Thank you so much for the clarification, @Jeffersonquesado. I confess that I was developing without thinking much about the weight of this processing. I will pay more attention to this "detail".

  • @Gabrieloliveira, in your case, with the amount of data under control, you don't need to worry too much.

3


This code removes all repeated values.

<?php
$entrada1 = '{"forum":[{"user":"AdministradorGabrielOliveira","tempo":"2017-08-24T04:57:13-03:00","vis":1},{"user":"AdministradorGabrielOliveira","tempo":"2017-08-24T04:57:13-03:22","vis":1},{"user":"AdministradorGabrielOliveira","tempo":"2017-08-24T04:57:13-03:04","vis":0},{"user":"AdministradorGabrielOliveira","tempo":"2017-08-24T04:57:13-03:05","vis":0}]}';
$entrada2 = '{"forum":[{"user":"AdministradorGabrielOliveira","tempo":"2017-08-24T04:57:13-03:00","vis":1},{"user":"AdministradorGabrielOliveira","tempo":"2017-08-24T04:57:13-03:04","vis":0},{"user":"AdministradorGabrielOliveira","tempo":"2017-08-24T04:57:13-03:05","vis":0},{"user":"AdministradorGabrielOliveira","tempo":"2017-08-24T04:57:13-03:01","vis":0}]}';

$json1 = json_decode($entrada1);
$json2 = json_decode($entrada2);

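// Remove from $json1 every item that also appears in $json2,
// comparing only "user" and "tempo" (so "vis" may differ).
foreach ($json1->forum as $indice => $f1) {
    foreach ($json2->forum as $f2) {
        if ($f1->user == $f2->user && $f1->tempo == $f2->tempo) {
            unset($json1->forum[$indice]);
            break; // $f1 is already removed; no need to keep scanning $json2
        }
    }
}

// Assumed final step, not part of the original answer: what is left in
// $json1 has no match in $json2, so concatenating both gives the merged list.
$resultado = array_merge(array_values($json1->forum), $json2->forum);
echo json_encode(array('forum' => $resultado));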
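
As a possible alternative to the nested loops, items can be indexed by a key built from "user" and "tempo", which avoids comparing every pair. This is only a sketch under assumptions: the function name `unirPorChave` and the `|` separator are hypothetical, and the first item seen for each key is the one kept (so the list passed first wins when "vis" differs).

```php
<?php
// Hypothetical helper: merges lists of items, keeping the first item seen
// for each (user, tempo) pair. An associative array serves as the index.
function unirPorChave(array ...$listas): array
{
    $indice = [];
    foreach ($listas as $lista) {
        foreach ($lista as $item) {
            $chave = $item->user . '|' . $item->tempo; // assumed separator
            if (!isset($indice[$chave])) {
                $indice[$chave] = $item;
            }
        }
    }
    return array_values($indice);
}

// Shortened sample data; "u", "t0", "t1" and "t4" are placeholders.
$a = json_decode('{"forum":[{"user":"u","tempo":"t0","vis":1},{"user":"u","tempo":"t4","vis":0}]}');
$b = json_decode('{"forum":[{"user":"u","tempo":"t0","vis":0},{"user":"u","tempo":"t1","vis":0}]}');

$uniao = unirPorChave($a->forum, $b->forum);
echo json_encode(['forum' => $uniao]), PHP_EOL;
```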
  • 1

    I’ll test it right now, my dear!

  • Your code is great, brother! There is just one detail that I haven't been able to solve so far. The values of $entrada1 are received from a website as feedback from a clicked post, so if $entrada1 has 'time'='X1' with 'vis'=1 and $entrada2 has 'time'='X1' with 'vis'=0, I must merge the two taking into account the entry with 'time'='X1' in both $entrada1 and $entrada2, and this should result in $entrada1 = '{"forum":[{"user":"AdministradorGabrielOliveira","time":"X1","vis":1}]}';

  • I made a change: now it compares "user" and "time", so if both are equal it removes the item from "entrada1". That way it's easier for you to decide what you want to compare.

  • This solution is quadratic. If the input size can blow up (for example, a few million entries), its execution time becomes impractical. If you are sure the size is under control (maybe even a few thousand entries?), then the concern I raised earlier is mere purism.

  • 1

    @Jeffersonquesado The only non-quadratic solution I see for the algorithm is to use specific keys in both arrays, which makes it O(n). But that implies pre-processing both input structures.

  • @Isac yes, even with the pre-processing. I had in mind a solution using sorting, which would reduce it to O(n log n + n): the linearithmic part for the pre-processing, plus O(n) for the search.

  • Thank you @Wictorchaves and everyone who helped! This script is great for what I need. Jefferson and Isac, there is indeed the issue of over-processing this data, but it won't be a problem for what I have. The data I will analyze, both from the page and from the file, is relatively small, because the page itself deletes posting notifications after a certain amount of time, so I won't even need to store old data for long. I will also put a limit on the size of the array. But again, thank you for the collaboration, guys. It helped a lot!

  • @Gabrieloliveira mark this answer as an answer to your question, please.

  • 1

    True... Done!
