Identify new, removed and modified objects from two ordered "collections" using extra O(1) memory in O(N) time

Asked

Viewed 48 times

4

I have 2 "sets" of elements, neo and old, which has elements of the type T, all non-zero. Also, I know that these elements are identified by a key K and I can extract using this key using a Function<T, K> getKey. I guarantee the uniqueness of the key inside neo and within old (that is, there are no two elements of neo with the same key, nor 2 distinct elements of old with the same key).

Also, I guarantee that the elements in neo and old are ordered strictly downward in that key considering a Comparator<K> keyComparator. Informally, as if neo[i].key < neo[j].key for all i < j, and similarly old[i].key < old[j].key.

I define that an element oldEl was removed if old.contains(oldEl) and if there is no neoEl belonging to neo such that getkey.apply(oldEl).equals(getkey.apply(neoEl)).

I define a modification neoEl if neo.contains(neoEl) and if there is any oldEl belonging to old such that getkey.apply(oldEl).equals(getkey.apply(neoEl)), however !oldEl.equals(neoEl) (if by chance oldEl.equals(neoEl), I must consider that the element appears repeatedly).

I define an element creation neoEl if neo.contains(neoEl) and if there is no oldEl belonging to old such that getkey.apply(oldEl).equals(getkey.apply(neoEl)).

So far, I’ve been able to do the following:

final Consumer<T> NOOP = __ -> {};

public void doDelta(Iterator<T> old, Iterator<T> neo,
        Function<T, K> getkey, Comparator<K> keyComparator,
        Consumer<T> insertInterceptor,
        Consumer<T> updateInterceptor,
        Consumer<T> deleteInterceptor,
        Consumer<T> ignoreInterceptor) {

    final int TAKE_OLD = 1;
    final int TAKE_NEO = 2;
    final int TAKE_BOTH = TAKE_OLD | TAKE_NEO;
    int nextAction = TAKE_BOTH;
    T oldT = null, neoT = null;
    K oldK = null, neoK = null;

    if (!old.hasNext()) {
        // não há old, portanto só tem inserção
        if (insertInterceptor != NOOP) {
            neo.forEachRemaining(insertInterceptor);
        }
        return;
    }
    if (!neo.hasNext()) {
        // não há neo, portanto só tem remoção
        if (deleteInterceptor != NOOP) {
            old.forEachRemaining(deleteInterceptor);
        }
        return;
    }

    while (true) {
        if (nextAction == TAKE_BOTH) {
            if (!old.hasNext()) {
                if (!neo.hasNext()) {
                    return;
                }
                neoT = neo.next();
                nextAction = TAKE_NEO;
                break;
            }
            oldT = old.next();
            oldK = getkey.apply(oldT);
            if (!neo.hasNext()) {
                nextAction = TAKE_OLD;
                break;
            }
            neoT = neo.next();
            neoK = getkey.apply(neoT);
        } else if (nextAction == TAKE_OLD) {
            if (!old.hasNext()) {
                nextAction = TAKE_NEO;
                break;
            }
            oldT = old.next();
            oldK = getkey.apply(oldT);
        } else { //if (nextAction == TAKE_NEO) {
            if (!neo.hasNext()) {
                nextAction = TAKE_OLD;
                break;
            }
            neoT = neo.next();
            neoK = getkey.apply(neoT);
        }
        int cmp = keyComparator.compare(oldK, neoK);
        if (cmp == 0) {
            nextAction = TAKE_BOTH;
            if (oldT.equals(neoT)) {
                ignoreInterceptor.accept(neoT);
            } else {
                updateInterceptor.accept(neoT);
            }
        } else if (cmp < 0) {
            deleteInterceptor.accept(oldT);
            nextAction = TAKE_OLD;
        } else {
            insertInterceptor.accept(neoT);
            nextAction = TAKE_NEO;
        }
    }

    // entrou aqui porque acabou o old
    if (nextAction == TAKE_NEO) {
        if (insertInterceptor != NOOP) {
            insertInterceptor.accept(neoT);
            neo.forEachRemaining(insertInterceptor);
        }
        return;
    }
    if (nextAction == TAKE_OLD) {
        if (deleteInterceptor != NOOP) {
            deleteInterceptor.accept(oldT);
            old.forEachRemaining(deleteInterceptor);
        }
    }
}

The idea is basically, every cycle, to know if I need to pull element of the set neo, of old or both; at first you need to take elements from both sets. Then, what can happen while there are elements on both sides:

  • if the key is the same, take both and test to see if it was updated or the element remained identical
  • if the smaller key is old, then this element was removed from the new set of elements and need to take only the old
  • if the smaller key is neo, then this element was created in the new element set and need to take only the neo

neo and old are defined as Iterator<T> to not have to load all the information of these sets into memory a priori, but another alternative Lazy loading is valid also.

The Consumer<T> do the intercept job if discovered if the element is added/removed/updated from the list (the ignoreInterceptor is used more as a curiosity if an element repeats itself equally in both sets).

The NOOP is the trivial developer, silently discards if you receive something.

Note that there is no use of extra memory in this algorithm, considering that the Consumers, getkey, keyComparator and the equals of the classes involved are oracle functions.

Also taking into account that Consumers, getkey, keyComparator and the equals of the classes involved are oracle functions, I will do at most O(|old| + |neo|) operations as a whole, so I follow that the time complexity is linear in the size of sets.

What bothers me here are the cases when one of the collections runs out, or when I receive an empty collection. It seems that it is too convoluted and messed up the code. How to make the code cleaner?

Would have been able to use Stream to the end I’m using the Iterator? That could make the code cleaner?

And Spliterator, would have some advantage in it besides guaranteeing me the characteristics (comparator provided by own Spliterator, orderly, distinguished, not void)?

  • Looking over (and without testing many cases), I think you don’t need the first 2 if's, can go straight to the while(true): if old is empty, it takes the first element of neo, arrow nextAction = TAKE_NEO and interrupts the loop. Then it falls on the penultimate if and executes insertInterceptor in the elements of neo. What if neo is empty, something similar happens and he ends up falling on the last if, running the deleteInterceptor in all elements of old (that is, the same as the first if'would). I don’t know if the code is clearer with or without these if's, but finally...

  • As for the rest, I don’t know if I can improve it. I haven’t been using streams that much, so maybe I’ve even got some fancy way, but I’d guess it’s not worth it :-) (but sure, we’ll wait if someone answers and shows me I’m wrong). For me it is already clear enough, even if I have to check several times if there are still elements remaining...

  • @hkotsubo is right, these ifs are artifacts from an earlier version, to avoid additional complexity in the consumption of terminal things, but that has become unnecessary

  • 1

    I don’t know if it’s enough for an answer, but anyway, I played the logic of taking the next element in a method, and I think it got a little cleaner, because I eliminated the need for the nextAction and there’s a lot of hasNext dispersed: https://ideone.com/NBHCrn - nor have I tried with streams because I don’t think you need it - and also because of laziness and lack of in-depth knowledge of the API :-) And also I haven’t done extensive tests, so there will be some case that I didn’t cover...

  • I thought it was much better to read now, with these modifications, @hkotsubo

No answers

Browser other questions tagged

You are not signed in. Login or sign up in order to post.