Efficient data structure for high color concurrent problem

Asked

Viewed 394 times

6

As part of a test I was asked to develop an application to manage high Scores in environments with high competition. This application should not persist on disk or use libraries and frameworks external. Basically the requirements say that:

  • Optimizing memory and CPU usage is critical
  • the service will receive a stellar number of requests simultaneous
  • for the http part we should use the class Httpserver.

The application is basically a set of Rest services for the user:

  • log in (you must return a token and keep the user authenticated for 10 minutes)
  • post a high score to a certain level
  • get a list of high Scores to a certain level (in order)

The system should only return 15 high Scores for each level, the high Scores are unique for each pair User / Level (ie for phase x the user y should only own one high score).

Considering the functional and non-functional requirements I thought of the following data structure to store high Scores (with minimum retention):

// Mapa que liga um nivel a uma lista ordenada de High Scores
private static final Map<Integer, ConcurrentSkipListSet<HighScore>> scores = 
   new ConcurrentHashMap<>();
// Conjunto de usuários autenticados
private static final Set<AuthToken> authorized = 
   Collections.newSetFromMap(new ConcurrentHashMap<AuthToken, Boolean>());

While this data structure seems efficient in reducing retention I don’t know how much it ends up wasting memory internally.

In the same way, the ConcurrentSkipListSet allow orderly insertion in log(n) and make the implementation of the method that returns high Scores trivial. On the other hand, to limit the Set to the top 15 high Scores by level (to avoid memory wastage and crashes), I did a real gymnastics including the use of Atomicinteger to count the amount of high Scores by level (the method size of ConcurrentSkipListSet is costly) and the need to synchronize multiple points with Locks.

In short, I’m not convinced that I’m using the most efficient strategy to solve the problem and I’m not used to thinking at such a low level...

In the "real" world would solve this problem in a trivial way with the use of a container Jetty, a Servlet for logging in and storing authentication in the session and probably a database Embedded like the H2. Unfortunately I’m not allowed to wear any of that.

In this way, how would you approach a problem of this kind given the constraints technology artificials? What data structures would you use (or develop your own?)? My strategy is "good enough" for the problem or I traveled in solution?


P.S. That problem submitted to Code Review looks a lot like what I’m solving (and the solution was similar to mine too).

  • I suppose it’s outside its scope a case where the same user submits a high score minor that another submitted previously, right? (even because it would not make sense...) Similarly, a submitted score cannot be removed, right? (I wonder if you can even safely rule out the 16th forward result)

  • @mgibsonbr, exactly. A single score per user / level (the largest) and there is no front to delete a high-score. We can safely rule out results beyond the fifteenth.

1 answer

2


I don’t have much experience with concurrent programming - particularly involving Java - so I can’t suggest you an efficient data structure, but I can give you some parameters to help you decide:

  1. Do not worry about the complexity order of the structure, but rather about its constant factor.

    You say that the "ConcurrentSkipListSet allows orderly insertion in log(n)", but is this really relevant in a list with only 15 elements? Imagine how many nodes this list will create? (remember that a Java object has a overhead fixed of ones 24 bytes if I remember correctly, one more reference to HighScore at least and other internal references for the other nodes) and failures in deviation prediction? (branch Prediction) Not that I find this relevant, since the bottleneck of modern processors is not the CPU, but the cache...

    A simple array (or the equivalent thread-safe available) with references to HighScores would take up less space (overhead of 24 bytes + 4 or 8 bytes for each of the 15 references). And the performance to travel it would be good, even if it was disorderly (just go through it all remembering the lowest index, then add or replace the lowest as the total is less than or equal to 15) - mainly because the references will occupy a contiguous position in the memory (which avoids miss cache). If the solution involves a critical section (synchronize), then it doesn’t matter the algorithm, because the lock alone will account for 90% of the run time. By the way...

  2. The synchronize is your worst enemy.

    Again, I’m not familiar with the features that Java offers for concurrent access, but if that’s the case ConcurrentSkipListSet can operate without using synchronize then it is better to use it than to do any "gambiarra" to be able to access the size in constant time. This method is "costly" only if its whole is large, but log(n) in a list of 15 elements is almost the same as O(1).

    Establish a critical section at all times involves access to the disk (some 10.000 CPU cycles last time I checked), so avoid using synchronize whenever possible. By the way...

  3. It is not necessary to wait for an operation to be ready before the request.

    When the user submits his high score, I suggest "line it up" for processing and move on. Take a look at ConcurrentLinkedQueue for example. If you have a thread dedicated to bringing together all the high Scores submitted (i.e. each request produces something, a thread apart consumes) it is easier to choose data structures that offer good performance without using a lock.

    If right after submitting a high score the user ask the list for a certain level, you can always get a copy of that list (it is recommended that the thread only update it in an atomic form), if necessary update it with the last Colors not yet processed (ConcurrentLinkQueye.toArray(T[] a)) and return. If you want to avoid this rework - and there is no problem in occasionally getting an outdated list (inevitable) that do not include the last score that the own user posted (avoidable, but brings more complexity) - just return the list in its current state.

Furthermore, I believe that your proposed solution is adequate. In practice I would suggest that instead of a set of tokens access, you map a token for a user (to prevent a user from updating the high score on the other), but if you were worried about cheating nothing you did would be 100% guaranteed... (since the data comes from the client side)

Browser other questions tagged

You are not signed in. Login or sign up in order to post.