This kind of need is well served by the actor model, which you will find in Erlang, Clojure, Scala, or Akka (now the standard actor library for Scala, with a Java API as well).
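To make the actor idea concrete, here is a minimal sketch in plain Java. It is not Akka or Erlang code: `CounterActor` and its methods are hypothetical names, and a single-threaded executor stands in for the actor's mailbox. The point it illustrates is that the state is only ever touched by one thread, so callers need no locks.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical minimal "actor": private state plus a mailbox drained by one thread.
class CounterActor {
    // The single-threaded executor plays the mailbox: messages run one at a time, in order.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();

    // Only the mailbox thread reads or writes this field, so it needs no synchronization.
    private long total = 0;

    // "Tell": enqueue a message and return immediately.
    void add(long amount) {
        mailbox.execute(() -> total += amount);
    }

    // "Ask": the reply is computed on the mailbox thread, after all earlier messages.
    CompletableFuture<Long> total() {
        return CompletableFuture.supplyAsync(() -> total, mailbox);
    }

    // Stop accepting messages; messages already queued are still processed.
    void stop() {
        mailbox.shutdown();
    }
}
```

Any number of threads can call `add` at the same time. The updates are serialized by the mailbox, which is the property the actor libraries above give you with far more features (supervision, distribution, and so on).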
There are other competing models too. One is Apache Storm: I have an application that uses it, and it is very fast. With Storm you control the number of spouts (the data sources) that feed the bolts, or you can keep a single spout and choose how many bolts serve each "customer" as you like. In my case, the last layer of the topology has 10 bolts that keep writing to the database. Storm is very flexible, but it is recommended for clusters, and I do mean recommended: I run it on a single machine and have a problem with its ZooKeeper data clogging my tmpfs.
If the number of threads is not a concern and your application runs on a single server, then you can consider the LMAX Disruptor. In practice, the Disruptor tries to keep the data hot, that is, in the processor’s cache. As it happens, it will be the next evolution of the application I mentioned.
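The following is a rough sketch of the idea behind that design, not the LMAX Disruptor API. `SpscRing` is a hypothetical class, restricted here to one producer thread and one consumer thread. It shows the two ingredients usually credited for keeping data hot: a preallocated ring of slots that is reused instead of allocating per message, and sequence counters instead of locks.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical single-producer, single-consumer ring buffer (an illustration only).
class SpscRing {
    private final long[] slots; // allocated once and reused, so the memory stays cache-friendly
    private final int mask;     // capacity is a power of two, so slot index = sequence & mask
    private final AtomicLong published = new AtomicLong(0); // next sequence the producer writes
    private final AtomicLong consumed = new AtomicLong(0);  // next sequence the consumer reads

    SpscRing(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a positive power of two");
        }
        slots = new long[capacity];
        mask = capacity - 1;
    }

    // Call from the producer thread only. Returns false when the ring is full.
    boolean offer(long value) {
        long seq = published.get();
        if (seq - consumed.get() == slots.length) {
            return false;
        }
        slots[(int) (seq & mask)] = value;
        published.set(seq + 1); // volatile write: makes the slot visible to the consumer
        return true;
    }

    // Call from the consumer thread only. Returns null when the ring is empty.
    Long poll() {
        long seq = consumed.get();
        if (seq == published.get()) {
            return null;
        }
        long value = slots[(int) (seq & mask)];
        consumed.set(seq + 1); // volatile write: hands the slot back to the producer
        return value;
    }
}
```

The real library adds batching, wait strategies, multiple consumers and padding against false sharing, so treat this only as a picture of the mechanism.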
Study all the options I mentioned, and study them really hard, because they carry less accidental complexity than manipulating threads by hand. There is nothing wrong with working with threads directly, but the less often we have to drop to the lowest level, the better, as long as the cost-benefit ratio is good.
Take it one step at a time, as with everything in life: keep your application running the way it is, apply only the patches needed to address your fear of overloading the server, and then move to a higher-level solution, "maybe" even a new language if it comes to that... the best solution is the one that works best for you and the business.
You may have a stack for each user, with a maximum of x threads.
– Jorge B.
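One possible reading of the comment above, sketched in plain Java: each user gets their own work queue, served by at most x threads. Interpreting "stack" as a per-user queue is an assumption, and `PerUserExecutors`, `maxThreadsPerUser` and the user id `"alice"` are hypothetical names used only for illustration.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper: one work queue per user, each served by at most maxThreadsPerUser threads.
class PerUserExecutors {
    private final int maxThreadsPerUser;
    private final Map<String, ExecutorService> pools = new ConcurrentHashMap<>();

    PerUserExecutors(int maxThreadsPerUser) {
        if (maxThreadsPerUser < 1) {
            throw new IllegalArgumentException("maxThreadsPerUser must be at least 1");
        }
        this.maxThreadsPerUser = maxThreadsPerUser;
    }

    // Queues the task on the user's own pool, creating the pool on first use.
    // Extra tasks wait in that user's queue instead of spawning more threads.
    CompletableFuture<Void> submit(String userId, Runnable task) {
        ExecutorService pool = pools.computeIfAbsent(
                userId, id -> Executors.newFixedThreadPool(maxThreadsPerUser));
        return CompletableFuture.runAsync(task, pool);
    }

    // Stops all pools; tasks already queued still run.
    void shutdown() {
        pools.values().forEach(ExecutorService::shutdown);
    }
}
```

A call such as `executors.submit("alice", task)` never runs more than x of that user's tasks at once. Note that the total thread count can still grow to (number of users) × x, so with many users a shared cap or eviction of idle pools would be needed.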