Displaying a massive amount of data

When we work with a very large amount of data (e.g. more than 3 million records) and need to display it on screen while the user interacts with the page, we always want the best possible performance.

  • Is splitting this data list into smaller groups and using threads and handlers to display these smaller groups a valid approach?

  • What are the best practices for display?

  • What are the main factors affecting display performance and why?

  • How important is the database modeling relative to the application code in this case?

2 answers

4


In my experience, the following practices help in this regard. I will go through them in the order of your questions:

  • Pagination:
    • It is certainly a good practice, mainly because it reduces data traffic. Also make sure the proper indexes exist on the columns by which the information will be paged (or "sliced"); a keyset-pagination sketch follows this list. A bit of "psychology" also applies here: it is always in the user's interest to find their information as soon as possible, which is better both for them and for the application, which has to do less work. Look at Google, whose greatest asset is keeping the vast majority of searches from ever reaching the second page.
  • Lean queries:
    • For volumes at this level, the ideal is to fetch and transmit from the data source only the information that will actually be presented. This may not have such a large impact on the server side, but in the case of a web application it certainly will on the size of the data traffic. On a server with a high volume of concurrent connections, these savings make a real difference as the scale grows;
  • Grouping of information
    • Occasionally you will need to present the user with a summary of these millions of rows, such as totals, pending tasks, latest entries, etc. For these cases, I recommend creating server-side processes that periodically pre-compute these summaries. Examples are materialized (indexed) views in the database, or your own scheduled process that builds these summary tables (also sketched after this list);
  • Cache
    • If much of the information presented is repeated across consecutive uses (e.g. a home page that always shows the same items on promotion), you can use frameworks that reduce repetitive queries to the database. As an example in .NET, NHibernate has a query cache; a hand-rolled version of the same idea is sketched after this list as well;
  • Structure of the database
    • Data modeling is certainly important, because the way the data is structured can increase or reduce the amount of work the database has to do. But not only that: the way the database is physically set up can also help a lot. One example of good practice, in databases that support it, is to keep tables and indexes in separate physical locations (disks), so both can be read and written in parallel without affecting each other. A few hours of consulting with a DBA can help a lot here;
  • Distributed databases
    • Here things get a little more complicated, but depending on the volume of accesses and the amount of information it can be an interesting option. In NoSQL databases this is a little easier, by creating "slices" (shards) that can be processed in parallel on separate servers.
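
To make the pagination point concrete, here is a minimal sketch of keyset ("seek") pagination over JDBC. The `orders` table, its `id` and `description` columns, and the `LIMIT` syntax are assumptions for illustration (LIMIT works on MySQL/PostgreSQL; other databases use TOP or FETCH FIRST):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class KeysetPagination {

    // Fetches one "slice" of rows after the last id the client has already seen.
    // Relies on an index on the id column, so the database can seek directly to
    // the start of the page instead of scanning and discarding earlier rows.
    public static List<String> fetchPage(Connection conn, long lastSeenId, int pageSize)
            throws SQLException {
        String sql = "SELECT id, description FROM orders "
                   + "WHERE id > ? ORDER BY id LIMIT ?";   // illustrative table/columns
        List<String> page = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, lastSeenId);
            ps.setInt(2, pageSize);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    page.add(rs.getLong("id") + ": " + rs.getString("description"));
                }
            }
        }
        return page;
    }
}
```

Keyset pagination keeps deep pages cheap because the database seeks straight to `id > lastSeenId` through the index, instead of scanning and discarding all the rows an OFFSET would have to skip.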
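
For the grouping point, one possible shape of the "own scheduled process" is a periodic job that rebuilds a summary table. The sketch below assumes hypothetical `orders` and `order_totals` tables and a 15-minute refresh interval; adjust both to your schema and to how stale the summaries may be:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import javax.sql.DataSource;

public class SummaryRefresher {

    private final DataSource dataSource;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public SummaryRefresher(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Rebuilds the pre-aggregated totals every 15 minutes, so user-facing pages
    // read a few summary rows instead of aggregating millions of records on demand.
    public void start() {
        scheduler.scheduleAtFixedRate(this::refresh, 0, 15, TimeUnit.MINUTES);
    }

    private void refresh() {
        try (Connection conn = dataSource.getConnection();
             Statement st = conn.createStatement()) {
            st.executeUpdate("DELETE FROM order_totals");   // illustrative summary table
            st.executeUpdate("INSERT INTO order_totals (customer_id, total) "
                           + "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id");
        } catch (SQLException e) {
            // In a real job you would log and alert; the next run will try again.
            e.printStackTrace();
        }
    }
}
```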
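
And for the cache point, if a framework-level query cache such as NHibernate's is not available in your stack, the same idea can be hand-rolled: a small time-limited in-memory cache in front of repetitive queries. This is a generic sketch, not tied to any particular framework:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Caches the result of an expensive query for a fixed time window, so repeated
// requests (e.g. the home page promotions) do not hit the database every time.
public class TimedQueryCache<T> {

    private static final class Entry<T> {
        final T value;
        final long expiresAt;
        Entry(T value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<String, Entry<T>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public TimedQueryCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    // Returns the cached value if still fresh; otherwise runs the loader and caches it.
    public T get(String key, Supplier<T> loader) {
        Entry<T> e = entries.get(key);
        long now = System.currentTimeMillis();
        if (e == null || e.expiresAt < now) {
            e = new Entry<>(loader.get(), now + ttlMillis);
            entries.put(key, e);
        }
        return e.value;
    }
}
```

Something like `cache.get("promotions", dao::loadPromotions)` would then hit the database at most once per TTL window.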

Finally, it is worth remembering that maintaining good performance is cyclical work. The needs (and problems) change, and so do their solutions. See this website (in English) with examples of the techniques YouTube adopts to keep a site that receives more than 1 billion views per day scalable.

Assuming you start from good practices, the first step is to identify the biggest bottleneck and work on it. Once it is solved, another bottleneck becomes the biggest one.

And so our journey goes on, forever... :)

0

Is splitting this data list into smaller groups and using threads and handlers to display these smaller groups a valid approach?

Yes.
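
As a rough sketch of that pattern: fetch small batches on a worker thread and hand each batch to the UI thread as it arrives, so the screen stays responsive while the data keeps flowing in. The `DataSourceStub` interface and the id-based cursor below are placeholders for whatever actually pages through your records:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

public class ChunkedLoader {

    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Loads the data set in small batches on a background thread and hands each
    // batch to the UI. The onBatch callback is where you would post back to the
    // UI thread (an Android Handler, SwingUtilities.invokeLater, etc.).
    public void load(DataSourceStub source, int batchSize, Consumer<List<String>> onBatch) {
        worker.submit(() -> {
            long lastId = 0;
            while (true) {
                List<String> batch = source.nextBatch(lastId, batchSize);
                if (batch.isEmpty()) {
                    break;                 // nothing left to display
                }
                onBatch.accept(batch);     // UI appends only this small group
                lastId += batch.size();    // advance the cursor (illustrative only)
            }
        });
    }

    // Stand-in for whatever actually pages through your 3 million records.
    public interface DataSourceStub {
        List<String> nextBatch(long afterId, int batchSize);
    }
}
```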

What are the best practices for display?

Do you really need to render all 3 million elements? I believe that no matter what you do, performance will not be great, precisely because of the sheer number of elements. I would work with a smaller set.

What are the main factors affecting display performance and why?

What affects display performance the most is simply how many elements the browser needs to render.

How important is the database modeling relative to the application code in this case?

The database only influences how you obtain this data. 3 million is a relatively high number of records, so it is worth checking whether the proper indexes exist and whether the disk and the data are not fragmented.
