Best way to structure a history table with a large amount of data

Functional requirement

I have devices. Roughly speaking, each device has a unique identifier, an IP address, and a type.

I have a ping routine for every device that has an IP address. This routine is nothing more than a C# executable that runs every 3 minutes and tries to ping each such device's IP address.

I need to store the result of each ping in the database, along with the date of the check (regardless of the ping's result).

Technical part

Assuming my ping process and database structure are in place as of 01/06/2016, I need to do two things:

  • Daily extraction (a table that is fed once a day)
  • Real-time extraction (last 24 hours)

Both must return the same thing:

  • Devices that have been unavailable for more than 24 hours.
  • Devices that have been unavailable for more than 7 days.

Unavailable means a device that did not respond when pinged.

Available means a device that responded successfully when pinged.

What I have today, which performs very badly:

A table historico_disponibilidade, with the following structure:

create table historico_disponibilidade (id_dispositivo number, respondeu number, data date);

This table holds a large amount of data (today it has 60 million rows, and the trend is exponential growth).
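For reference, here is a minimal sketch (not from the original question) of the kind of query these extractions imply against this table, under the assumption that respondeu = 1 means the ping succeeded — the question does not state the encoding:

    -- Devices unavailable for more than 24 hours: no successful ping
    -- in the last 24 hours (or no successful ping at all).
    select h.id_dispositivo
      from historico_disponibilidade h
     group by h.id_dispositivo
    having max(case when h.respondeu = 1 then h.data end) < sysdate - 1
        or max(case when h.respondeu = 1 then h.data end) is null;

Replacing sysdate - 1 with sysdate - 7 gives the 7-day variant.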

The questions:

  • How can I achieve these goals without running into query slowness?
  • How can I create a table structure (or set of tables) prepared to receive millions/billions of records in my corporate universe?
  • I worked on a system where a single table had 40 GB of data (Oracle DB), and the SELECTs on it were as fast as on a smaller table because they used the correct indexes. The structure of your table is fine; you did not post the indexes, but if id_dispositivo is the only key and you want to query a range of dates, you will have performance problems. If you can simulate such a large table in a staging environment, it is worth doing so to experience how it behaves (a data-generation sketch follows this list).

  • @Ricardo, where do you think the index should be? Yes, I need to query by a date range and by id_dispositivo as well.

  • So, in the log tables I worked with we only had an index on the record date, nothing else. An index is essentially a new table with that column, it takes space, and there were many, many tables. The SELECTs on the log were always run with a date so they would use the index. During maintenance about 10 log queries were run per day, which was not much. It will depend on how much space you can spare and how many queries you want to run, but I think an index on the date would solve it in your case (see the index sketch after this list). Will this data be displayed on screen for a user?
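A minimal sketch of the indexing discussed in these comments, assuming the table from the question (index names are illustrative):

    -- Index for queries filtered by date range alone.
    create index ix_hist_disp_data on historico_disponibilidade (data);

    -- Composite index for "one device over a date range" lookups.
    create index ix_hist_disp_disp_data on historico_disponibilidade (id_dispositivo, data);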
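And a rough sketch of simulating a large table in staging, as suggested above; the device count, failure rate, and date spread are arbitrary test assumptions:

    -- Generate ~1 million synthetic rows to rehearse query plans.
    insert /*+ append */ into historico_disponibilidade (id_dispositivo, respondeu, data)
    select mod(rownum, 5000) + 1,                            -- 5000 fake devices
           case when mod(rownum, 10) = 0 then 0 else 1 end,  -- ~10% failed pings
           sysdate - dbms_random.value(0, 30)                -- spread over 30 days
      from dual
    connect by level <= 1000000;
    commit;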

1 answer

The problem does not seem to be exponential, or even polynomial. Some problems are, and for those there would not be much to do. If there is an exponential or polynomial implementation somewhere, you have to fix it, but the question does not make clear how things are being done, and it seems to me this is not even the case.

The volume of rows in a table should affect query performance very little, as long as the table has the necessary indexes and the queries themselves do not do anything too strange.

Even if it does, the solution would be to purge the data you no longer need. If you need to keep the data for historical purposes only and no longer need to query the old data (at least not often), the solution is an auxiliary table that works as an archive. Rows would be transferred from the live table to the archive; apparently you only need to keep the last 7 days online. I don't think I need to say that this transfer can be automated. It is also possible to partition the tables by data ranges, but I don't know how that works on Oracle (a sketch follows).
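For what it's worth, Oracle does support this. Here is a sketch under the assumption of Oracle 11g or later, where interval partitioning creates a new daily partition automatically (partition names and the interval are illustrative):

    create table historico_disponibilidade (
      id_dispositivo number,
      respondeu      number,
      data           date
    )
    partition by range (data)
    interval (numtodsinterval(1, 'DAY'))
    (partition p_inicial values less than (date '2016-06-01'));

    -- Purging old data then becomes a cheap metadata operation:
    -- alter table historico_disponibilidade drop partition <old_partition>;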

All the good databases on the market, especially Oracle, are prepared to work with billions of rows without problems. If you have a specific problem in a query and are not finding a solution, then post a specific question about it.

Note that the table is so simple that there is nothing left to simplify, unless you have hidden something in the question.

I do wonder whether the whole routine is the wrong approach. Doing what is being done seems strange to me; it looks like a right solution to the wrong problem, but I will not get into that.

  • I also thought about purging this data, @bigown... but then, assuming the same structure is kept, do you think that with correct indexing the problem is "solved"?

  • You don't say you have any problem, or at least you don't specify one. If you do, correct indexing (suited to the query, which must also be correct) should solve it. Without an index you may have to scan billions of rows; with the index, the search is limited to a dozen or two rows. The difference is that brutal. It is like searching for a word in a dictionary page by page versus using its natural ordering. http://answall.com/q/35088/101, http://answall.com/q/55118/101, http://answall.com/q/32052/101

  • I thought of something like this: only record a ping if it differs from the previous one, i.e. if it was OK and the previous one was an error, or vice versa. That way I don't need 20 x 24 pings per day if the IP stayed up. There could also be a check so the first ping of the day is always recorded, just to confirm the automatic program is running. I see no sense in redundant information.

  • @Motta, indeed, if he never needs the full day's information, he could do that to save space. I don't know whether he might someday need a more analytical report; if he does and the information is not there, he is out of luck. So it is up to him to decide whether he can count on that or not.

  • @bigown, the problem seems to be knowing the time the IP "went down". The only danger is the ping program silently failing to run for some reason, so perhaps record a first operation of the day as well (if the ping status or the date differs from the last recorded row >> record it; otherwise >> do not). That trades 480 records per day for a single one in the normal situation of the IP staying up (a sketch of this follows).
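A rough sketch of that change-only logging in SQL, assuming the C# routine binds the device id and ping result as :id and :resultado (both names are illustrative); a row is written only when the status or the day differs from the device's last recorded row:

    insert into historico_disponibilidade (id_dispositivo, respondeu, data)
    select :id, :resultado, sysdate
      from dual
     where not exists (
             select 1
               from historico_disponibilidade h
              where h.id_dispositivo = :id
                and h.respondeu = :resultado         -- same status as the last run...
                and trunc(h.data) = trunc(sysdate)   -- ...and already recorded today
                and h.data = (select max(h2.data)
                                from historico_disponibilidade h2
                               where h2.id_dispositivo = :id));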
