Best option for a unique identifier (IP, cookies, MAC)

This question may be a little generic, depending on your point of view, but I will try to be as specific as possible.

I have a system that adds 1 to the visit count whenever a page is accessed. Example:

<?php
$mysql->query('UPDATE post SET views = views + 1 WHERE id = 10');
?>

I think you get the idea. The problem is that every refresh (F5) increments the visit count, even when it comes from the same user.

So I thought of some solutions:

  1. Save the IP in the database and compare before inserting.
  2. Save a cookie in the browser and check before inserting.
  3. Save the MAC address in the database and compare before inserting.

Which of these solutions is best, taking into account performance and, obviously, how hard each one is to circumvent? Clearing the cache/cookies or using a private browser window breaks the second option; resetting the modem or using a proxy breaks the first; and some other proxies or Tor would break the third.

Would using all of them together be a good alternative?

I have read that getting the MAC address via PHP is almost impossible. However, Cloudflare seems to manage to get that data, so I listed it, although I do not know how to actually do it.

  • MAC would be the best option, but I believe it is very complicated to obtain. The best practical option would be cookie + IP.

  • Cloudflare does not get any MAC address, just as the browser has no access to it. There are other ways to identify a visitor, but usually a mixture of techniques is used. There will always be some error, so you have to know which techniques overcount and which undercount, and choose depending on the goals of the count.

  • Well, I don’t know what they collect, but I had already been banned by Cloudflare in a test. Even after changing the IP, changing the browser, and erasing all caches/cookies, I remained blocked. With a proxy (both HTTP and HTTPS) the block continued; with Tor Browser there was no block. On mobile (3G) it worked; on my own network it did not, even with another IP. Maybe they blocked an entire IP range, but I find that unlikely, because it would affect many users. In a conversation I had with an exchange administrator who used Cloudflare, he asked me to change the MAC address, which worked.

1 answer


The question does not specify the environment, so I will assume you mean the open web. In that case, since you have no administrative access to the client’s browser, you will not be able to get the MAC address.

If you cannot control this through authentication (a logged-in user), the viable approach is a combination of cookies and IP.

The combination of cookie and IP should be well planned. For example, a free Wi-Fi network gives the same public IP to thousands of users: in a shopping mall, in a park, in train and subway stations, on buses, and so on.

A residential or commercial building can also share a single IP. Therefore, do not assume that an IP identifies a single user. It can, however, help determine whether a user has returned and visited the page again. For that, we combine it with the cookie data.
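
As a rough illustration of combining the two, here is a minimal sketch. The function name `shouldCountView`, the in-memory `$seen` array (a database table would play this role in practice), and the 30-minute window are all assumptions made for the example, not something from the question.

```php
<?php
/**
 * Decide whether a request should count as a new view.
 * $seen maps "postId|visitorKey" to the timestamp of the last counted view.
 * Names and the 30-minute default window are illustrative assumptions.
 */
function shouldCountView(array &$seen, int $postId, ?string $cookieId, string $ip, int $now, int $window = 1800): bool
{
    // Prefer the cookie token: it survives an IP change and separates
    // users who share one IP. Fall back to the IP when no cookie was sent.
    $visitorKey = $cookieId !== null ? 'c:' . $cookieId : 'ip:' . $ip;
    $key = $postId . '|' . $visitorKey;

    if (isset($seen[$key]) && $now - $seen[$key] < $window) {
        return false; // same visitor inside the window: treat as a refresh
    }
    $seen[$key] = $now;
    return true;
}

$seen = [];
$first   = shouldCountView($seen, 10, 'abc', '203.0.113.7', 1000);  // new visitor
$refresh = shouldCountView($seen, 10, 'abc', '198.51.100.2', 1060); // same cookie, different IP
$shared  = shouldCountView($seen, 10, 'xyz', '203.0.113.7', 1070);  // different cookie, shared IP
```

Only when the function returns true would you run the `UPDATE post SET views = views + 1` query. Note that the IP fallback undercounts on shared networks, which is exactly the trade-off described above.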

Cookies

Cookies are the most reliable standardized means available, because the same user, carrying the same cookie, can come back from a different IP. Hence the importance of not relying on a single identifier. Also capture browser information such as the browser name and version: two distinct users behind the same IP often have different browser versions. That way, even if both delete their cookies, you can filter later to look for possible duplicates.
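
A minimal sketch of issuing such a cookie follows. The cookie name `visitor_id`, the 32-character hex format, and the one-year lifetime are hypothetical choices; the `setcookie` options array requires PHP 7.3 or later.

```php
<?php
/**
 * Return the visitor token found in the request cookies, or create one.
 * The cookie name "visitor_id" is a hypothetical choice for this example.
 *
 * @return array [string $token, bool $isNew]
 */
function visitorToken(array $cookies, string $name = 'visitor_id'): array
{
    $token = $cookies[$name] ?? '';
    // Accept only tokens in the format we issue (32 hex characters).
    if (is_string($token) && preg_match('/^[0-9a-f]{32}$/', $token) === 1) {
        return [$token, false];
    }
    return [bin2hex(random_bytes(16)), true];
}

[$token, $isNew] = visitorToken($_COOKIE);
if ($isNew && PHP_SAPI !== 'cli' && !headers_sent()) {
    // Send the new token back to the browser; one year is an arbitrary lifetime.
    setcookie('visitor_id', $token, [
        'expires'  => time() + 365 * 24 * 3600,
        'path'     => '/',
        'httponly' => true,
        'samesite' => 'Lax',
    ]);
}
```

Validating the incoming value matters because a cookie is client-controlled input; anything that does not look like a token you issued is simply replaced.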

Filtering algorithms

Systems like YouTube, for example, use such algorithms to determine the number of views. That is why it is common to see a video with 100 thousand views suddenly drop to 80 thousand overnight: the views were filtered after the fact. Many people, unaware of this, think YouTube is "stealing" views, but what actually happens is later filtering, because the monetization system cannot "steal" from the advertiser either. YouTube is used here merely as a practical example because it is a very popular service. Google Analytics and Google AdSense do the same, in a much more advanced way.

Fingerprint

On the same subject, you will probably hear the obvious objection that cookies are easy to remove or edit. However, an ordinary user does not do that. Those who do usually have a specific purpose, or are legitimate users who have simply cleared their browser data. There are also users with multiple devices: smartphone, tablet and PC. Officially that is a single person, but one who has accessed from different devices. This is where your business model comes in: it is what defines how to handle these cases. One option, for example, is to generate a kind of fingerprint from the client data.
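
A very coarse fingerprint could be sketched like this. Which fields to include is an assumption on my part (the text above only suggests IP plus browser name and version), and a hash like this is a grouping hint for later filtering, not a reliable identity.

```php
<?php
/**
 * Build a coarse fingerprint from request data.
 * The chosen fields are an assumption for illustration; adjust them
 * to whatever your business rules consider relevant.
 */
function requestFingerprint(array $server): string
{
    $parts = [
        $server['REMOTE_ADDR'] ?? '',
        $server['HTTP_USER_AGENT'] ?? '',
        $server['HTTP_ACCEPT_LANGUAGE'] ?? '',
    ];
    // Hash so the log stores a fixed-length value instead of raw headers.
    return hash('sha256', implode("\n", $parts));
}

// Two users behind the same IP with different browsers get different values.
$a = requestFingerprint(['REMOTE_ADDR' => '203.0.113.7', 'HTTP_USER_AGENT' => 'Firefox/115.0']);
$b = requestFingerprint(['REMOTE_ADDR' => '203.0.113.7', 'HTTP_USER_AGENT' => 'Chrome/120.0']);
```

Stored next to the cookie token, this lets a later filtering pass group views that lost their cookie but otherwise look identical.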

Authentication and bots

A more reliable way is to identify the person through authentication/login. Use it whenever you can and have the user identify themselves. This makes the filtering work much easier.

Bots!

Of course, you should also watch out for bots and define rules for how to treat them.
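
The simplest possible rule is a User-Agent check, sketched below. The keyword list is an assumption and only catches bots that announce themselves; anything that fakes a browser User-Agent passes straight through, so treat it as a first filter only.

```php
<?php
/**
 * Heuristic: does the User-Agent look like a self-declared bot?
 * The keyword list is illustrative, not exhaustive.
 */
function looksLikeBot(?string $userAgent): bool
{
    if ($userAgent === null || trim($userAgent) === '') {
        return true; // assumption: a missing User-Agent is treated as suspicious
    }
    return preg_match('/bot|crawl|spider|slurp/i', $userAgent) === 1;
}

$crawler = looksLikeBot('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)');
$browser = looksLikeBot('Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0');
```

Whether a flagged request is dropped immediately or just marked in the log for later filtering is a business decision.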

Evercookie!

There is also the controversial option of using evercookie. I recommend against it, but it is worth knowing that the "option" exists.

Refresh, time of navigation

You should also create some logic to track browsing time and time spent on the page, and to detect a "refresh" of the page. This helps explain how the number of views grew. So do not store just the +1 in the views column; store the full log of how each view was generated.
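
As a sketch of what one log entry could hold, assuming hypothetical field names of my own choosing. The `reload_hint` field relies on the fact that browsers commonly send a `Cache-Control: max-age=0` or `no-cache` request header on reload; that is a heuristic only, and time on page would need a client-side signal that is not shown here.

```php
<?php
/**
 * Build one log record describing how a view was generated.
 * Field names are hypothetical; store whatever your filtering will need.
 */
function buildViewLog(int $postId, string $visitor, array $server, int $now): array
{
    $cacheControl = strtolower($server['HTTP_CACHE_CONTROL'] ?? '');
    return [
        'post_id'     => $postId,
        'visitor'     => $visitor,
        'ts'          => $now,
        'ip'          => $server['REMOTE_ADDR'] ?? null,
        'user_agent'  => $server['HTTP_USER_AGENT'] ?? null,
        'referer'     => $server['HTTP_REFERER'] ?? null,
        // Heuristic only: reloads often carry one of these directives.
        'reload_hint' => strpos($cacheControl, 'max-age=0') !== false
                      || strpos($cacheControl, 'no-cache') !== false,
    ];
}

// One JSON object per line is easy to append now and to batch-process later.
$line = json_encode(buildViewLog(10, 'abc', $_SERVER, time()));
```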

These algorithms are usually complex, and it is not feasible to run them on every request. Just save the logs and run the filtering offline, in a private environment. The downside is the accumulation of data: depending on the volume of traffic, the database can easily reach 1 GB in a week. It is hard to deal with large amounts of data without dedicated, specialized staff, and most companies cannot afford that "luxury".
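
To make the idea of later filtering concrete, here is a deliberately simple sketch of a batch pass over saved logs. The row shape (`post_id`, `visitor`, `ts`, `bot`) and the 30-minute window are assumptions for the example; a real filter would apply many more rules.

```php
<?php
/**
 * Recount views from raw log rows, dropping bots and repeat views by the
 * same visitor on the same post inside $window seconds.
 * Row keys (post_id, visitor, ts, bot) are an assumed log format.
 *
 * @return array post_id => filtered view count
 */
function countFilteredViews(array $logs, int $window = 1800): array
{
    // Process in chronological order so the window is measured correctly.
    usort($logs, fn(array $a, array $b): int => $a['ts'] <=> $b['ts']);

    $lastCounted = [];
    $views = [];
    foreach ($logs as $row) {
        if (!empty($row['bot'])) {
            continue; // flagged as a bot when the log was written
        }
        $key = $row['post_id'] . '|' . $row['visitor'];
        if (isset($lastCounted[$key]) && $row['ts'] - $lastCounted[$key] < $window) {
            continue; // refresh or quick revisit
        }
        $lastCounted[$key] = $row['ts'];
        $views[$row['post_id']] = ($views[$row['post_id']] ?? 0) + 1;
    }
    return $views;
}

$logs = [
    ['post_id' => 10, 'visitor' => 'a', 'ts' => 0],
    ['post_id' => 10, 'visitor' => 'a', 'ts' => 10],    // refresh
    ['post_id' => 10, 'visitor' => 'b', 'ts' => 20],
    ['post_id' => 10, 'visitor' => 'a', 'ts' => 2000],  // returned later
    ['post_id' => 10, 'visitor' => 'c', 'ts' => 30, 'bot' => true],
];
$views = countFilteredViews($logs);
```

Because the raw rows are kept, the rules can be changed and the counts recomputed, which is what makes drops like the YouTube example possible.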

Reinventing the wheel: cost vs. benefit

Finally, it all depends on the business model. Some may find such an implementation excessive, and it really is for small, unimportant systems, which are usually amateur and poorly built anyway. For serious projects, though, it is worth defining consistent rules.

In the end, what you will be able to build is something similar to what already exists: Google Analytics. That is why many choose to leave this to third-party services such as Google Analytics, which obviously have a great deal of know-how.
