How does Youtube know which videos we’ve watched?


Viewed 2,517 times

4

The title already describes what I'm interested in, but what I really want to know is how Youtube stores the information that we've already watched a video. I thought about cookies, but that would be impractical: whenever someone cleared their history, the cookies could be deleted, and the site would no longer know which videos had been watched. Then I thought Youtube might store this information in a database, but I think that would generate a huge amount of data that could be considered practically useless. Is there some third way to store this information?

I'd also like to take this opportunity to ask a related question. As far as I can tell, Youtube doesn't consider a video watched just because the user clicked its link. Do you know how it decides whether a video has been watched? I thought of a kind of timer set to about 80% of the video's real length, so that the video only counts as watched if the user stays on its page for at least that long. But I suspect there is some other way to do this; if so, what is it?
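
The 80% rule proposed above can be sketched in a few lines. This is only an illustration of the question's own heuristic, not how Youtube actually does it; the class name, the `WATCHED_FRACTION` constant and the choice to count seconds actually played (rather than wall-clock time on the page, which would also count a paused tab) are assumptions.

```python
from dataclasses import dataclass

# Hypothetical threshold, taken from the question's proposal (80%).
WATCHED_FRACTION = 0.8


@dataclass
class PlaybackSession:
    """Tracks how long a user actually played one video (hypothetical model)."""
    video_duration_s: float
    seconds_played: float = 0.0

    def add_playback(self, seconds: float) -> None:
        # Count only time the player was running; cap the running total
        # at the video length.
        self.seconds_played = min(self.video_duration_s,
                                  self.seconds_played + seconds)

    def is_watched(self) -> bool:
        # Watched once played time reaches 80% of the video's duration.
        return self.seconds_played >= WATCHED_FRACTION * self.video_duration_s


session = PlaybackSession(video_duration_s=300)
session.add_playback(200)
print(session.is_watched())  # False (200 s < 240 s)
session.add_playback(60)
print(session.is_watched())  # True (260 s >= 240 s)
```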

For those who have never seen what I'm talking about: log in to Youtube and watch a video; afterwards, the video's thumbnail appears with a semi-transparent overlay on top of it that says "Watched".

  • 8

    "practically useless"? Google basically lives on it!

  • 5

    When you watch a video, doesn't Youtube have to send it to you? It just needs to keep track of how much of each video it sent you. Besides, handling huge amounts of data is exactly what Google does best. Every click you make on sites like Youtube and Facebook is tracked in some way and turned into statistics that ultimately help marketers sell more.

  • @bfavaretto I didn't mean "practically useless" in a pejorative sense; it was just a figure of speech. So the answer to my question is that this data is stored in a database?

  • The information must be saved in the DB, or it could be exported to a file at the end of each period, but for the video's author it has to be available at all times. I don't think it makes sense to require that a video play to 80% before it counts as 'watched'. It's like read statistics for news articles: whether you read 1 line or 10 paragraphs, loading the content counts as an access.

  • I believe it's in the database. I didn't think your "practically useless" was derogatory, just naive, since big companies make money from the data they store about their users. @utluiz explained better what is behind my comment.

  • What matters to the corporation is the trail you leave, which lets it build a profile and maximize sales according to your 'interests'. That's why what you watch is important.

  • Whatever form this particular information is stored in, it is not permanent and has a rather short lifespan. When I first "discovered" this feature I found it very useful: since not all videos appear on a given channel's page, we don't always remember which video we stopped at. Then, after a few days, I noticed that the "Watched" marks were gone, and I had to go looking for the thumbnail of the video where I had stopped.

  • Why the hell is that question labeled php and jquery?!


3 answers

9


According to several sources about Youtube, it mainly uses the MySQL database to store this information. By now, however, they probably use an alternative MySQL build, modified for better performance.

Saying that, though, is an oversimplification. The amount of data stored is astronomical: not only the videos themselves but also user data, including view lists, preferences and usage habits. As far as I could find, the exact size of the database is unknown, but it is estimated to be in the petabyte range.

High-performance architectures like this simply cannot rely on a simple database. A whole set of technologies, distributed across thousands of servers around the world, is needed to handle the load.
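
As one illustration of spreading data over many servers, per-user records such as watch history are often partitioned (sharded) by user ID. The sketch below is a hypothetical example of hash-based sharding; the shard names and the choice of SHA-256 are assumptions, and nothing here is confirmed about Youtube's actual setup.

```python
import hashlib

# Hypothetical cluster of database shards; the names are placeholders.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for_user(user_id: str) -> str:
    """Pick the shard that stores this user's watch history.

    Uses a stable hash (SHA-256) so the same user always maps to the same
    shard across processes and machines; Python's built-in hash() is
    randomized per process for strings, so it is not suitable here.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[index]


print(shard_for_user("alice"))  # always the same shard for "alice"
```

Plain modulo sharding moves most users to a different shard whenever the number of shards changes; consistent hashing or a lookup directory is commonly used to avoid that.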

To understand this, you need to be somewhat familiar with distributed applications, big data and the like.

In a talk on scalability with Python at Youtube, one of the development managers described several techniques used to cope with the enormous demand.

For example, unlike a "common" system, where every change to the database goes through a transaction and becomes immediately visible to all users, Youtube gives up some ACID "guarantees" in exchange for high availability. So if a user in China adds a comment to a video, it is written to servers in China, and it may take several minutes for the data to be replicated to the other servers around the world. This is a small price to pay.
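
The trade-off described above (a write is acknowledged as soon as the local region has it, and replication happens later) can be modeled with a toy in-memory store. This is a simplified sketch of eventual consistency under assumed names (`EventuallyConsistentStore`, the region names), not Youtube's actual replication mechanism.

```python
from collections import defaultdict, deque


class Region:
    """One regional data center holding its own copy of the comments."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.comments: dict[str, list[str]] = defaultdict(list)


class EventuallyConsistentStore:
    """Toy model: writes are local and immediate, replication is deferred."""

    def __init__(self, region_names: list[str]) -> None:
        self.regions = {name: Region(name) for name in region_names}
        # Writes waiting to be copied to the other regions.
        self.pending: deque[tuple[str, str, str]] = deque()

    def add_comment(self, region: str, video_id: str, text: str) -> None:
        # Acknowledged once the local region has it; no global transaction.
        self.regions[region].comments[video_id].append(text)
        self.pending.append((region, video_id, text))

    def read_comments(self, region: str, video_id: str) -> list[str]:
        return list(self.regions[region].comments[video_id])

    def replicate(self) -> None:
        # Runs later (e.g. a background job); until then, other regions are stale.
        while self.pending:
            origin, video_id, text = self.pending.popleft()
            for name, region in self.regions.items():
                if name != origin:
                    region.comments[video_id].append(text)


store = EventuallyConsistentStore(["china", "brazil"])
store.add_comment("china", "vid1", "Nice video!")
print(store.read_comments("brazil", "vid1"))  # []
store.replicate()
print(store.read_comments("brazil", "vid1"))  # ['Nice video!']
```

Until `replicate()` runs, a reader in Brazil simply does not see the comment yet, which is exactly the "several minutes" delay the answer mentions.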

Another "cheat" concerns the view counts of videos. When a video is receiving a lot of traffic, the code behind Youtube estimates the number of views per minute and increments the displayed count based on that estimate rather than on the actual number of views. This removes the need for a central table that records every single visit, while keeping the number as close to "real" as possible.

This is evident if you open a newly published viral video: you will probably notice some anomalies in the statistics if you refresh the page at certain intervals.
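
A counter that extrapolates from a measured rate, as described in the two paragraphs above, might look like the following. It is a hypothetical sketch: the `sync` step assumes some background job periodically supplies an exact total and a views-per-minute rate (for example, aggregated from logs), a detail the answer does not specify.

```python
class EstimatedViewCounter:
    """Displays an extrapolated view count between periodic exact syncs.

    Hypothetical model: between syncs, the displayed number is computed
    from the last exact total plus rate * elapsed time, instead of being
    read from a central table of individual visits.
    """

    def __init__(self) -> None:
        self.checkpoint_count = 0      # last exact total (hypothetical source)
        self.checkpoint_minute = 0.0   # when that total was taken
        self.views_per_minute = 0.0    # rate measured over a recent window

    def sync(self, exact_count: int, rate_per_minute: float, now_minute: float) -> None:
        self.checkpoint_count = exact_count
        self.views_per_minute = rate_per_minute
        self.checkpoint_minute = now_minute

    def displayed(self, now_minute: float) -> int:
        elapsed = max(0.0, now_minute - self.checkpoint_minute)
        return self.checkpoint_count + int(self.views_per_minute * elapsed)


counter = EstimatedViewCounter()
counter.sync(10_000, 500.0, now_minute=0.0)
print(counter.displayed(3.0))  # 11500 (extrapolated)
counter.sync(11_200, 400.0, now_minute=3.0)
print(counter.displayed(3.0))  # 11200 (exact total was lower than the estimate)
```

The second sync makes the displayed number drop from 11500 to 11200, which is one way the refresh "anomalies" mentioned above could arise.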

Databases of this kind are very different from what 99% of developers see day to day. Understanding them practically means stepping into a different field of study.

One such field that is growing today is Big Data, which covers most of the cases that databases with traditional architectures cannot handle. It's true that there is a lot of hype around Big Data, but it is also true that all serious high-tech companies use such solutions successfully. If you'd like a less technical introduction to Big Data, I recently published an article on the subject.

  • I believe they use WebScaleSQL, which is based on MySQL: http://tecnoblog.net/154119/facebook-webscalesql-versao-mysql-bases-bigdata/

  • Missing a word in: High architectures ??? like this one...

  • @Brasofilo Fixed! Thank you!

2

It doesn't make much sense to use cookies to store the videos already watched, because different browsers don't share the same cookies. What probably happens is that this is stored in a database tied to your Youtube channel (when you log in to Youtube, you can see a "folder" on your channel with the videos you've already watched).
As for the time or percentage of a video that must be watched for it to count as viewed: this is kept confidential, to make it harder for malicious software that uses macros to inflate the view count of a given video.
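
The point about cookies can be made concrete: if watch history is stored server-side and keyed by the logged-in account, every browser the user logs in from sees the same data. The class below is an in-memory stand-in for a database table; the table layout mentioned in the comment (`watched(user_id, video_id)`) is hypothetical.

```python
from collections import defaultdict


class WatchHistoryStore:
    """Server-side watch history keyed by account, not by browser cookie.

    In-memory stand-in for a hypothetical database table
    watched(user_id, video_id) with a unique key on both columns.
    """

    def __init__(self) -> None:
        self._watched: dict[str, set[str]] = defaultdict(set)

    def mark_watched(self, user_id: str, video_id: str) -> None:
        # Idempotent: adding the same video twice has no extra effect.
        self._watched[user_id].add(video_id)

    def has_watched(self, user_id: str, video_id: str) -> bool:
        # .get avoids creating empty entries for users with no history.
        return video_id in self._watched.get(user_id, set())


history = WatchHistoryStore()
# Request from browser A, after login, identified by account "user42".
history.mark_watched("user42", "vid1")
# Request from browser B, same account: no shared cookie needed.
print(history.has_watched("user42", "vid1"))  # True
```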

1

Do you think the data about the videos you've watched is useless? I don't believe it is, because a lot of information can be extracted from it, such as the tastes of a given group of people and what they are most interested in, among other things.

I don't know for sure which storage medium Youtube uses for this information, but I believe it really is in the database. There are many ways to make requests and queries more efficient; caching is one of them. Have you ever imagined how large and well-organized the infrastructure of a company like that is? I don't have much idea myself.

As for whether the video was watched or not, that may be tracked locally on the frontend, and once the condition is met, a trigger is responsible for notifying the backend and recording that the video has been watched. Have you noticed that Google doesn't use libraries like jQuery? It always uses plain JavaScript to stay efficient. I hope I've helped, at least a little.
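
The frontend-tracking idea in this answer (keep progress locally, then fire a single request to the backend once a condition is met) could be sketched as follows. `notify_backend` is a hypothetical callback standing in for an HTTP request, and the 80% threshold is borrowed from the question as an assumption.

```python
from typing import Callable


class LocalWatchTracker:
    """Keeps progress locally and notifies the backend once, when a condition is met.

    `notify_backend` is a hypothetical callback standing in for an HTTP
    request; the default 0.8 threshold is likewise an assumption.
    """

    def __init__(self, video_id: str, duration_s: float,
                 notify_backend: Callable[[str], None],
                 threshold: float = 0.8) -> None:
        self.video_id = video_id
        self.duration_s = duration_s
        self.notify_backend = notify_backend
        self.threshold = threshold
        self.max_position_s = 0.0
        self.reported = False

    def on_progress(self, position_s: float) -> None:
        # Remember the furthest point reached; nothing is sent on each tick.
        self.max_position_s = max(self.max_position_s, position_s)
        if not self.reported and self.max_position_s >= self.threshold * self.duration_s:
            self.reported = True  # fire the "trigger" only once
            self.notify_backend(self.video_id)


sent = []
tracker = LocalWatchTracker("vid1", duration_s=100.0, notify_backend=sent.append)
for position in (10.0, 50.0, 85.0, 95.0):
    tracker.on_progress(position)
print(sent)  # ['vid1'] (sent exactly once, when 85 s passed the 80 s mark)
```

Because this uses the furthest position reached, seeking straight to the end would count as watched; summing seconds actually played would avoid that, at the cost of a little more bookkeeping.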

  • 4

    This answer reads like a very long comment.

  • I just ran some tests with 3 videos, and it really does seem that whether a video counts as watched depends on how many bytes of the video you're watching were successfully received, i.e., whether your computer received the entire video.
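
The hypothesis from this comment (and from the earlier comment that Youtube "just needs to know how much it sent") can be sketched by tracking which parts of the file have been delivered. The fixed 64 KiB chunk size and the class name are assumptions, and a test with 3 videos is not enough to confirm that this is Youtube's actual rule.

```python
class ByteCountTracker:
    """Marks a video as watched once every chunk of it has been delivered.

    Hypothetical model of the "all bytes received" idea. Distinct chunk
    indices are tracked, so re-sent chunks are not counted twice.
    """

    def __init__(self, total_bytes: int, chunk_size: int = 64 * 1024) -> None:
        self.total_bytes = total_bytes
        self.chunk_size = chunk_size
        # Integer ceiling division: a partial last chunk still counts.
        self.total_chunks = (total_bytes + chunk_size - 1) // chunk_size
        self.received: set[int] = set()

    def on_chunk_delivered(self, offset: int) -> None:
        # Ignore offsets outside the file; the set ignores duplicates.
        if 0 <= offset < self.total_bytes:
            self.received.add(offset // self.chunk_size)

    def fraction_received(self) -> float:
        return len(self.received) / self.total_chunks

    def is_watched(self) -> bool:
        return len(self.received) == self.total_chunks


tracker = ByteCountTracker(total_bytes=200_000)   # 4 chunks of up to 64 KiB
for offset in (0, 0, 65_536):                     # first chunk sent twice
    tracker.on_chunk_delivered(offset)
print(tracker.fraction_received())  # 0.5
```

Counting distinct chunks rather than summing raw bytes keeps re-buffered or re-sent data from pushing a partially downloaded video over the line.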
