What are Git’s sets of modifications, really?

6

I’m just starting to use Git, and one thing I heard in the course I’m taking is that what Git actually stores is not the different versions of the files, but rather sets of modifications.

So, as I understood it, when we make a commit Git does not save the files in their current state; it saves in the repository a set of modifications describing the changes that were made.

My question

The modifications are relative to what state of the project? The initial state or the state of the last commit?

What I mean is basically this: suppose that right after creating the repository we add a file arquivo1.txt. When we commit, the modification Git records is the creation of that file. After that, we create a new file arquivo2.txt, add a line to arquivo1.txt, and commit.

In this case, what does Git save in the second commit? Is the recorded modification the addition of arquivo2.txt plus the new line (i.e., relative to the last commit), or the addition of both arquivo1.txt and arquivo2.txt plus the change to the first file (i.e., the total modification since the initial state)?

It seems to me that what Git saves is the modification relative to the last commit, since saving the modification relative to the initial state would be equivalent to saving the different versions of the files themselves.

I still don’t really understand what these sets of modifications that Git records in each commit are. What are they, really?

  • 3

    It’s just as you’re thinking: it saves relative to the last commit.

  • Because the second option (save both the new full version of the file and the modifications in this file) would be a redundant record.

  • http://git-scm.com/book/en/v2/Getting-Started-About-Version-Control

  • This one is in Portuguese: http://git-scm.com/book/pt-br/v1

  • Thanks @bfavaretto, there’s just one thing I didn’t get. The link Juarez posted says that Git actually saves snapshots, while other VCSs save modifications to the files. If Git saves snapshots, then doesn’t each commit really hold the exact state of each file, and not just the modifications? I guess I don’t quite understand how this idea of snapshots relates to what I saw in the course and described in the question.

  • 2

    No, it only has the modifications, but with them it is able to reproduce the entire tree of your project at a given commit. I don’t know where you read this, or what you were taught as a "snapshot" in the course, but bear in mind that this is not a precise technical term. It is a metaphor expressing that it is possible to recover a "portrait" of your project at its various stages.

  • Actually, the course never mentioned snapshots, only changesets, sets of modifications that are saved at each commit. It was at this link, http://git-scm.com/book/en/v2/Getting-Started-About-Version-Control, that I just saw snapshots mentioned. So the idea is that the repository holds the set of modifications for each commit, and that "picture" can be obtained by applying the modifications? Thanks again for the help!

  • 2

    @Leonardo Hmm, I’m beginning to worry about this course. It’s nothing serious, but I don’t like it when formal teaching material uses the wrong terms. Git does not work with changesets; it does not have that ability. I don’t know much about the inner workings of this software, but I do know that changesets are the modifications made to the files, grouped together. A snapshot is a state at a given time. In theory it would be an exact copy of the file at that time. But it is possible to work with a delta encoding that ends up reproducing the mechanism of differences, only by a totally different method.

  • Since no one had answered, I ventured an answer.

  • @Mustache I also got a little suspicious of the course now. In addition to the official Git website I also found some videos on MVA (http://www.microsoftvirtualacademy.com/training-courses/using-git-with-visual-studio-2013-jump-start) about Git that confirm the use of snapshots and not deltas.

  • But as far as I know snapshots are obtained by deltas.

  • @Leonardo, I wrote an answer because things were getting a little fuzzy around here. I hope I’ve made clear what a changeset is, what a snapshot is, how Git stores files at commit time, and how a file’s contents are later replaced by records of the changes made to it. If anything is still unclear, please let us know.

2 answers

3


These terms are not always used in the right way.

Git uses snapshots. A snapshot is the state of the content at a given moment, which is why it is also known as a point in time. These snapshots cover the entire repository. Hence a limitation in Git: any update affects the repository as a whole, and some tricks (stash, for example) have to be used to keep part of the content being worked on from being committed.

So that the full content does not have to be stored again at each commit, a differencing mechanism is used and only a delta encoding of each change is stored. These deltas are computed between the previous version and the current one. Even so, the repositories stand on their own. This approach allows several lines of development to proceed concurrently.

Other version control software may use changesets, or sets of modifications. These are the differences between what already existed in the repository and what is now being committed. This subtle difference makes it harder to work with multiple sources of concurrent changes, but it makes it easier to commit changes partially: you can easily select what you want to commit and assemble a set of modifications with just what you want at that moment. Repositories are assembled from their changes.

Strictly speaking, in Git we can’t build a set of modifications. Informally, the term ends up being used anyway.

I don’t know the mechanisms of this software in depth, and it isn’t easy to find definitions that don’t cause confusion. Apparently I’m not the only one who hasn’t held back from talking about something he doesn’t know in every detail :P. It’s easy to find conflicting information on the subject. But according to the best-known documentation, that’s how it works.

  • Thanks for the reply, @bigown. I’m now convinced that Git does use snapshots. In addition to what you said, I did some research and looked in the .git/objects folder; I ran some tests and used Ruby to see the contents of the files. Looking at the blobs, it seems that at each commit it saves the entire content of each file, and when a file hasn’t changed it just references what it already had. What I don’t quite understand is what you said about storing only a delta encoding. From what I saw in the blobs, it seems to save everything anyway. Do you know how that works in Git? Thank you very much once again, it helped a lot!

  • Did you make a change to the file, say, change one line or one character of its content, and then look at the result? It’s supposed to have just the difference. I’ve never done what you tried and I don’t have Git here to do it, but the idea is a good way to understand how it works; it just needs to be tested in different scenarios. I would be very surprised if it actually saved the whole file because of a small change. Git takes up very little space, even less than software that uses changesets. And I’ve read something showing that it does a diff of the changes, which is this delta. I just don’t know where to find something authoritative.

  • I did just that. I first created a file arquivo1.txt, added some content, and made a commit. Then I edited it, wrote some more content, and made another commit. When I looked at the blobs, the blob corresponding to the file in the second commit held all of the content. I found this on the Git site: http://www.git-scm.com/book/en/v2/Git-Internals-Packfiles. So, as I understand it, Git can use deltas to take up less space. I think it does that automatically when it needs to.

  • That could be it. It must know when it’s not worth it, or when it can’t for some reason.

1

What are sets of modifications in Git?

"Set of modifications" means changeset, a concept that is unrelated to how Git stores changes.

Changeset is a core concept of Git and is also present in several other source code version control systems.

The basic idea of a changeset is to commit a set of changes atomically, that is: either all the changes in the set are committed successfully, or none are. We can draw an analogy with a database transaction, which either guarantees the persistence of multiple records at once or rolls everything back if persisting any one of the records fails.

In fact, changesets in Git go beyond changesets in other version control systems. In Git you can, for example, change a changeset that is already in the repository! That is, in Git you can modify the history of the changes that have taken place. Of course, there are scenarios where this applies and there are restrictions, but that is another story.

How does Git store changes?

Initially, it stores them much as many other version control systems do: when a changeset arrives, only the files in the changeset are saved. Files that haven’t changed remain as they were. At every commit Git records a snapshot, which is the state of the repository as it was right after that commit.

As with other versioning systems, you can request the state of the repository at any point in the past, i.e., you request a particular snapshot. What Git gives you then are the files committed when that snapshot was recorded plus all the files that were already there, the ones that weren’t modified by the commit that gave rise to the snapshot.
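To make the snapshot idea concrete, here is a minimal Python sketch. It is a toy model and not Git’s actual implementation: the `store` dictionary, the `save_blob` and `commit` helpers and the file contents are all hypothetical, and only the file names come from the question. Each snapshot maps every file name to an ID derived from the file’s content, so a file that did not change simply points to content that was already stored.

```python
import hashlib

store = {}  # hypothetical object store: content ID -> full file content


def save_blob(content: bytes) -> str:
    """Store the full content under an ID derived from the content itself.

    Assumption: a plain SHA-1 of the bytes, which is simpler than Git's real ID.
    """
    blob_id = hashlib.sha1(content).hexdigest()
    store[blob_id] = content  # identical content always lands on the same key
    return blob_id


def commit(files: dict) -> dict:
    """Return a snapshot: a mapping from every file name to a content ID."""
    return {name: save_blob(content) for name, content in files.items()}


# First commit: only arquivo1.txt exists.
snapshot1 = commit({"arquivo1.txt": b"first line\n"})

# Second commit: arquivo1.txt gains a line and arquivo2.txt is created.
snapshot2 = commit({
    "arquivo1.txt": b"first line\nsecond line\n",
    "arquivo2.txt": b"another file\n",
})

# Third commit: only arquivo2.txt changes.
snapshot3 = commit({
    "arquivo1.txt": b"first line\nsecond line\n",
    "arquivo2.txt": b"another file, edited\n",
})

# The unchanged arquivo1.txt is not stored again: both snapshots share one ID.
print(snapshot2["arquivo1.txt"] == snapshot3["arquivo1.txt"])
# Number of distinct contents actually stored across the three commits.
print(len(store))
```

Every snapshot lists all files, yet the store only grows when some content is new, which is how a snapshot can describe the whole repository without copying it.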

No, Git doesn’t save only the changes made to each file in a commit. At commit time, Git stores the entire contents of the file, not just the modifications made to it.
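This can be illustrated with a short Python sketch. It follows the loose-object format described in the Git Internals chapter of the Pro Git book: a `blob <size>` header, a NUL byte, then the content, identified by a SHA-1 hash and compressed with zlib. The helper names `blob_id` and `loose_object` are hypothetical, and a repository configured for a different hash algorithm would not match.

```python
import hashlib
import zlib


def blob_id(content: bytes) -> str:
    """ID of a blob: SHA-1 over the header plus the complete content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def loose_object(content: bytes) -> bytes:
    """Bytes of a loose object: header plus complete content, zlib-compressed."""
    header = b"blob %d\x00" % len(content)
    return zlib.compress(header + content)


version1 = b"test content\n"
version2 = version1 + b"one more line\n"  # a small edit to the same file

# A one-line change produces a different ID, hence a brand-new object.
print(blob_id(version1))
print(blob_id(version2))

# The new object holds the whole file, not only the added line.
restored = zlib.decompress(loose_object(version2))
print(restored.endswith(version2))
```

Decompressing the second object gives back the complete second version, which matches what shows up when inspecting the files under .git/objects after two commits.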

Is it correct to say that Git is able to store only the differences between commits of the same file, instead of keeping the entire file even if only a single line has been modified?

Yes, it is. At an opportune moment, Git performs a kind of garbage collection and, among other things, deletes some historical files, replacing them with a record of only the modifications that happened to those files between one commit and another (delta encoding). You can also force this process whenever you wish.

Note that during garbage collection Git does not replace the files of the newer commits with deltas but the reverse: it keeps the file’s most recent state in full and records the changes going backwards from it, so that the latest version of the file (which is probably the one you will want most of the time) can be delivered faster.
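The reverse-delta idea can be sketched in Python as follows. This is only an illustration under stated assumptions: Git’s real packfile deltas are a binary copy/insert format, whereas this line-based version built on the standard `difflib` module is a stand-in, and `make_delta` and `apply_delta` are hypothetical names. The newest version is kept in full and the older one is described as instructions against it.

```python
from difflib import SequenceMatcher


def make_delta(base: list, target: list) -> list:
    """Describe `target` as copy/insert instructions against `base`."""
    ops = []
    matcher = SequenceMatcher(a=base, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))            # reuse lines from base
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))   # lines only target has
        # "delete" needs no instruction: those base lines are just skipped
    return ops


def apply_delta(base: list, ops: list) -> list:
    """Rebuild the target by replaying the instructions over `base`."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(base[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


older = ["first line", "second line"]
newest = ["first line", "second line", "third line"]

# Keep `newest` in full; store `older` only as a delta against it.
delta_to_older = make_delta(newest, older)
print(delta_to_older)

# Reading the newest version needs no work; the older one is rebuilt on demand.
print(apply_delta(newest, delta_to_older) == older)
```

The trade-off is that older versions cost a little reconstruction work, while the version most likely to be requested is available directly.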

Conclusion

Set of modifications or Changeset is a concept that deals with commit atomicity and is not directly related to the way Git stores files. Git is one of many versioning systems that use this changeset concept.

During the commit, Git stores the full contents of each changed file, even if the file was modified only slightly (just one new line, for example).

Git does not need to save a copy of the repository every commit to ensure that the repository is available in some past state. Instead, during the commit it registers a snapshot which points to the newly committed files and also to the current versions of the other files that were already there.

At an opportune moment, Git reorganizes its storage to save space (garbage collection). During this reorganization, past versions of a file can be replaced by records of only the changes the file has undergone (delta encoding). Then, when an older version is requested, Git rebuilds it starting from the file’s current version and applying the change records to get back to the older version.
