Handling large files with Git

Scenario

Some time ago I tried to use Git to restore some backups, mostly small files. Git behaved very well, versioning them as long as the changes from one commit to the next were not large. But on one specific server there were large binary files that Git could not handle; I could not even make the initial commit.


Problem

Git did not behave well with these files (the errors were related to memory), which leaves open the question of what the real limits are when handling binaries with Git. Of course, handling binaries is not what Git is for, but the information I found at the time was not clear enough.


Question

  1. What is the relationship between the size limit of a binary file that Git can handle and the machine's processing power and memory?
  2. Is it safe to keep binaries in Git, even small ones, versioned across many commits?
  3. What can be done to tune Git so that it behaves better when versioning binaries cannot be avoided?

You can cite solutions like git-annex or bup, but only as a complement to the answer; the question is about the behavior of pure Git, without plugins or forks.

  • When you say very large files, what are we talking about? Megabytes, gigabytes, tens of gigabytes?

  • Too big = gigabytes (~15+)

  • Can you paste here the memory error message you are getting? The failure can occur at several points in the process, and the fix is different for each one.

2 answers

The primary reason Git does not handle very large files well is that it passes them through xdelta, which generally means it tries to load the entire contents of each file into memory at once.

If it did not do this, Git would have to store the full content of every revision of every file, even when you changed only a few bytes of that file. That would be terribly inefficient in terms of disk usage, and Git is known for its extremely efficient repository format.
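
As a partial workaround, you can tell Git not to even attempt delta compression on files you know are large binaries. This is only a sketch, assuming the files match patterns like *.iso or *.bin (both just examples, adjust them to your case):

# .gitattributes in the repository root: never attempt deltas on these paths
*.iso -delta
*.bin -delta

# Files above this size are stored deflated but never delta-compressed
# (core.bigFileThreshold is a standard option; the default is 512m)
git config core.bigFileThreshold 100m

This does not remove the limits entirely, but it skips the xdelta step described above, which is what tends to exhaust memory during packing.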

You can try to mess with these server parameters:

[core]
  # Maximum number of bytes from pack files mapped into memory at once
  packedGitLimit = 128m
  # Size of each memory-mapped window onto the pack files
  packedGitWindowSize = 128m

[pack]
  # Memory used for caching deltas while packing
  deltaCacheSize = 128m
  # Maximum size of each pack file produced when repacking
  packSizeLimit = 128m
  # Memory limit per thread for the delta-compression window
  windowMemory = 128m
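
If you prefer the command line, here is a sketch of setting the same limits with git config (run it inside the repository on the server, or add --global to apply it to every repository of that user):

git config core.packedGitLimit 128m
git config core.packedGitWindowSize 128m
git config pack.deltaCacheSize 128m
git config pack.packSizeLimit 128m
git config pack.windowMemory 128m
# Optional: fewer packing threads also means less total memory,
# since pack.windowMemory is a per-thread limit
git config pack.threads 1

Lowering pack.threads is optional; it trades repack speed for a smaller memory peak.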

I think git-annex and that kind of solution really are the best option, because of the way Git is built. There are ways around these issues, but you would end up with a heavily customized Git server, and it would not work "right away" in other environments if you ever need to migrate the server.

  • At the time that was more or less what I noticed: even with compression completely disabled, the file size limit was apparently directly related to the amount of available memory.

Git has great difficulty with large files (>50 MB) and wastes a lot of resources on large repositories (>10 GB).

1) If you are running your own Git server, you will have to enforce a maximum file size yourself, for example with a server-side hook (a sketch appears after this list). On GitHub, the maximum file size is 100 MB, and at 50 MB you already get a warning.

2) Git is not meant to version binary files. It is better to use rsync and copy them somewhere else.

3) There is a solution called git-annex for managing large files. Take a look at http://git-annex.branchable.com/
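
Regarding item 1, here is a minimal sketch of such a server-side limit as a pre-receive hook. The 100 MB value, the SHA-1 zero hash, and the simplified handling of new branches are assumptions you will want to adapt:

#!/bin/sh
# hooks/pre-receive — reject pushes that contain blobs above MAX_BYTES
MAX_BYTES=$((100 * 1024 * 1024))                 # 100 MB, GitHub-like limit
ZERO=0000000000000000000000000000000000000000    # SHA-1 repos; longer for SHA-256

while read old new ref; do
    [ "$new" = "$ZERO" ] && continue             # branch deletion: nothing to check
    if [ "$old" = "$ZERO" ]; then
        range="$new"                             # new branch (simplistic: walks full history)
    else
        range="$old..$new"
    fi
    # List the objects reachable from the push and check every blob's size
    git rev-list --objects $range |
    awk '{ print $1 }' |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize)' |
    awk -v max="$MAX_BYTES" '$1 == "blob" && $3 + 0 > max + 0 { print "too large:", $2, $3; bad = 1 }
                             END { exit bad }' || {
        echo "Push rejected: blob(s) larger than $MAX_BYTES bytes." >&2
        exit 1
    }
done
exit 0

Make the file executable (chmod +x hooks/pre-receive) in the bare repository on the server.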

  • There are those who advocate keeping an SVN installation alongside Git, just for the binaries. If versioning is important, that may be better than leaving them in the file system.
