The other answers are all correct, but what is actually happening?
Git commands are not always very friendly, so here is a more human explanation.
Based on the script linked by @Guilherme:
git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch $files" HEAD
git filter-branch --index-filter "[command]" HEAD
Here we walk through the history revision by revision, instructing git to run our command (which in this case deletes the files) on each one.
git filter-branch
: Rewrites history, revision by revision, according to the filters you specify.
--index-filter
: This filter tells git to apply the command directly to the index of each revision, without checking out the files. To have git check out the files into a working copy first, use --tree-filter
. The advantage of --index-filter
is that it runs much faster; the downside is that only commands that operate on the index (in practice, git commands) can be used.
"[command]"
: The command git will apply to each revision of the repository.
HEAD
: Indicates the revisions git will iterate over when applying the command, here everything reachable from HEAD. It can be another specifier, such as a SHA-1 or a tag. Run git rev-list
with the same argument to see which revisions git will use (this is the command it uses internally).
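To make the HEAD argument concrete, here is a minimal sketch that lists the revisions filter-branch would walk. The throwaway repository, the file name notes.txt and the identity settings are all hypothetical, created only so the example runs on its own.

```shell
# Throwaway repository so the example is self-contained (hypothetical names).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"
for i in 1 2 3; do
  echo "$i" > notes.txt
  git add notes.txt
  git commit -q -m "revision $i"
done

# The revisions that `git filter-branch ... HEAD` would iterate over.
git rev-list HEAD

# Counting them: three commits were created above.
count=$(git rev-list --count HEAD)
echo "revisions reachable from HEAD: $count"
```

Swapping HEAD for a tag or a SHA-1 in the rev-list call shows how the set of revisions changes.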
git rm -rf --cached --ignore-unmatch [files]
This is the command that actually deletes the files, executed on each revision visited by filter-branch.
git rm
: Removes the files from the index and the working tree (in our case, only the index).
-rf
: -r
stands for recursive and removes directories recursively. -f
stands for force and forces removal of a file even if it has local modifications; it makes no difference in our case, but it does no harm either :-)
--cached
: Makes the command operate only on the index (the staging area).
--ignore-unmatch
: Returns no error (exit status 0
) even if the files are not found. This matters because if the command run by filter-branch exits with a non-zero status, git assumes there was an error and aborts.
[files]
: Paths of the files to be removed.
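The effect of --ignore-unmatch on the exit status can be checked directly. This is a small sketch in a throwaway repository; kept.txt and missing.bin are hypothetical names, and missing.bin deliberately does not exist.

```shell
# Throwaway repository with a single tracked file (hypothetical names).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"
echo "data" > kept.txt
git add kept.txt
git commit -q -m "initial"

# Without --ignore-unmatch, a path that matches nothing is an error.
if git rm -rf --cached missing.bin 2>/dev/null; then without=0; else without=$?; fi

# With --ignore-unmatch, the same call succeeds.
if git rm -rf --cached --ignore-unmatch missing.bin; then with=0; else with=$?; fi

echo "exit status without --ignore-unmatch: $without"
echo "exit status with --ignore-unmatch:    $with"
```

In a filter-branch run, the first form would abort on the first revision that does not contain the file.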
Done. After running the command, the file no longer appears in any revision of the git history (git log
). So has the repository shrunk considerably? Not exactly; at least not in your local copy.
When it rewrote the revisions to delete the files, git generated new revisions without the files and swapped the old ones for the new ones. When git does this, it does not completely delete the old revisions from the repository; it only dereferences them (that is barely even a word, right? :p): they continue to exist in the repository as orphans. You can see this by listing them with git reflog
or by checking the size of the .git
folder (which, in your case, will still take up a lot of space).
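Here is a sketch of how to see this for yourself, assuming a throwaway repository with a hypothetical file big.bin standing in for the large file. The FILTER_BRANCH_SQUELCH_WARNING variable only silences the warning newer git versions print before filter-branch runs.

```shell
# Throwaway repository with one file to keep and one to purge (hypothetical names).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"
echo "keep me" > readme.txt
head -c 100000 /dev/urandom > big.bin
git add readme.txt big.bin
git commit -q -m "add files"
blob=$(git rev-parse HEAD:big.bin)   # object id of the file's contents

# Rewrite history without big.bin.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
  --index-filter "git rm -rf --cached --ignore-unmatch big.bin" HEAD >/dev/null 2>&1

# The file is gone from the rewritten history...
in_history=$(git rev-list --count HEAD -- big.bin)

# ...but the old revision is still referenced by the backup and the reflog...
backup=$(git for-each-ref refs/original | wc -l | tr -d ' ')
git reflog

# ...so its contents are still stored inside .git.
if git cat-file -e "$blob" 2>/dev/null; then blob_state=present; else blob_state=gone; fi
echo "commits touching big.bin: $in_history, backup refs: $backup, blob: $blob_state"
du -sh .git
```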
To completely remove these revisions from git, use the last (and often forgotten) line of the script we are using as an example:
rm -rf .git/refs/original/ && git reflog expire --expire=now --all && git gc --aggressive --prune=now
Let's break it down.
rm -rf .git/refs/original/
Deletes the backup made by filter-branch.
rm -rf
: A plain shell command. Again, -r
stands for recursive, to delete subdirectories; -f
stands for force, which ignores non-existent files and never prompts the user.
.git/refs/original/
: The folder holding the backup of the references affected by the filter-branch command.
git reflog expire --expire=now --all
Dereferences the orphaned revisions once and for all.
The reflog is, in my opinion, one of the most poorly documented (and confusing) commands in all of git.
git reflog
: Similar to the git log
command (the name is a contraction of reference log), but also covering orphaned revisions, stashes, etc.
expire
: Expires revisions (removes their entries from the reflog, so git no longer keeps track of them).
--expire=now
: Considers entries older than the given date; now
applies it to all entries regardless of age.
--all
: Processes the reflogs of all references, covering the other branches and stashes as well.
git gc --aggressive --prune=now
And finally, this erases them.
git gc
: git's garbage collector. Removes unnecessary files and compresses git's objects.
--aggressive
: Makes git optimize more thoroughly, even though the command takes longer to run.
--prune=now
: Considers unreachable objects older than the given date; now
applies it to all of them regardless of age.
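Putting the three cleanup steps together, this sketch checks that the old contents really leave the repository. It assumes a throwaway repository and a hypothetical big.bin of about 1 MB; the sizes printed will vary from machine to machine.

```shell
# Throwaway repository, rewritten with filter-branch as before (hypothetical names).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"
echo "keep me" > readme.txt
head -c 1000000 /dev/urandom > big.bin
git add readme.txt big.bin
git commit -q -m "add files"
blob=$(git rev-parse HEAD:big.bin)
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
  --index-filter "git rm -rf --cached --ignore-unmatch big.bin" HEAD >/dev/null 2>&1
size_before=$(du -sk .git | cut -f1)

# The three cleanup steps, one at a time.
rm -rf .git/refs/original/                 # drop the filter-branch backup
git reflog expire --expire=now --all       # forget the orphaned revisions
git gc -q --aggressive --prune=now         # delete the unreachable objects

size_after=$(du -sk .git | cut -f1)
if git cat-file -e "$blob" 2>/dev/null; then blob_state=present; else blob_state=gone; fi
echo "before: ${size_before} KB, after: ${size_after} KB, blob: $blob_state"
```

Skipping either of the first two steps leaves a reference to the old revision, and gc then keeps its objects.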
Remember, you rewrote your history
To apply your local history, now modified, to your remote you will have to force your push:
git push -f
Consider the effects of this rewrite on anyone else working on the same remote as you.
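To see why the push has to be forced, here is a sketch that uses a local bare repository as a stand-in for the shared remote. Every name in it (remote.git, local, app.txt, passwords.txt) is hypothetical.

```shell
# Throwaway setup: a bare repository plays the role of the remote.
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git
git init -q local
cd local
git config user.email "you@example.com"
git config user.name "Example"
echo "code" > app.txt
echo "secret" > passwords.txt
git add app.txt passwords.txt
git commit -q -m "first"
branch=$(git symbolic-ref --short HEAD)
git remote add origin ../remote.git
git push -q origin "$branch"

# Rewrite the local history without passwords.txt.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
  --index-filter "git rm -rf --cached --ignore-unmatch passwords.txt" HEAD >/dev/null 2>&1

# A plain push is refused: the rewritten history no longer descends from the remote's.
if git push -q origin "$branch" 2>/dev/null; then plain=accepted; else plain=rejected; fi

# A forced push replaces the remote branch with the rewritten one.
git push -q -f origin "$branch"
local_tip=$(git rev-parse HEAD)
remote_tip=$(git --git-dir=../remote.git rev-parse "refs/heads/$branch")
echo "plain push: $plain"
echo "local tip:  $local_tip"
echo "remote tip: $remote_tip"
```

Anyone who cloned before the forced push still has the old revisions, which is exactly the effect to weigh before doing this on a shared remote.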
P.S.
- I adapted the last line of the script, the one that cleans up the repository history. The difference is the addition of the parameters
--expire=now
and --prune=now
. Without these parameters, git uses default ages of 90 days and 2 weeks respectively, so the cleanup only affects your older revisions.
- GitHub also has a very similar tutorial.
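The difference the first note describes can be observed on the reflog. This sketch assumes a throwaway repository with two fresh commits; notes.txt and the identity settings are hypothetical.

```shell
# Throwaway repository with two very recent commits (hypothetical names).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"
for i in 1 2; do
  echo "$i" > notes.txt
  git add notes.txt
  git commit -q -m "revision $i"
done

# Default expiry: only entries older than the default age are dropped,
# so the entries created a moment ago survive.
git reflog expire --all
entries_default=$(git reflog | wc -l | tr -d ' ')

# With --expire=now every entry is dropped, whatever its age.
git reflog expire --expire=now --all
entries_now=$(git reflog | wc -l | tr -d ' ')

echo "after default expiry: $entries_default entries"
echo "after --expire=now:   $entries_now entries"
```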
related: https://answall.com/questions/485278/como-remover-um-arquivo-do-git-mas-o-manter-locally
– Lucas