File Additions and Deletions in a commit

I’m using the command git log --stat <commit> to get the additions and deletions made in a commit.

Ex:

commit 1a1a
Author: Gabriel Hardoim
Date: 2018-08-20 20:30:40

arquivo.java    | 3 +++
arquivo.css     | 3 ---

2 files changed, 3 insertions(+), 3 deletions(-)

So far so good, but when lines were both added and removed in the same file, it displays a number of changes per file greater than the total number of changes.

Ex:

commit 1a1a
Author: Gabriel Hardoim
Date: 2018-08-20 20:30:40

arquivo.java    | 5 +----
arquivo.css     | 5 ++++-

2 files changed, 5 insertions(+), 5 deletions(-)

In such cases:

  • How do I find out how many additions/deletions have been made to each file?
  • Is there a command or flag that can help with this?

PS: When the number of changes is very large, the row of + and - next to the file name is not as clear as it is for small changes.

1 answer

Let’s understand why it looks like git "displays a number of changes per file greater than the total number of changes".

Let’s assume I have a 3-line file:

primeira linha do arquivo
segunda linha do arquivo
terceira linha do arquivo

Let’s also assume that this is the content that’s in the last commit (in my case, it only has 1 commit):

$  git log --oneline
792eda8 primeiro commit

Then I edit the file by changing the second line and adding a fourth line:

primeira linha do arquivo
mudando a segunda linha do arquivo, blablabla etc
terceira linha do arquivo
adicionar quarta linha

And commit it:

$ git add arq.txt
warning: LF will be replaced by CRLF in arq.txt.

$ git commit -m "mudar arquivo"
[master d0be287] mudar arquivo
1 file changed, 2 insertions(+), 1 deletion(-)

$ git log --oneline
d0be287 mudar arquivo
792eda8 primeiro commit

If we use git log to see the changes, we will have 2 insertions and 1 deletion:

$ git log --stat d0be287 -1
commit d0be287ddf28aa910d8fa9d002a609aa8056e357
Author: Fulano de Tal <[email protected]>
Date:   2018-09-12 09:07:34

    mudar arquivo

 arq.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

And if we use git diff, we can see the changes in more detail:

$ git diff 792eda8 d0be287
diff --git a/arq.txt b/arq.txt
index 7e12196..757a5bd 100644
--- a/arq.txt
+++ b/arq.txt
@@ -1,3 +1,4 @@
 primeira linha do arquivo
-segunda linha do arquivo
+mudando a segunda linha do arquivo, blablabla etc
 terceira linha do arquivo
+adicionar quarta linha

In this case, I’m seeing the difference between the commits 792eda8 and d0be287.

Now notice how the change to the second line is shown: segunda linha do arquivo was "removed" (it has a - in front), giving way to mudando a segunda linha do arquivo, blablabla etc (which was "added", since it has a + in front).

So although the change was "edit a line", git counts it as one insertion and one deletion.

The fourth line appears as added (with + in front). So the result is 2 insertions and 1 deletion, and the number that --stat shows next to the file name (3) is the sum of the two. That is why, in the question's example, each file shows 5 changes (insertions plus deletions combined) while the summary reports 5 insertions and 5 deletions. You can get a summary of this with the options --stat, --shortstat and --numstat, which show the same information in different formats:

$ git diff 792eda8 d0be287 --stat
 arq.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

$ git diff 792eda8 d0be287 --shortstat
 1 file changed, 2 insertions(+), 1 deletion(-)

$ git diff 792eda8 d0be287 --numstat
2       1       arq.txt

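--numstat is the most compact option: it shows only the number of insertions, the number of deletions and the file name (2 insertions and 1 deletion in the case above). That makes it a good format for scripts to read, and it also gives exact counts when the change is too large for the +/- graph of --stat to be readable.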
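
Since --numstat is meant to be read by scripts, the per-file counts can be post-processed with awk. The following is only a minimal sketch: numstat_report is a hypothetical helper name (not a git command), and it assumes the tab-separated "insertions, deletions, file" layout shown above, with binary files reported as "-" in both count columns.

```shell
# Sketch: per-file insertions/deletions from `git diff --numstat` output.
# numstat_report is a hypothetical helper; it reads numstat lines on stdin.
numstat_report() {
  awk -F'\t' '
    NF >= 3 {
      # Binary files show "-" instead of a number; count them as 0 here.
      added   = ($1 == "-") ? 0 : $1 + 0
      deleted = ($2 == "-") ? 0 : $2 + 0
      # "total" is the number that --stat prints next to the file name.
      printf "%s: +%d -%d (total %d)\n", $3, added, deleted, added + deleted
      sum_added   += added
      sum_deleted += deleted
    }
    END { printf "all files: +%d -%d\n", sum_added, sum_deleted }
  '
}
```

It would be used as git diff 792eda8 d0be287 --numstat | numstat_report.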


But if you don’t want a change on the same line to be counted twice (as an insertion and a deletion) and would only like to know whether a line has been modified, another option is to use --word-diff, which shows the differences word by word (where "words" are delimited by spaces):

$ git diff 792eda8 d0be287 --word-diff
diff --git a/arq.txt b/arq.txt
index 7e12196..757a5bd 100644
--- a/arq.txt
+++ b/arq.txt
@@ -1,3 +1,4 @@
primeira linha do arquivo
{+mudando a+} segunda linha do [-arquivo-]{+arquivo, blablabla etc+}
terceira linha do arquivo
{+adicionar quarta linha+}

Note that now the second line changes are all shown in a single line, showing the words that have been added and removed.

You can also use --word-diff-regex to supply a regular expression that defines what a word is. In your case, you could use ^.*$, meaning "zero or more characters (.*) from the beginning (^) to the end ($) of the line", so that the whole line is treated as a single word. The result is:

$ git diff 792eda8 d0be287 --word-diff-regex="^.*$"
diff --git a/arq.txt b/arq.txt
index 7e12196..757a5bd 100644
--- a/arq.txt
+++ b/arq.txt
@@ -1,3 +1,4 @@
primeira linha do arquivo
[-segunda linha do arquivo-]{+mudando a segunda linha do arquivo, blablabla etc+}
terceira linha do arquivo
{+adicionar quarta linha+}

The second line still appears as a deletion plus an insertion, but at least the information is all on one line. With this, you can use egrep (equivalent to grep -E, if you are in a shell such as bash) to count only these lines:

$ git diff 792eda8 d0be287 --word-diff-regex="^.*$" | egrep -e "^(\[-|\{\+)" -c
2

Here I consider only the lines that start with [- or {+. Since the whole line is treated as a single word, any modified line starts with one of these markers. The -c option then returns the number of matching lines (in this case, 2). Without -c, the output is the modified lines themselves:

$ git diff 792eda8 d0be287 --word-diff-regex="^.*$" | egrep -e "^(\[-|\{\+)"
[-segunda linha do arquivo-]{+mudando a segunda linha do arquivo, blablabla etc+}
{+adicionar quarta linha+}

You can also change the egrep regex to get each kind of modification separately:

  • egrep -e "^\[-.*-\]\{\+" - lines that have been modified (start with [- and also have {+)
  • egrep -e "^\[-.*-\]$" - lines that have been removed (start with [- and end with -])
  • egrep -e "^\{\+.*\+\}$" - lines that have been added (start with {+ and end with +})

Keep in mind that there may be false positives (if one of the delimiters [-, {+, etc. is part of the line's own content).
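
The three cases can also be counted in a single pass. This is only a sketch under the same assumptions as the egrep commands (whole-line word diff, so the markers sit at the start and end of each changed line); classify_word_diff is a hypothetical helper name, and it uses plain string comparisons instead of regexes, so the same false positives apply.

```shell
# Sketch: count modified / removed / added lines in whole-line word-diff output.
# classify_word_diff is a hypothetical helper; it reads the diff on stdin.
classify_word_diff() {
  awk '
    {
      sub(/\r$/, "")   # tolerate CRLF line endings
      head = substr($0, 1, 2)
      tail = (length($0) >= 2) ? substr($0, length($0) - 1) : ""
      # Modified: starts with [-, switches to {+ in the middle, ends with +}
      if (head == "[-" && index($0, "-]{+") > 0 && tail == "+}") modified++
      # Removed: the whole line is wrapped in [- ... -]
      else if (head == "[-" && tail == "-]") removed++
      # Added: the whole line is wrapped in {+ ... +}
      else if (head == "{+" && tail == "+}") added++
    }
    END { printf "modified=%d removed=%d added=%d\n", modified, removed, added }
  '
}
```

It would be used as git diff 792eda8 d0be287 --word-diff-regex="^.*$" | classify_word_diff.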


You could also use git diff d0be287~, where d0be287~ means "the commit before d0be287" (for more details about this syntax, see the gitrevisions documentation). In this case, git compares the commit prior to d0be287 with your working tree, which gives the same result as comparing with HEAD when d0be287 is the current commit and there are no uncommitted changes.
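
To get the per-file counts of a single commit regardless of what is currently checked out, both ends of the comparison can be passed explicitly. A minimal sketch, assuming a hypothetical helper name commit_numstat and a commit that has a parent (the ~ suffix fails on the first commit of a repository):

```shell
# Sketch: per-file insertions/deletions introduced by one commit.
# commit_numstat is a hypothetical helper; $1 is a commit hash or other revision.
commit_numstat() {
  # "$1~" is the parent of the commit, so this diffs parent -> commit.
  git diff --numstat "$1~" "$1"
}
```

For the example repository, commit_numstat d0be287 runs git diff --numstat d0be287~ d0be287.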
