Let’s understand why it looks like git "displays a number of changes per file greater than the total number of changes".
Let’s assume I have a 3-line file:
primeira linha do arquivo
segunda linha do arquivo
terceira linha do arquivo
Let’s also assume that this is the content that’s in the last commit (in my case, it only has 1 commit):
$ git log --oneline
792eda8 primeiro commit
Then I edit the file by changing the second line and adding a fourth line:
primeira linha do arquivo
mudando a segunda linha do arquivo, blablabla etc
terceira linha do arquivo
adicionar quarta linha
And commit it:
$ git add arq.txt
warning: LF will be replaced by CRLF in arq.txt.
$ git commit -m "mudar arquivo"
[master d0be287] mudar arquivo
1 file changed, 2 insertions(+), 1 deletion(-)
$ git log --oneline
d0be287 mudar arquivo
792eda8 primeiro commit
If we use git log --stat to see the changes, it reports 2 insertions and 1 deletion:
$ git log --stat d0be287 -1
commit d0be287ddf28aa910d8fa9d002a609aa8056e357
Author: Fulano de Tal <[email protected]>
Date: 2018-09-12 09:07:34
mudar arquivo
arq.txt | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
And if we use git diff, we can see the changes in more detail:
$ git diff 792eda8 d0be287
diff --git a/arq.txt b/arq.txt
index 7e12196..757a5bd 100644
--- a/arq.txt
+++ b/arq.txt
@@ -1,3 +1,4 @@
primeira linha do arquivo
-segunda linha do arquivo
+mudando a segunda linha do arquivo, blablabla etc
terceira linha do arquivo
+adicionar quarta linha
In this case, I’m seeing the difference between the commits 792eda8 and d0be287.
Now notice how the change to the second line is shown. The line segunda linha do arquivo was "removed" (it has a - in front), giving way to mudando a segunda linha do arquivo, blablabla etc (which was "added", because it has a + in front).
So although the change was "edit a line", git counts it as one insertion and one deletion.
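To make the counting rule concrete, here is a minimal sketch that reproduces it outside of git. It assumes the plain diff utility is installed and uses diff -u instead of git diff, since both describe changes line by line; the function name count_line_changes is just something I made up for the illustration.

```shell
# Count added and removed lines between two files, line by line.
# Assumption: uses `diff -u` (not git); count_line_changes is a hypothetical helper.
count_line_changes() {
    # $1 = old file, $2 = new file
    diff -u "$1" "$2" | awk '
        NR <= 2 { next }      # skip the "---" and "+++" file header lines
        /^\+/   { ins++ }     # every line starting with + is one insertion
        /^-/    { del++ }     # every line starting with - is one deletion
        END     { printf "insertions=%d deletions=%d\n", ins, del }
    '
}
```

Running it on the two versions of arq.txt shown above should report the edited line once as a deletion and once as an insertion, plus one more insertion for the new fourth line.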
The fourth line appears as added (with + in front). So the result is 2 insertions and 1 deletion. You can get a summary of this with the options --stat, --shortstat and --numstat, which show the same information in different formats:
$ git diff 792eda8 d0be287 --stat
arq.txt | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
$ git diff 792eda8 d0be287 --shortstat
1 file changed, 2 insertions(+), 1 deletion(-)
$ git diff 792eda8 d0be287 --numstat
2 1 arq.txt
--numstat is the most "compact" option, showing only the numbers and the file name (in the above case, 2 insertions and 1 deletion), which makes it a good format for scripts to read.
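As an illustration of the "read by scripts" part, the sketch below totals the --numstat columns with awk. It relies on the columns being tab-separated and on binary files showing - instead of numbers; the function name sum_numstat is hypothetical.

```shell
# Sum the per-file counts produced by `git diff --numstat`, read from stdin.
# Each line is "<insertions>TAB<deletions>TAB<path>".
# Assumption: sum_numstat is a hypothetical helper, not a git command.
sum_numstat() {
    awk -F '\t' '
        $1 != "-" { ins += $1; del += $2; files++ }   # "-" marks a binary file, skipped here
        END { printf "files=%d insertions=%d deletions=%d\n", files, ins, del }
    '
}

# Usage inside the repository (commit ids from the example above):
#   git diff 792eda8 d0be287 --numstat | sum_numstat
```

This only adds up what git already reports, so an edited line still counts as one insertion and one deletion.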
But if you don’t want a change to a single line to be counted twice (as an insertion and a deletion) and would only like to know whether a line has been modified, another option is to use --word-diff, which shows the differences word by word (by default, "words" are delimited by whitespace):
$ git diff 792eda8 d0be287 --word-diff
diff --git a/arq.txt b/arq.txt
index 7e12196..757a5bd 100644
--- a/arq.txt
+++ b/arq.txt
@@ -1,3 +1,4 @@
primeira linha do arquivo
{+mudando a+} segunda linha do [-arquivo-]{+arquivo, blablabla etc+}
terceira linha do arquivo
{+adicionar quarta linha+}
Note that the changes to the second line are now all shown on a single line, marking the words that have been added and removed.
You can also use --word-diff-regex to supply a regular expression that defines what a word is. In your case, you could use ^.*$, meaning "zero or more characters (.*) from the beginning (^) to the end ($) of the line", that is, the whole line is considered a single word. The result is:
$ git diff 792eda8 d0be287 --word-diff-regex="^.*$"
diff --git a/arq.txt b/arq.txt
index 7e12196..757a5bd 100644
--- a/arq.txt
+++ b/arq.txt
@@ -1,3 +1,4 @@
primeira linha do arquivo
[-segunda linha do arquivo-]{+mudando a segunda linha do arquivo, blablabla etc+}
terceira linha do arquivo
{+adicionar quarta linha+}
The second line still appears as a deletion plus an insertion, but at least the information is all on one line. With this, it is possible to use egrep (equivalent to grep -E, if you are using bash or a similar shell) to count only these lines.
$ git diff 792eda8 d0be287 --word-diff-regex="^.*$" | egrep -e "^(\[-|\{\+)" -c
2
In this case, I am considering only the lines that start with [- or {+. Since I consider the whole line to be a single word, any changed line is guaranteed to start with one of these markers. Then I use the option -c to return the number of matching lines (in this case, 2). Without the option -c, the output is the changed lines themselves:
$ git diff 792eda8 d0be287 --word-diff-regex="^.*$" | egrep -e "^(\[-|\{\+)"
[-segunda linha do arquivo-]{+mudando a segunda linha do arquivo, blablabla etc+}
{+adicionar quarta linha+}
It is also possible to change the regex passed to egrep to list each kind of change separately:
egrep -e "^\[-.*-\]\{\+" - lines that have been modified (start with [- and also have {+)
egrep -e "^\[-.*-\]$" - lines that have been removed (start with [- and end with -])
egrep -e "^\{\+.*\+\}" - lines that have been added (start with {+ and end with +})
Keep in mind that there may be false positives (if delimiters such as [- or {+ are part of the line content itself).
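The three patterns above can be combined into one small helper, sketched below. It assumes its input is the output of git diff with --word-diff-regex set to ^.*$ as in the earlier commands; the function name classify_word_diff is hypothetical, and grep -E is used in place of egrep.

```shell
# Count modified, removed and added lines in whole-line word-diff output (stdin).
# Assumption: input comes from `git diff --word-diff-regex='^.*$' <old> <new>`.
# classify_word_diff is a hypothetical helper name.
classify_word_diff() {
    input=$(cat)
    # grep -c exits non-zero when the count is 0, hence the "|| true"
    modified=$(printf '%s\n' "$input" | grep -E -c '^\[-.*-\]\{\+' || true)
    removed=$(printf '%s\n' "$input" | grep -E -c '^\[-.*-\]$' || true)
    added=$(printf '%s\n' "$input" | grep -E -c '^\{\+.*\+\}' || true)
    printf 'modified=%s removed=%s added=%s\n' "$modified" "$removed" "$added"
}

# Usage inside the repository (commit ids from the example above):
#   git diff 792eda8 d0be287 --word-diff-regex='^.*$' | classify_word_diff
```

The same caveat about false positives applies here, since the helper only looks at the markers at the start and end of each line.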
You could also use git diff d0be287~, where d0be287~ means "the commit before d0be287" (this syntax is described in the gitrevisions documentation). When only one commit is given, git diff compares that commit with your working tree, which gives the same result as comparing with HEAD (the commit your current branch points to) as long as you have no uncommitted changes.