Diff changing lines

I’m comparing two files, which are updated daily, with the command diff -y in order to obtain two results:

The first is the set of lines that were modified overnight:

grupoAzul;Gabriel;04-maçãs;02-limões       |    grupoAzul;Gabriel;05-maçãs;02-limões
grupoAzul;Amanda;03-maçãs;05-limões             grupoAzul;Amanda;03-maçãs;05-limões

For this, I use the command diff -y arquivoAntigo.csv arquivoNovo.csv | grep -e "|"

The second is the set of new lines:

grupoAzul;Gabriel;04-maçãs;02-limões       |    grupoAzul;Gabriel;05-maçãs;02-limões
grupoAzul;Amanda;03-maçãs;05-limões             grupoAzul;Amanda;03-maçãs;05-limões
                                           >    grupoAzul;Kratos;04-maçãs;00-limões

For this result, I use the command diff -y arquivoAntigo.csv arquivoNovo.csv | grep -e ">".

With that explained, here is the problem.

When a new line appears above a modified line, diff 'pushes' the modified line down and treats it as the new line, while the line that is actually new is treated as the modified one.

grupoAzul;Gabriel;04-maçãs;02-limões       |    grupoAzul;Kratos;04-maçãs;00-limões
                                           >    grupoAzul;Gabriel;05-maçãs;02-limões
grupoAzul;Amanda;03-maçãs;05-limões             grupoAzul;Amanda;03-maçãs;05-limões

These cases are rare, but when they happen more than one line ends up misclassified.

What causes this behaviour and how can I fix it?

1 answer

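The problem occurs because matching records do not sit at the same position in both files. diff has no notion of which record corresponds to which: it aligns the two files on their identical lines and then pairs whatever lines remain in a changed block in order. In the example you showed, line 1 on the left (Gabriel) is paired with line 1 on the right (Kratos) and marked with "|", and the extra line on the right, which is the Gabriel record that actually changed, is marked with ">".

To avoid this, sort both files first so that matching records line up in the same order: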
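A minimal sketch to reproduce the pairing, assuming two throwaway files named old.csv and new.csv (hypothetical names) that hold the records from the question. With GNU diff it should show the same misleading pairing as in the question.

```shell
#!/usr/bin/env bash
# Reproduce the pairing from the question with throwaway files.
# old.csv and new.csv are hypothetical names used only for this sketch.
tmp=$(mktemp -d)
printf '%s\n' 'grupoAzul;Gabriel;04-maçãs;02-limões' \
              'grupoAzul;Amanda;03-maçãs;05-limões' > "$tmp/old.csv"
printf '%s\n' 'grupoAzul;Kratos;04-maçãs;00-limões' \
              'grupoAzul;Gabriel;05-maçãs;02-limões' \
              'grupoAzul;Amanda;03-maçãs;05-limões' > "$tmp/new.csv"

# diff exits with status 1 when the files differ, which is expected here.
out=$(diff -y "$tmp/old.csv" "$tmp/new.csv") || true
printf '%s\n' "$out"

# Count the lines diff reports as changed ("|") and as added (">").
changed=$(printf '%s\n' "$out" | grep -c '|')
added=$(printf '%s\n' "$out" | grep -c '>')
printf 'changed=%s added=%s\n' "$changed" "$added"
rm -rf "$tmp"
```

The counts only say how many lines fall into each class; they do not say which record was really modified, which is exactly the ambiguity described above.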

$ diff -y <(sort arquivoAntigo.csv) <(sort arquivoNovo.csv)
                                          <
grupoAzul;Amanda;03-maçãs;05-limões         grupoAzul;Amanda;03-maçãs;05-limões
grupoAzul;Gabriel;04-maçãs;02-limões      | grupoAzul;Gabriel;05-maçãs;02-limões
                                          > grupoAzul;Kratos;04-maçãs;00-limões

However, as you can see, the blank line in the first file is sorted to the top, so I also suggest removing blank lines with sed:

$ diff -y <(sort arquivoAntigo.csv | sed '/^\s*$/d') <(sort arquivoNovo.csv | sed '/^\s*$/d')
grupoAzul;Amanda;03-maçãs;05-limões         grupoAzul;Amanda;03-maçãs;05-limões
grupoAzul;Gabriel;04-maçãs;02-limões      | grupoAzul;Gabriel;05-maçãs;02-limões
                                          > grupoAzul;Kratos;04-maçãs;00-limões

The regular expression used in sed (/^\s*$/) matches every line that is empty or contains only whitespace characters, such as spaces and tabs, and the d command deletes those lines. Note that \s is a GNU sed extension; [[:space:]] is the portable equivalent.
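A small sketch of what that filter does, using made-up sample records and the portable [[:space:]] class in place of \s (an assumption made so the example also runs on a non-GNU sed):

```shell
# Sample input with an empty line, a spaces-only line and a tab-only line.
sample=$(printf 'a;1\n\n   \n\t\nb;2\n')

# Delete every line that is empty or consists only of whitespace.
cleaned=$(printf '%s\n' "$sample" | sed '/^[[:space:]]*$/d')
printf '%s\n' "$cleaned"
```

Only the two data lines should remain; lines that have other content are left untouched, including any leading or trailing spaces they carry.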

By the way, the <( ... ) notation in bash is process substitution: the command between the parentheses runs in a subshell, and its output is made available to the outer command through a file name. So when the diff above runs, each sort ... | sed ... pipeline is executed and diff reads the already cleaned output as if it were a file, without the original files being changed.
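A short sketch of process substitution on its own, assuming bash (it is not available in plain sh) and using inline sample data instead of real files:

```shell
#!/usr/bin/env bash
# Each <( ... ) is replaced by a path (for example under /dev/fd) that the
# outer command can open and read like a regular file.
first=$(cat <(printf 'b\na\n' | sort))
printf '%s\n' "$first"

# diff receives two readable paths; no input is modified on disk.
if diff <(printf 'x\ny\n') <(printf 'y\nx\n' | sort) >/dev/null; then
  same=yes
else
  same=no
fi
printf 'same=%s\n' "$same"
```

Because the second input is sorted before diff reads it, both sides carry the same lines in the same order and diff reports no difference.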

You can see it working online on tutorialspoint, with the caveat that it does not seem possible to create files there, so I had to use variables to "simulate" them: http://tpcg.io/aO9pny
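Sorting only helps when the new record happens to sort after the modified one, because diff still pairs the leftover lines in order. As an alternative to diff, and not something taken from the question, here is a hedged sketch that compares records by key with awk. It assumes that fields 1 and 2 (group and name) identify a record uniquely; the function name classify and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: classify records by key instead of by position.
# Assumption: fields 1 and 2 (group;name) identify a record uniquely.
classify() {
  # $1 = old file, $2 = new file
  awk -F';' '
    NF == 0   { next }                                  # skip empty lines
    NR == FNR { old[$1 FS $2] = $0; next }              # old file: store record by key
    !(($1 FS $2) in old)  { print "new;" $0; next }     # key missing from the old file
    old[$1 FS $2] != $0   { print "modified;" $0 }      # key known, content differs
  ' "$1" "$2"
}

# Demo with the records from the question, written to throwaway files.
tmp=$(mktemp -d)
printf '%s\n' 'grupoAzul;Gabriel;04-maçãs;02-limões' \
              'grupoAzul;Amanda;03-maçãs;05-limões' > "$tmp/old.csv"
printf '%s\n' 'grupoAzul;Kratos;04-maçãs;00-limões' \
              'grupoAzul;Gabriel;05-maçãs;02-limões' \
              'grupoAzul;Amanda;03-maçãs;05-limões' > "$tmp/new.csv"
result=$(classify "$tmp/old.csv" "$tmp/new.csv")
printf '%s\n' "$result"
rm -rf "$tmp"
```

The output is prefixed with new; or modified; so it can be filtered with grep in the same way as the diff output. Records that were removed from the new file are not reported by this sketch, and an empty old file would need separate handling.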

  • The table number was just in the example. I will edit the question to make it clearer.

  • Regardless of the content, sort orders the lines alphabetically by default. Have you tried applying my suggestion? What was the result?

  • Yes, I applied it, but it made no difference. It keeps reporting the new line as modified.

  • @Gabrielhardoim perhaps the blank lines in your arquivoAntigo contain tabs; in my previous answer the sed command only handled spaces. I have changed my test files to the data you showed when editing the question, tested it with tabs, and it still works here. Can you try again with the command I changed in the edit?

  • It still hasn't worked :(. The arquivoAntigo has no whitespace. diff "creates" a blank line when there is a new line in arquivoNovo. Is there any way to tell diff to display new lines at the end?

  • diff "creates" the blank on line 2 because that is a legitimate difference: line 2 of the first file is different from line 2 of the second file. Hence the need to sort. To get the > lines at the end of the output, append | sort -k2 -n to your command; this sorts by the second column (the one containing | and >) in numerical order. Apparently the greater-than and less-than signs have a higher "number" than the pipe...

  • I saw that you accepted the answer =). Did it work? I included an example on tutorialspoint: http://tpcg.io/aO9pny

  • Unfortunately it didn't work here, but I marked it as accepted because it may help someone and you took the time to explain :D
