I have been searching without much success for something that I believe is quite simple in theory, but I have not found the right command.
I have a LOG file with a lot of information. Some of it repeats, but only in one column, and for every value that repeats in that column I want the extra lines deleted so that only one remains. Example:
6; Mar 21 03:18; 182.69.170.145; unknown; <[email protected]>; Get much more positive aspects out of your work out; HIGH
3; Mar 21 03:20; 182.69.170.145; unknown; <[email protected]>; Eating healthful is not assisting you lose weight; HIGH
2; Mar 21 03:18; 182.69.170.145; unknown; <[email protected]>; Asian infused diet program pill makes it was West; MEDIUM
2; Mar 21 13:50; 201.53.117.127; unknown; <[email protected]>; want to see me?; MEDIUM
3; Mar 21 12:28; 179.208.77.183; unknown; <[email protected]>; how do you like it here?; HIGH
3; Mar 21 13:49; 201.53.117.127; unknown; <[email protected]>; Good Evening How are things? I m Yana; HIGH
Notice that the e-mail field repeats but the fields after it do not, so SORT combined with UNIQ would not solve my problem, since together they only sort the lines and eliminate those that are exactly equal.
Is there a command, or a specific parameter for these two (SORT and UNIQ), that does this?
Thanks.
I do not quite understand what you want. Could you add an example of how you want the data to look?
– JuniorNunes
Using the data above, I need Uniq to consider only the column containing the e-mail <Alfredo.xxx> and eliminate the repeats. Note that each row is entirely different except for the e-mail column, so applying Uniq to that column would leave only one row. If I run Uniq on the file containing the data above, it only deletes lines that are 100% equal, and these lines are never totally equal.
– user54154
Could you specify the desired output for this example?
– JJoao
The sample above has 6 lines of LOG, and I want the output to have only one, because I don't need 6 lines for the e-mail <Alfredo.xxx@>. If a LOG file has a thousand lines and 100 of them belong to the e-mail <Alfredo.xxx@>, I want all repeats in the e-mail column removed, leaving only one, the same way UNIQ removes repeated lines. If I run UNIQ without pointing it at the column, it only deletes lines that are exactly the same, and in the 6 lines above only the e-mail account is equal; the rest is not. I want to use the e-mail field as the criterion.
– user54154
Do you want to group only the Alfredo.xxx e-mails, or all repeated e-mails?
– JuniorNunes
For the example above, I would like to keep only one complete line per e-mail (with date/time, IP, etc.) and have the rest deleted. The LOG will have 100 thousand lines with several repeated e-mails, while the other fields, such as IP, date/time and subject, do not repeat. From that mass of data I want to eliminate only the lines that repeat an e-mail, leaving one of each.
– user54154
@user54154 try this command then:
sort -u -t ';' -k5,5 nome-do-arquivo
It worked well here. NOTE: replace nome-do-arquivo with the name of your LOG file.
– JuniorNunes
It worked :) Can you explain the parameters, so I can learn exactly what the command does? Thanks.
– user54154
I will post an answer explaining each part.
– JuniorNunes
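For readers who want to try the command from the comments before the full answer, here is a minimal sketch. The e-mail addresses and subjects in the sample are made up for illustration (the real ones are not visible in the question), and the awk line is an alternative that was not suggested in the thread; it may be useful when the original line order should be preserved.

```shell
# Hypothetical sample log: field 5 (the e-mail) repeats, the other fields differ.
log=$(mktemp)
cat > "$log" <<'EOF'
6; Mar 21 03:18; 182.69.170.145; unknown; <alice@example.com>; subject A; HIGH
3; Mar 21 03:20; 182.69.170.145; unknown; <alice@example.com>; subject B; HIGH
2; Mar 21 13:50; 201.53.117.127; unknown; <bob@example.com>; subject C; MEDIUM
3; Mar 21 13:49; 201.53.117.127; unknown; <bob@example.com>; subject D; HIGH
EOF

# -t ';'  use ';' as the field separator
# -k5,5   the sort key starts and ends at field 5 (the e-mail column)
# -u      output only one line for each distinct key
by_sort=$(sort -u -t ';' -k5,5 "$log")

# Alternative (not from the thread): print a line only the first time its
# fifth field is seen, which keeps the original order of the file.
by_awk=$(awk -F';' '!seen[$5]++' "$log")

printf '%s\n' "$by_sort"
printf '%s\n' "$by_awk"
rm -f "$log"
```

Both variants leave one line per e-mail. With sort, the output is ordered by the e-mail column, and which of the duplicate lines survives should not be relied on across implementations; the awk variant always keeps the first occurrence.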