Use of Uniq


Asked 8 years, 4 months ago

Viewed 351 times

1

I have been searching without much success for something that I believe should be quite simple, but I have not found the right command.

I have a log file with a lot of information. Some of it repeats, but only in one column. Whenever a value repeats in that column, I want the extra lines deleted so that only one remains. Example:

  6; Mar 21 03:18; 182.69.170.145;  unknown;  <[email protected]>;  Get much more positive aspects out of your work out; HIGH
  3; Mar 21 03:20; 182.69.170.145;  unknown;  <[email protected]>;  Eating healthful is not assisting you lose weight; HIGH
  2; Mar 21 03:18; 182.69.170.145;  unknown;  <[email protected]>;  Asian infused diet program pill makes it was West; MEDIUM
  2; Mar 21 13:50; 201.53.117.127;  unknown;  <[email protected]>;  want to see me?; MEDIUM
  3; Mar 21 12:28; 179.208.77.183;  unknown;  <[email protected]>;  how do you like it here?; HIGH
  3; Mar 21 13:49; 201.53.117.127;  unknown;  <[email protected]>;  Good Evening How are things? I m Yana; HIGH

Notice that the email field repeats but the rest of each line does not, so using sort with uniq would not solve my problem: together they only sort the file and eliminate lines that are exactly equal.

Is there a command, or a specific parameter for sort and uniq, that does this?

Thanks.

1

I don't quite understand what you want. Could you add an example of how you want the data to look afterwards?

– JuniorNunes

2017/03/26 at 03:54
Using the data above, I need uniq to consider only the column containing the email <Alfredo.xxx>, eliminating the repeated ones. Note that every row is different except for the email column, so applying uniq to that column alone would leave only one row. If I run uniq on the whole file, it only deletes lines that are 100% equal, and these lines are never entirely equal.

– user54154

2017/03/27 at 15:03
Could you specify the desired output for this example?

– JJoao

2017/03/27 at 15:31
The example above has 6 log lines, and I want the output to have only one, because I don't need 6 lines for the email <Alfredo.xxx@>. If a log file has a thousand lines and 100 of them carry the email <Alfredo.xxx@>, I want all the repeats in the email column removed, leaving only one, the way uniq removes repeated lines. If I run uniq without pointing it at a column, it only deletes lines that are exactly the same, and in the 6 lines above only the email is equal; the rest is not. I want the email field to be the comparison key.

– user54154

2017/03/27 at 17:53
Do you want to group only the Alfredo.xxx emails, or all repeated emails?

– JuniorNunes

2017/03/27 at 17:55
For the example above, I would like to keep only one complete line for that email (with date/time, IP, and so on) and delete the rest. In the real log I will have 100 thousand lines with several repeated emails, while the other fields, such as IP, date/time, and subject, do not repeat. Across that whole mass of data, I want to eliminate only the lines whose email repeats, so that one line per email remains.

– user54154

2017/03/27 at 17:58
1

@user54154 try this command then: sort -u -t ';' -k5,5 nome-do-arquivo. It worked well here. NOTE: replace nome-do-arquivo with your log file.

– JuniorNunes

2017/03/27 at 18:16
It worked :). Can you explain the parameters, so I can learn exactly what each one does? Thanks.

– user54154

2017/03/27 at 18:25
I will put an answer explaining each thing.

– JuniorNunes

2017/03/27 at 18:29


2 answers

1

You can use this command:

sort -u -t ';' -k5,5 nome-do-arquivo

-u (unique): of the lines whose sort key compares equal, only one is kept.
-t ';' sets the column separator (which in your file is ;).
-k5,5 restricts the sort key to column 5 only (the key starts and ends at field 5), which in your case is the email.

You can read more about the command sort here.
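
To see the effect of the three options on a small input, here is a minimal sketch. The temporary file and the example.com addresses are made up for illustration and are not taken from the original log.

```shell
# Hypothetical sample data: two distinct emails, each appearing twice.
log=$(mktemp)
cat > "$log" <<'EOF'
2; Mar 21 13:50; 201.53.117.127; unknown; <bob@example.com>; Subject one; MEDIUM
6; Mar 21 03:18; 182.69.170.145; unknown; <alice@example.com>; Subject two; HIGH
3; Mar 21 13:49; 201.53.117.127; unknown; <bob@example.com>; Subject three; HIGH
3; Mar 21 03:20; 182.69.170.145; unknown; <alice@example.com>; Subject four; HIGH
EOF

# Sort on field 5 only (the email) and keep one line per distinct key.
deduped=$(sort -u -t ';' -k5,5 "$log")
printf '%s\n' "$deduped"

# Count the surviving lines: one per distinct email.
count=$(printf '%s\n' "$deduped" | grep -c '')
echo "lines kept: $count"

rm -f "$log"
```

The output is ordered by email rather than by the original line order, and it is safest not to rely on which of the duplicate lines is the one that survives.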

And which parameter eliminates the repeated ones? As I understand it, the "u" only groups. Thank you very much.

– user54154

2017/03/27 at 18:45
It is -u itself that removes the repeated lines. I just tested it here. Thank you so much for your help!

– user54154

2017/03/27 at 18:49
Exactly, @user54154: when it groups the equal values, it eliminates the repeats.

– JuniorNunes

2017/03/27 at 18:54


by JJoao • **5,113** points · Answer 1 · 2017-03-27T21:06:47+00:00

By the way, if it is important to keep the order of the original file, you can use:

awk -F';' '++n[$5] == 1' nome

-F';' -- sets the field separator
n[$5] -- holds the number of occurrences seen so far of each value of field 5 (the email); n is an associative array, indexed by strings
++n[$5] -- increments the count for the current line's email
++n[$5] == 1 -- true only on the first occurrence of this email (a pattern with no action prints the line)
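
The first-occurrence filter can be tried on a small input as follows. This is only a sketch: the temporary file and the example.com addresses are hypothetical stand-ins for the real log.

```shell
# Hypothetical sample data: bob appears first, then alice, then both repeat.
log=$(mktemp)
cat > "$log" <<'EOF'
2; Mar 21 13:50; 201.53.117.127; unknown; <bob@example.com>; Subject one; MEDIUM
6; Mar 21 03:18; 182.69.170.145; unknown; <alice@example.com>; Subject two; HIGH
3; Mar 21 13:49; 201.53.117.127; unknown; <bob@example.com>; Subject three; HIGH
3; Mar 21 03:20; 182.69.170.145; unknown; <alice@example.com>; Subject four; HIGH
EOF

# Print a line only the first time its field 5 (email) is seen.
# No sorting happens, so the surviving lines keep their input order.
first_seen=$(awk -F';' '++n[$5] == 1' "$log")
printf '%s\n' "$first_seen"

rm -f "$log"
```

With this input the "Subject one" and "Subject two" lines are printed, in that order, because they are the first lines seen for each email.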