Replace columns of a row when another column starts with a specific word

1

I need to replace a letter in one column whenever a specific word appears in another column of the same row.

Example: in the lines below, whenever the word PEDRO appears in column 4 of a row, I have to replace "P" with "E" in column 8. The value may be longer, such as "PEDRO HENRIQUE", but I only need to detect the PEDRO part.

"1234556","123123123","0000021152","PEDRO","20011101",1000,100,"P","",10,"MG",500,"??",0,"R06A0","ABC"
"1234565","123123123","0000004517","ALBERTO","20010401",1000,500,"E","G",1,"% ",400,"ML",1000,"P01B0","DGK"
"12312312","123123","0000005334","CARLOS","20010701",3000,100,"E","",30,"MG",50,"??",0,"N05C0","AAA"
"11236","1245545423","0000021152","PEDRO RICARDO","20011101",1000,100,"P","",10,"MG",1500,"??",0,"R06A0","ABC"
"123123123","123123123","0000011011","RAFAEL","20010901",100,100,"E","",1,"G ",100,"ML",1000,"J01D2","FPB"
"123123123","12312312312","0000018102","RONALDO","20011001",4800,100,"P","",48,"??",0,"??",0,"N02B0","ACA"
"11236","1245545423","0000021152","PEDRO HENRIQUE","20011101",1000,100,"P","",10,"MG",1500,"??",0,"R06A0","ABC"

1 answer

2


Use:

awk -F, '$4=="\"PEDRO\"" {$8="\"E\""}1' OFS=, nomeDoArquivo

Explanation:

First, we set the field delimiter to the comma character, since this is a CSV file.

The expression checks whether the fourth column is equal to "PEDRO" and, if so, replaces the value of the eighth column with "E". The quotes had to be escaped because the text fields in your example are quoted.

The 1 at the end is a condition that is always true, so every line is printed whether or not it was modified.
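As a minimal sketch of the same logic with each step written out, the one-liner can be expanded as below. The file name sample.csv and its two rows are made up for illustration and are not part of the question.

```shell
# Hypothetical sample file; the name and the values are made up for illustration.
cat > sample.csv <<'EOF'
"1","a","b","PEDRO","x",1,2,"P","z"
"2","a","b","CARLOS","x",1,2,"P","z"
EOF

# Same logic as the one-liner, with every step explicit.
result=$(awk '
BEGIN { FS = ","; OFS = "," }        # split input and join output on commas
$4 == "\"PEDRO\"" { $8 = "\"E\"" }   # exact match on the quoted fourth field
{ print }                            # this is what the trailing 1 abbreviates
' sample.csv)

printf '%s\n' "$result"
```

Setting OFS matters because assigning to $8 makes awk rebuild the line, and without it the fields would be joined with spaces instead of commas.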

Result with the given example:

«~» $ awk -F, '$4=="\"PEDRO\"" {$8="\"E\""}1' OFS=, teste
"1234556","123123123","0000021152","PEDRO","20011101",1000,100,"E","",10,"MG",500,"??",0,"R06A0","ABC"
"1234565","123123123","0000004517","ALBERTO","20010401",1000,500,"E","G",1,"% ",400,"ML",1000,"P01B0","DGK"
"12312312","123123","0000005334","CARLOS","20010701",3000,100,"E","",30,"MG",50,"??",0,"N05C0","AAA"
"123123123","123123123","0000011011","RAFAEL","20010901",100,100,"E","",1,"G ",100,"ML",1000,"J01D2","FPB"
"123123123","12312312312","0000018102","RONALDO","20011001",4800,100,"P","",48,"??",0,"??",0,"N02B0","ACA"
"11236","1245545423","0000021152","PEDRO","20011101",1000,100,"E","",10,"MG",1500,"??",0,"R06A0","ABC"

Edit, as requested in the comments:

To accept any value in the fourth column that starts with PEDRO, rather than only values exactly equal to PEDRO, you need to change the condition to a regular expression.

Use:

awk -F, 'match($4, /PEDRO/) {$8="\"E\""}1' OFS=, teste

The match function checks whether the value in the given column matches the regular expression provided, in this case whether it contains PEDRO. Any text that follows PEDRO in the field, such as a second name, does not affect the match.

Because the pattern no longer includes the surrounding quotes, nothing needs to be escaped in it. Only the replacement value for the eighth column still needs its quotes escaped.
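Note that /PEDRO/ matches the word anywhere in the field. If the match has to be limited to values that begin with PEDRO, one option is to anchor the pattern at the field's opening quote. The following is only a sketch: the file prefix_demo.csv and its rows are hypothetical, and it assumes that no quoted field contains a comma, since -F, splits on every comma.

```shell
# Hypothetical rows for illustration; not taken from the question.
cat > prefix_demo.csv <<'EOF'
"1","a","b","PEDRO HENRIQUE","x",1,2,"P","z"
"2","a","b","JOAO PEDRO","x",1,2,"P","z"
"3","a","b","PEDROSA","x",1,2,"P","z"
EOF

# ^ anchors the pattern at the start of the field, which begins with a quote.
prefix_result=$(awk -F, -v OFS=, '
$4 ~ /^"PEDRO/ { $8 = "\"E\"" }   # only fields whose value starts with PEDRO
{ print }
' prefix_demo.csv)

printf '%s\n' "$prefix_result"
```

With this pattern the JOAO PEDRO row is left unchanged, while PEDROSA is still treated as a match. If PEDRO must be a whole word, a stricter pattern such as /^"PEDRO[ "]/ requires a space or the closing quote right after it.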

  • Thanks for the quick reply! I think I left out some details. This is my first question, haha.

  • The file is a .txt, and the rows containing the word PEDRO may have more text after it: "1234556","123123123","0000021152","PEDRO Henrique","20011101",1000,100,"P","",10,"MG",500,"?" ,0,"R06A0","ABC" "1234556","123123","0000021152","PEDRO Goncalves","20011101",1000,100,"P","",10,"MG",500,"?" ,0,"R06A0","ABC" "1234556","123123","0000021152","PEDRO 123123123","20011101",1000,100,"P","",10,"MG",500,"?" ,0,"R06A0","ABC".

  • 1

    For clarifications like this, I think it is better to edit the question and add the details at the end, which is more readable than putting them in comments. Also, the file extension itself is not a problem. A file whose fields are separated by commas is, in principle, already a CSV file. In such cases the extension would only make the situation explicit.

  • Perfect. I've edited it. If you have any idea how to do this, I'd be very grateful!

  • I edited the answer. Check whether it now works for your needs. If so, don't forget to mark the answer as accepted, so the question is marked as resolved and is easier to find in site searches.

  • It didn't work =( Does it interfere if there are symbols like % or . in the column where the word PEDRO appears? " 0002115203","0000052487","0000021152","PEDRO ABCD AB F.C 5.00%MG x 10","20011101",1000,100,"P",""",10,"MG",500,"?" ,0,"R06A0","ABC"

  • I simplified the answer a little, to make it more generic. It worked with the example you gave, and it also works with the example in your last comment. To avoid this kind of confusion, it is best to always include a verifiable example in the question.

  • Thanks again for your help. I noticed that when I run this command in the terminal, awk -F, 'match($4, /PEDRO/) {$8="\"E\""}1' OFS=, text.txt, it shows the correct result but does not edit the file. It is as if it only showed what the result should look like.

  • Just add >text.txt at the end of the command, then.

  • Perfect, it worked. Just to clarify for anyone with the same question: I had to redirect the output to another file, like this: awk -F, 'match($4, /PEDRO/) {$8="\"E\""}1' OFS=, text.txt > text2.txt
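On the redirection discussed in the last comments: writing the output back with > text.txt while awk is reading that same file does not work, because the shell truncates the file before awk opens it. One common workaround, sketched here under the assumption that mktemp is available on the system, is to write to a temporary file and rename it only if awk succeeds. The two sample rows are made up for illustration.

```shell
# Hypothetical contents for text.txt, for illustration only.
cat > text.txt <<'EOF'
"1","a","b","PEDRO HENRIQUE","x",1,2,"P","z"
"2","a","b","CARLOS","x",1,2,"P","z"
EOF

# Write to a temporary file first, then replace the original only on success.
tmp=$(mktemp) &&
awk -F, -v OFS=, 'match($4, /PEDRO/) { $8 = "\"E\"" } 1' text.txt > "$tmp" &&
mv "$tmp" text.txt

cat text.txt
```

The replaced file takes on the temporary file's permissions, so they may need to be restored afterwards if the original had different ones.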
