I need a command that replaces a specific pattern on each line of a file, repeating the substitution as many times as necessary until the pattern is no longer found.
For example, in a CSV file, fields are separated by a semicolon ;.
Null fields are empty (they contain no characters), as in the following file representing a contact list with 3 records:
Nome;Sobrenome;Telefone1;Telefone2;Email
Joao;Silva;9999-8888;9292-9292;[email protected]
Maria;Souza;8899-0011;;[email protected]
Carlos;Oliveira;;;
The first line is the file header. The contact Maria Souza has a null Telefone2 field, and the contact Carlos Oliveira has null Telefone1, Telefone2 and Email fields.
I want to insert \N wherever a field is null.
On Linux, I use the command:
$ sed -e 's/;;/;\\N;/g' -e 's/;$/;\\N/' arquivo.csv > novo-arquivo.csv
The result is correct for the Maria Souza record, but not for Carlos Oliveira. After sed finds the first ;; and substitutes it (Carlos;Oliveira;\N;;), it resumes the search after the replaced text instead of rescanning it, so the semicolon it has just written cannot start a new match. It then moves on to the next expression, which is ;$, leaving the result like this:
Carlos;Oliveira;\N;;\N
One null field is still left unfilled.
I would like a solution for both Unix and Windows.
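One possible way to get the "repeat until no longer found" behaviour is sed's own label and branch commands: `:a` defines a label, and `ta` jumps back to it whenever the previous `s` command made a substitution. This is only a minimal sketch, not a definitive answer. The function name `fill_nulls` is a hypothetical helper introduced for illustration, and on Windows it assumes a sed port is installed and that the expressions are quoted in whatever way that shell expects.

```shell
# fill_nulls: hypothetical helper, reads CSV text on stdin, writes to stdout.
fill_nulls() {
    # :a            define label "a"
    # s/;;/;\\N;/   fill the first empty field found between two semicolons
    # ta            if that substitution succeeded, jump back to "a" and rescan
    # s/;$/;\\N/    finally, fill an empty field at the end of the line
    sed -e ':a' -e 's/;;/;\\N;/' -e 'ta' -e 's/;$/;\\N/'
}

printf '%s\n' 'Carlos;Oliveira;;;' | fill_nulls
# prints: Carlos;Oliveira;\N;\N;\N
```

Because each pass restarts from the beginning of the line, the semicolon written by one substitution can take part in the next match. Like the original command, this sketch does not handle an empty first field (a line that begins with ;).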
I don’t think it’s a good idea to process a CSV file with regular expressions, but since you probably already know that and are simply preparing data to feed to another program, I’ll let it go ;)
– motobói