Command to replace characters recursively

Question

Asked 11 years, 7 months ago

Viewed 2,176 times

5

I need a command that replaces a specific pattern on each line of a file, as many times as necessary, until the pattern is no longer found.

For example, in a CSV file, fields are separated by a semicolon ;.
Null fields have no character, as in the following file representing a contact list with 3 records:

Nome;Sobrenome;Telefone1;Telefone2;Email
Joao;Silva;9999-8888;9292-9292;[email protected]
Maria;Souza;8899-0011;;[email protected]
Carlos;Oliveira;;;

The first line is the file header. The contact Maria Souza has a null Telefone2 field, and the contact Carlos Oliveira has null Telefone1, Telefone2 and Email fields.

I want to add \N where the field is null.

On Linux, I use the command:

$ sed -e 's/;;/;\\N;/g' -e 's/;$/;\\N/' arquivo.csv > novo-arquivo.csv

The result is satisfactory for the Maria Souza record, but not for Carlos Oliveira. After finding the first ;; and performing the substitution (Carlos;Oliveira;\N;;), sed resumes the search after the replaced text, so the single ; that remains is never matched as part of a ;; pair. It then moves on to the next expression, ;$, leaving the result like this:

Carlos;Oliveira;\N;;\N

One null field still remains.
I would like a solution for both Unix and Windows.
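The behaviour described above can be reproduced on a single line. This is only a minimal sketch: the function name naive_fill is hypothetical, and the input is the Carlos Oliveira record from the sample file.

```shell
# Hypothetical helper wrapping the two sed expressions from the question.
naive_fill() {
    sed -e 's/;;/;\\N;/g' -e 's/;$/;\\N/'
}

# printf '%s\n' prints its argument literally, followed by a newline.
printf '%s\n' 'Carlos;Oliveira;;;' | naive_fill
# prints: Carlos;Oliveira;\N;;\N
```

The g flag only finds non-overlapping matches in a single left-to-right pass, which is why the second empty field survives.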

I don’t think it’s a good idea to process a CSV file with regular expressions, but since you probably already know that and are simply preparing data to feed to another program, I’ll let it go ;)

– motobói

2014/02/01 at 12:10

3 answers

4

Use Perl, which supports lookahead:

 perl -p -e 's/;(?=;|$)/;\\N/g' arquivo.csv > novo-arquivo.csv

Incidentally, if you want to make the change within the same file (without having to redirect to another one), simply pass the -i option (in-place):

 perl -p -i -e 's/;(?=;|$)/;\\N/g' arquivo.csv

Very interesting, @elias. But the command you posted does not handle the last field of the record when the line ends with ;, which means the last field is null and should be replaced by ;\N. Could I add a second -e parameter, like this?: perl -p -e 's/;(?=;)/;\\N/g' -e 's/;$/;\\N/g' arquivo.csv > novo-arquivo.csv

– ricidleiv

2013/12/16 at 16:19

@ricidleiv true, I just fixed it, thanks.

– elias

2013/12/16 at 16:21

And what about, for example, deleting a word recursively within a line, such as the word nulo in the string nnuloulochavenulnuloo, leaving only the word chave? How could I do that in Perl?

– ricidleiv

2013/12/16 at 16:50

Well, in that case I don't think there is any way to avoid a loop. The one-liner probably looks nicer in sed anyway: sed ':loop; s/nulo//g; t loop;' -- hard to beat that. =)

– elias

2013/12/16 at 16:57

1

You can simplify the handling of lines ending with ; by using perl -p -e 's/;(?=;|$)/;\\N/g;'

– rodrigorgs

2013/12/16 at 20:00

@rodrigorgs good one, Rodrigo! I’ll incorporate the suggestion! -- although it gets a little harder to read, right?

– elias

2013/12/16 at 22:44

by utluiz • **72,075** points · Answer 1 · 2013-12-30T12:29:34+00:00

I’m used to Java development environments, on both Linux and Windows, so I would use an Ant task to perform cross-platform file handling operations like this one.

Ant is a powerful and versatile tool for automation, builds (compilation and package assembly) and file processing. Note that Ant is not a programming language, as some people think, but a declarative way of describing the activities (tasks) to be executed.

Installing Ant

Download the binary package from the Apache Ant website, unpack it into a folder and add its bin directory to the PATH of your operating system.

Example in Windows:

set path=%path%;c:\caminho\apache-ant-1.9.3\bin

Writing the Ant Build

The following Ant project replaces lines in a given file:

<project name="MyProject" default="replace" basedir=".">
    <target name="replace">
        <replaceregexp
                file="${file}"
                byline="true"
                match=";(?=;|$)"
                replace=";\\\\N"
                flags="gs" />
    </target>
</project>

Running the Project

Ant automatically looks for a file called build.xml in the current directory. So, if file.txt is the file to be processed, the following command performs the replacement:

ant -Dfile=file.txt

If the build file has a different name, you can point to it with the -f parameter:

ant -f /caminho/meu-build.xml -Dfile=file.txt

Learning more about Ant

Just read the manual in full.

by ricidleiv • **2,267** points · Answer 2 · 2013-12-16T15:25:44+00:00

The sed command on Linux supports labels, which are useful for repeating a substitution until it no longer matches.

For example, it can be used as follows:

$ sed -e ':loop' -e 's/;;/;\\N;/g' -e 't loop' -e 's/;$/;\\N/' arquivo.csv > novo-arquivo.csv
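To see the loop at work, the same expressions can be applied to the Carlos Oliveira record from the question. This is only a sketch, and the function name fill_nulls_sed is hypothetical.

```shell
# Hypothetical helper around the labelled sed command from this answer.
# 't loop' jumps back to ':loop' whenever the previous s/// changed the line,
# so s/;;/;\N;/g is retried until no ;; pair is left.
fill_nulls_sed() {
    sed -e ':loop' -e 's/;;/;\\N;/g' -e 't loop' -e 's/;$/;\\N/'
}

printf '%s\n' 'Carlos;Oliveira;;;' | fill_nulls_sed
# pass 1:       Carlos;Oliveira;\N;;
# pass 2:       Carlos;Oliveira;\N;\N;
# final s/;$/:  Carlos;Oliveira;\N;\N;\N
```

Giving the label and the branch their own -e expressions, as the answer does, should also keep the command usable on sed implementations that read a label up to the end of the line.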

Remember that if the file was generated on Windows and you are running the command on Linux, you should first convert the file from DOS to Unix format, because the end-of-line characters are different. The same applies in the opposite direction.

You can use the commands dos2unix or unix2dos.
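The end-of-line issue matters because a DOS line ends in a carriage return before the newline, so ;$ no longer sees the ; as the last character. The sketch below assumes dos2unix may not be installed and uses tr as a stand-in. The function name fill_nulls_crlf is hypothetical, and tr -d '\r' removes every carriage return in the input, not only those at line ends.

```shell
# Hypothetical helper: strip carriage returns, then apply the labelled sed command.
fill_nulls_crlf() {
    tr -d '\r' | sed -e ':loop' -e 's/;;/;\\N;/g' -e 't loop' -e 's/;$/;\\N/'
}

# The input line simulates a file saved on Windows (CR LF line ending).
printf 'Carlos;Oliveira;;;\r\n' | fill_nulls_crlf
# prints: Carlos;Oliveira;\N;\N;\N
```

If the result has to go back to Windows, it can be converted again afterwards, for example with unix2dos.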