Command to replace characters recursively

5

I need a command that overwrites a specific pattern on each line of a file as many times as necessary until the pattern is no longer found.

For example, in a CSV file, fields are separated by a semicolon ;.
Null fields are empty (they contain no characters), as in the following file representing a contact list with 3 records:

Nome;Sobrenome;Telefone1;Telefone2;Email
Joao;Silva;9999-8888;9292-9292;[email protected]
Maria;Souza;8899-0011;;[email protected]
Carlos;Oliveira;;;

The first line is the file header. The contact Maria Souza has a null Telefone2 field, and the contact Carlos Oliveira has null Telefone1, Telefone2 and Email fields.

I want to add \N where the field is null.

On Linux, I use the command:

$ sed -e 's/;;/;\\N;/g' -e 's/;$/;\\N/' arquivo.csv > novo-arquivo.csv

The result is satisfactory for the Maria Souza record, but not for Carlos Oliveira. After finding the first ;; and performing the substitution (Carlos;Oliveira;\N;;), sed does not take the replaced text into account when it continues the search, and moves straight on to the next pattern, ;$, leaving the result like this:

Carlos;Oliveira;\N;;\N

One null field still remains.
I would like a solution for both Unix and Windows.
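
To reproduce the problem in isolation, here is a minimal sketch that applies the substitutions step by step to the Carlos Oliveira line. It assumes a POSIX shell with sed; the variable names line, step1 and result are only for illustration.

```shell
# Illustrative input: the record with three empty fields.
line='Carlos;Oliveira;;;'

# First expression alone: with /g the scan resumes after each replaced ";;",
# so a ";" already consumed by one match cannot start the next match.
step1=$(printf '%s\n' "$line" | sed -e 's/;;/;\\N;/g')
printf '%s\n' "$step1"     # Carlos;Oliveira;\N;;

# Both expressions together, as in the command above.
result=$(printf '%s\n' "$line" | sed -e 's/;;/;\\N;/g' -e 's/;$/;\\N/')
printf '%s\n' "$result"    # Carlos;Oliveira;\N;;\N
```

The middle field stays empty because its leading ; was part of the first match.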

  • I don’t think it’s a good idea to process a CSV file with regular expressions, but since you probably already know that and are just preparing data to feed into another program, I’ll let it go ;)

3 answers

4


Use perl, which supports lookahead:

 perl -p -e 's/;(?=;|$)/;\\N/g' arquivo.csv > novo-arquivo.csv

Incidentally, if you want to make the change in the same file (without having to redirect to another), simply pass the -i (in-place) option:

 perl -p -i -e 's/;(?=;|$)/;\\N/g' arquivo.csv
  • Very interesting, @Lias. But the command you posted does not handle the last field of the record: if the line ends with ; the last field is null and should be replaced by ;\N. Can I add another -e parameter, like this?: perl -p -e 's/;(?=;)/;\\N/g' -e 's/;$/;\\N/g' arquivo.csv > novo-arquivo.csv

  • @ricidleiv true, I just fixed it, thanks.

  • And what about deleting a word recursively within a line, for example removing the word nulo from the string nnuloulochavenulnuloo so that only the word chave is left? How could I do that in Perl?

  • Well, in that case I don’t think there is a way to avoid a loop. The one-liner probably looks nicer in sed, actually: sed ':loop; s/nulo//g; t loop;' -- hard to beat that. =)

  • 1

    You can simplify the case of a line ending with ; by using perl -p -e 's/;(?=;|$)/;\\N/g;'

  • @rodrigorgs good, Rodrigo! I’ll incorporate the suggestion! -- although it gets a little harder to read, right?


1

I’m used to Java development environments, both on Linux and Windows, so I would use an Ant task to perform cross-platform file handling operations like this.

Ant is a powerful and versatile tool used for automation, builds (compilation and package assembly) and file processing. It is important to note that Ant is not a programming language, as some think, but a way of declaring the activities (tasks) to be executed.

Installing Ant

Download the binary package here, unpack it into a folder and add it to the PATH of your operating system.

Example in Windows:

set path=%path%;c:\caminho\apache-ant-1.9.3\bin

Writing the Ant Build

The following Ant project replaces lines in a given file:

<project name="MyProject" default="replace" basedir=".">
    <target name="replace">
        <replaceregexp
                file="${file}"
                byline="true"
                match=";(?=;|$)"
                replace=";\\\\N"
                flags="gs" />
    </target>
</project>

Running the Project

Ant automatically searches for a file called build.xml in the current directory. So, if file.txt is the file to be processed, the following command will perform the overwriting:

ant -Dfile=file.txt

If the build file has a different name, you can use the -f parameter:

ant -f /caminho/meu-build.xml -Dfile=file.txt

Learning more about Ant

Just read the manual in full.

1

The Linux sed command supports labels, which are useful for repeating a substitution until it no longer matches.

For example, it can be used as follows:

$ sed -e ':loop' -e 's/;;/;\\N;/g' -e 't loop' -e 's/;$/;\\N/' arquivo.csv > novo-arquivo.csv

Keep in mind that if the file was generated on Windows and you are processing it with the Linux command, you should first convert it from DOS to Unix format, because the end-of-line characters differ. The same applies in the opposite direction.

You can use the commands dos2unix or unix2dos.
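
Here is a minimal sketch of why the line endings matter, assuming a POSIX shell with sed, tr and od. With a trailing carriage return, ;$ no longer matches, so the last empty field is missed unless the CR is stripped first. Here tr -d '\r' stands in for dos2unix, and fill_nulls_sed is a hypothetical helper name.

```shell
# Hypothetical helper wrapping the labelled sed command from this answer.
fill_nulls_sed() {
    sed -e ':loop' -e 's/;;/;\\N;/g' -e 't loop' -e 's/;$/;\\N/'
}

# A DOS-style line ends in CR LF, so the CR sits between the last ";" and the
# end of the line and ";$" does not match. od -c makes the CR visible as \r.
printf 'Carlos;Oliveira;;;\r\n' | fill_nulls_sed | od -c

# Stripping the CR first lets the final substitution match.
printf 'Carlos;Oliveira;;;\r\n' | tr -d '\r' | fill_nulls_sed
# Carlos;Oliveira;\N;\N;\N
```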

  • 1

    By the way, sed also supports separating commands with ; and changing the file in place, so you could do: sed -i ':loop; s/;;/;\\N;/g; t loop; s/;$/;\\N/' arquivo.csv -- the result is equivalent to the second perl command in my answer =)
