Bulk conversion of file encodings on Linux

Hello, I need to convert files to utf-8 because the application is being migrated to another server.

The application currently has files in several encodings: some in utf-8, others in iso-8859-1, and others still in windows-1252.
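Before converting anything, it can help to see which encoding each file is reported to have. The following is a minimal sketch, assuming the `file` utility is installed and supports `--mime-encoding`; the function names `detect_encoding` and `list_encodings` are hypothetical. The result is a heuristic guess based on the bytes, so a windows-1252 file may well be reported as iso-8859-1 or unknown-8bit.

```shell
#!/bin/bash

# Hypothetical helper: print the encoding that `file` guesses for one file,
# for example "utf-8", "iso-8859-1" or "us-ascii".
detect_encoding() {
    file -b --mime-encoding -- "$1"
}

# Hypothetical helper: print "path: encoding" for every regular file
# under the given directory.
list_encodings() {
    find "$1" -type f -exec file --mime-encoding -- {} +
}
```

Calling `list_encodings /path/to/app` (the path is a placeholder) prints one line per file, which makes it easier to decide which source encoding to pass to `iconv -f` for each group of files.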

What I need to do is migrate all of them to utf-8. I managed to find a bash script that does this. The problem is that it adds a blank line after each line of the file: a file with 200 lines turns into 400, leaving the whole code messy and difficult to understand.

I wonder what I can do to avoid the blank line it adds.

This is the code I have for the conversion:


#!/bin/bash

DIRETORIO=$1
LISTA="/tmp/lista-conversor.txt"

if [ -d "${DIRETORIO}" ]; then

echo -e "\nGenerating list of files in \"${DIRETORIO}\" to be analyzed:\n"
find "${DIRETORIO}" -type f > "${LISTA}"
#-exec $DIRETORIO {} \;

# Read one path per line; IFS= and -r keep spaces and backslashes intact
while IFS= read -r ARQUIVO;do

        # 1 if the report from file mentions an ISO-8859 encoding, 0 otherwise
        ISO_YES=$(file "$ARQUIVO"|grep -c 8859)
        if [ "$ISO_YES" -eq 1 ]; then
                echo "iso-8859 detected in $ARQUIVO"
                iconv -f iso-8859-1 -t utf-8 "${ARQUIVO}" >  "${ARQUIVO}.new" && echo -e "File ${ARQUIVO} converted successfully\n"
                cp "$ARQUIVO" "${ARQUIVO}.bkp"
                mv "${ARQUIVO}.new" "${ARQUIVO}"
        fi

done < "${LISTA}"

else
echo -e "Please provide a directory\n\nEx:\n${0} <directory>"
fi
  • Add the -c option after the iconv command in your script and see if it works: iconv -c -f iso-8859-1 -t utf-8 [...]

  • I will do it, @gfleck. Can you tell me what this command does?

  • Right, this command is what actually converts your files. The rest of the script just automates the process; that line is the heart of your script.

  • @Joelpiccolidarosa, I can't figure out why the number of lines would double. Can you show the output of wc f before and after the conversion, and also of file -i f for the file concerned?
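The check requested in the last comment can be sketched as a small function. This is only an illustration under stated assumptions: the name `check_conversion` is hypothetical, the source encoding defaults to iso-8859-1 as in the question's script, and `file` is assumed to support `--mime-encoding`. It converts into a temporary copy, so the original file is left untouched.

```shell
#!/bin/bash

# Hypothetical helper: convert a temporary copy of one file to utf-8 and
# report the line count and guessed encoding before and after.
check_conversion() {
    local src=$1
    local from=${2:-iso-8859-1}   # assumed source encoding
    local out
    out=$(mktemp) || return 1

    # Stop early if iconv cannot convert the file
    if ! iconv -f "$from" -t utf-8 "$src" > "$out"; then
        rm -f "$out"
        return 1
    fi

    printf 'before: %s lines, %s\n' \
        "$(wc -l < "$src" | tr -d ' ')" "$(file -b --mime-encoding -- "$src")"
    printf 'after:  %s lines, %s\n' \
        "$(wc -l < "$out" | tr -d ' ')" "$(file -b --mime-encoding -- "$out")"

    rm -f "$out"
}
```

If the two line counts match for a file that ends up with doubled lines after running the full script, the extra lines are probably coming from another step of the script rather than from iconv itself.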

1 answer

How about:

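#!/bin/bash

DIR=$1

if [ ! -d "${DIR}" ]; then
    echo -e "Please provide a directory\nEx:\n${0} <directory>"
    exit 1
fi

# Read the output of find line by line so paths containing spaces stay intact
find "${DIR}" -name '*.txt' -type f | while IFS= read -r ARQ
do
    # 1 if file reports an ISO-8859 encoding, 0 otherwise;
    # -b keeps the file name itself out of the text that grep sees
    CONVERT=$( file -b "${ARQ}" | grep -c "8859" )

    if [ "$CONVERT" -eq 1 ]; then

            echo -n "Processing: '${ARQ}' ... "

            if iconv -f iso-8859-1 -t utf-8 "${ARQ}" -o "${ARQ}.tmp"; then
                echo "OK!"
                # Back up the original, then replace it with the converted file
                cp "${ARQ}" "${ARQ}.bkp"
                mv "${ARQ}.tmp" "${ARQ}"
            else
                echo "ERROR!"
                # Discard the partial output and leave the original untouched
                rm -f "${ARQ}.tmp"
            fi
    fi

done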
  • I ended up removing the -e on the iconv line, and then it stopped adding the blank lines after each line of the file. Thank you all for your help!
