Concatenate .html files into a single .htm file with PowerShell

I need to concatenate thousands of .html files into a single .htm file. I already have a batch file that does this for me, using the following code:

@echo off

rem type *.html > output.htm

for /r %%i in (*.html) do (
    rem Skip the output file itself; the quotes keep names with spaces intact
    if /i not "%%~nxi"=="output.htm" (
        echo %%~nxi >> output.htm
        type "%%i" >> output.htm
        echo. >> output.htm
    )
)

The problem is that this process is extremely slow. I found out that PowerShell is faster and tried to write the equivalent code, but I know very little about programming languages. This is what I came up with:

get-content *.html | Set-Content output.htm

Very simple! It even worked; the problem is that some characters get "corrupted", as in the text below:

O ICMS de responsabilidade das empresas industriais fabricantes de calçados 
que usufruam do crédito presumido previsto no inciso XXIX do artigo 57 do 
RICMS/SE, é diferido no recebimento do exterior ou, relativamente à 
diferença de alíquotas, pelas aquisições em outra unidade federada de 
máquinas, equipamentos, ferramental, moldes, modelos, instrumentos e 
aparelhos industriais e de controle de qualidade, e seus sobressalentes.

I tried to leave the code like this:

get-content *.html | Set-Content output.htm -Encoding UTC8

I also tried switching to other encodings, and that didn't work either.

Can someone help out?

1 answer


Your HTML files may use more than one encoding. I suggest testing with -Raw and replacing -Encoding UTC8 with -Encoding String:


Get-Content -Raw *.html | Set-Content output.htm -Encoding String

or, using aliases:


cat -Raw *.html | sc output.htm -Encoding String
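
If that still garbles the accented characters, the cause may be on the reading side rather than the writing side. In Windows PowerShell 5.1, Get-Content reads a file that has no byte-order mark using the system ANSI code page, so UTF-8 files without a BOM come out damaged before Set-Content ever sees them. Also note that the valid encoding name is UTF8; UTC8 is not a value that -Encoding accepts. Below is a minimal sketch that sets the encoding on both the read and the write, and also writes each file name before its content as the batch file does. The function name Join-HtmlFiles and its parameters are hypothetical, and the default assumes the source files are UTF-8, which the question does not confirm.

```powershell
# Hypothetical helper (not from the original post): concatenates .html files
# into one output file, using an explicit encoding for reading and writing.
function Join-HtmlFiles {
    param(
        [string]$Root = '.',
        [string]$OutFile = 'output.htm',
        # Assumption: the source files are UTF-8. In Windows PowerShell 5.1,
        # pass 'Default' instead if they use the system ANSI code page.
        [string]$SourceEncoding = 'UTF8'
    )

    Get-ChildItem -Path $Root -Recurse -File |
        # Keep only true .html files; this also leaves out the .htm output file.
        Where-Object { $_.Extension -eq '.html' } |
        ForEach-Object {
            # File name header, like "echo %%~nxi" in the batch file.
            $_.Name
            # -Raw returns the whole file as a single string.
            Get-Content -LiteralPath $_.FullName -Raw -Encoding $SourceEncoding
        } |
        Set-Content -LiteralPath $OutFile -Encoding UTF8
}
```

Run from the folder that holds the files, it could be called like this:

Join-HtmlFiles -Root . -OutFile output.htm

If some files are UTF-8 and others are ANSI, no single -SourceEncoding value will fit all of them, and the files would need to be grouped by encoding first.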
