Python binary file saving error

Viewed 512 times

1

I have a Python program that receives a binary file as a parameter and saves it. However, when it saves the file, it replaces some characters with a series of numbers. Below is the original file I receive as a parameter:

ÐT_Ö/ ːæ ®kMµûSoz"Ô(Î,"+œd¼Es

But when the program records it, this is the result:

ÐT_Ö/ ːæ ®kMµûSoz&#148;Ô(Î,&#147;&#156;d¼Es

You can see that the quotation mark between the characters z and Ô has been replaced by the sequence &#148;. Likewise, the quotation mark after Î, has been replaced by &#147;.

Below is the code of the Python program that reads and records the binary file:

import subprocess
from subprocess import Popen, PIPE, STDOUT
def chamaProg(arquivo): 
   var_file = open("C:\\Nitgen\\arquivo.rec","wb")
   conteudo_texto = var_file.write(arquivo)
   var_file.close(

Why is this happening?

What should I do to read and write all characters correctly?

Please, I need to resolve this problem urgently.

Thank you.

3 answers

1

Except for a missing ) after the close( (which I assume was a copy-and-paste error), your code is correct. Where does the variable arquivo come from? Since you mentioned error 500 in a comment on another answer, I imagine this is part of a web application? I would investigate (even by scattering print statements through the code) where this variable comes from; it is being prepared for display on the web, not handled as a binary string.
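As a rough sketch of that investigation, assuming Python 3, something like the helper below could be called at the top of chamaProg and its result printed or logged. The name describe_payload and the fields it returns are hypothetical, not part of the original program:

```python
import re

# Hypothetical helper for debugging; not part of the original program.
_NUMERIC_ENTITY = re.compile(r"&#\d+;")


def describe_payload(arquivo, limit=40):
    """Summarize what the function actually received."""
    # latin-1 maps every byte to one character, so this decode never fails;
    # it is used only to search for entity-like text such as "&#148;".
    text = arquivo.decode("latin-1") if isinstance(arquivo, bytes) else arquivo
    return {
        "type": type(arquivo).__name__,
        "length": len(arquivo),
        "head": repr(arquivo[:limit]),
        "has_entities": bool(_NUMERIC_ENTITY.search(text)),
    }
```

If has_entities is already true on entry, the replacement happened before your function was called, which would point at the web layer rather than at the file write.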


If you have no way to avoid this transformation (because you do not control the code that calls your function), you can try the "dirty" solution of interpreting the input as an HTML fragment:

import HTMLParser
html_parser = HTMLParser.HTMLParser()
arquivo = html_parser.unescape(arquivo)

(but note that you should use this only to put out a fire in production; you have to figure out why arquivo is coming with these replacements)
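The snippet above relies on the Python 2 HTMLParser module. For reference, here is a minimal byte-level sketch of the same stopgap, assuming Python 3, that arquivo arrives as bytes, and that the only damage is numeric entities of the form &#NNN; standing in for single bytes. The name restore_bytes is hypothetical:

```python
import re

# Matches decimal numeric entities such as b"&#148;".
_NUMERIC_ENTITY = re.compile(rb"&#(\d{1,3});")


def restore_bytes(data):
    """Turn b"&#148;" back into the single byte 0x94 (148)."""
    def _to_byte(match):
        value = int(match.group(1))
        if value > 255:
            # Not a single byte value; leave the text untouched.
            return match.group(0)
        return bytes([value])

    return _NUMERIC_ENTITY.sub(_to_byte, data)
```

Note the limitation: if the original binary data legitimately contains a byte sequence that looks like &#148;, this will corrupt it, which is one more reason to treat it only as an emergency patch.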

0

If you are having problems with the encoding of these characters, I recommend opening the file as UTF-8. You can do this with the built-in codecs module. Here is an example:

# -*- coding: utf-8 -*-
import codecs


def save_file(content):
    with codecs.open('file.rec', 'wb', 'utf8') as f:
        f.write(content)

if __name__ == '__main__':
    save_file(u'ÐT_Ö/¤Ð樮kMµûÀz”Ô(Î,“+œd¼Es¥')

Another tip: whenever you work with files, use a context manager (the with statement in this example), because it closes the file for you when you leave the context.

  • 2

    If the recording is binary, an encoding should not be used, nor does it make sense to use one.

0


Binary files hold ASCII characters, but the file you are trying to record has Unicode characters, so you have to convert them first.

def chamaProg(arquivo): 
  var_file = open("C:\\Nitgen\\arquivo.rec","wb")
  conteudo_texto = var_file.write(arquivo.encode("utf-8"))
  var_file.close()
  • It did not work. I get error 500 when calling the program.

  • 1

    If the answer did not work (and it is in fact wrong), why was it accepted? By the way, as with @drgarcia's reply, you don't address the error that actually happens here: recording binary data that was received as binary data does not depend on any text codec.
