Generating a 128-byte file of random data for testing, which represents 32 integers of 4 bytes each:
$ head -c 128 < /dev/urandom > randomnumbers.bin
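On systems without `head` or `/dev/urandom`, a similar test file can be produced from Python. This is only a sketch: the helper name `write_random_file` is hypothetical, and the demo writes into a temporary directory so it does not overwrite anything.

```python
import os
import tempfile


def write_random_file(path, size):
    """Write `size` bytes from the OS random source to `path` (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(os.urandom(size))


# Demo in a temporary directory so nothing in the working directory is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "randomnumbers.bin")
    write_random_file(path, 128)
    size = os.path.getsize(path)
    print(size, "bytes =", size // 4, "integers of 4 bytes")
```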
Generated file:
$ xxd randomnumbers.bin
0000000: 300c 54ea 4023 8592 267c 0dc9 f961 0a6d 0.T.@#..&|...a.m
0000010: d0d6 cef3 950e 39ac 8422 5671 c1a2 2546 ......9.."Vq..%F
0000020: ea5a b0e5 cb00 9fb5 40e5 cb7b 849e fb36 .Z......@..{...6
0000030: d64e 77f8 0351 866c 4f2c 824b c98b 82a5 .Nw..Q.lO,.K....
0000040: 7421 e0d1 626a 2cdd 090e 69a4 0894 01bf t!..bj,...i.....
0000050: 37a0 0405 cdbc 57f2 fa4f 1e78 89c1 f2b5 7.....W..O.x....
0000060: c8eb 2c63 4c13 2e47 d59b 234d b951 41df ..,cL..G..#M.QA.
0000070: 1d65 52a7 51c9 240e 2426 4f55 a6c9 2cfb .eR.Q.$.$&OU..,.
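Solution #1: Using the struct module:
import struct

lista = []
with open("randomnumbers.bin", "rb") as arq:
    # Read 4 bytes at a time until read() returns b'' (end of file)
    for num in iter(lambda: arq.read(4), b''):
        # 'i' = signed 4-byte integer in the machine's native byte order
        lista.append(struct.unpack('i', num)[0])
print(lista)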
Output:
[-363590608, -1836768448, -921863130, 1829396985, -204548400, -1405546859, 1901470340, 1176871617, -441427222, -1247870773, 2076960064, 922459780, -126398762, 1820741891, 1266822223, -1518171191, -773840524, -584291742, -1536618999, -1090415608, 84189239, -229131059, 2015252474, -1242381943, 1663888328, 1194201932, 1294179285, -549367367, -1487772387, 237291857, 1431250468, -80950874]
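The first values of this output can be checked against the first row of the hex dump. The sketch below assumes the data should be read as little-endian and uses the explicit format `'<i'` instead of the native-order `'i'`, so the result does not depend on the machine running it. `struct.iter_unpack` is used here only as a compact way to walk the buffer in 4-byte steps.

```python
import struct

# First 16 bytes of randomnumbers.bin, copied from the xxd dump
first_row = bytes.fromhex("300c54ea40238592267c0dc9f9610a6d")

# '<i' = little-endian signed 4-byte integer; iter_unpack yields 1-tuples
values = [v for (v,) in struct.iter_unpack("<i", first_row)]
print(values)
```

Note that `struct.unpack` and `struct.iter_unpack` raise `struct.error` when the buffer length is not a multiple of 4, so a file with trailing bytes needs separate handling.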
Solution #2: Using NumPy arrays:
import numpy as np

with open("randomnumbers.bin", "rb") as arq:
    # Read the whole file as 32-bit signed integers and convert it to a Python list
    lista = np.fromfile(arq, dtype=np.int32).tolist()
print(lista)
Output:
[-363590608, -1836768448, -921863130, 1829396985, -204548400, -1405546859, 1901470340, 1176871617, -441427222, -1247870773, 2076960064, 922459780, -126398762, 1820741891, 1266822223, -1518171191, -773840524, -584291742, -1536618999, -1090415608, 84189239, -229131059, 2015252474, -1242381943, 1663888328, 1194201932, 1294179285, -549367367, -1487772387, 237291857, 1431250468, -80950874]
Solution #3: Using the int.from_bytes() method (Python 3 only):
lista = []
with open("randomnumbers.bin", "rb") as arq:
    # Read 4 bytes at a time until read() returns b'' (end of file)
    for num in iter(lambda: arq.read(4), b''):
        lista.append(int.from_bytes(num, byteorder='little', signed=True))
print(lista)
Output:
[-363590608, -1836768448, -921863130, 1829396985, -204548400, -1405546859, 1901470340, 1176871617, -441427222, -1247870773, 2076960064, 922459780, -126398762, 1820741891, 1266822223, -1518171191, -773840524, -584291742, -1536618999, -1090415608, 84189239, -229131059, 2015252474, -1242381943, 1663888328, 1194201932, 1294179285, -549367367, -1487772387, 237291857, 1431250468, -80950874]
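All three outputs agree because the machine used here is little-endian, which is the order that `byteorder='little'` names explicitly. A small sketch of how the same four bytes decode under each byte order, using the first bytes of the hex dump:

```python
import sys

raw = bytes.fromhex("300c54ea")  # first four bytes of the hex dump

little = int.from_bytes(raw, byteorder="little", signed=True)
big = int.from_bytes(raw, byteorder="big", signed=True)

print("little-endian:", little)
print("big-endian:   ", big)
print("native order: ", sys.byteorder)
```

If `sys.byteorder` reports `big` on another machine, the native-order `struct` format `'i'` would produce the big-endian values instead.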
Comparative Analysis (Python 3):
The time utility can be used to compare the efficiency of the three solutions presented when processing large files.
Generating a 128 MB file of random data, which represents 33,554,432 integers of 4 bytes each:
$ head -c 128M < /dev/urandom > randomnumbers.bin
teste.py:
import sys
import struct
import numpy as np

def solucao_struct():
    lista = []
    with open("randomnumbers.bin", "rb") as arq:
        for num in iter(lambda: arq.read(4), b''):
            lista.append(struct.unpack('i', num)[0])

def solucao_numpy():
    with open("randomnumbers.bin", "rb") as arq:
        lista = np.fromfile(arq, dtype=np.int32).tolist()

def solucao_from_bytes():
    lista = []
    with open("randomnumbers.bin", "rb") as arq:
        for num in iter(lambda: arq.read(4), b''):
            lista.append(int.from_bytes(num, byteorder='little', signed=True))

# Choose the solution to run from the first command-line argument
if sys.argv[1] == "--np":
    solucao_numpy()
elif sys.argv[1] == "--struct":
    solucao_struct()
elif sys.argv[1] == "--frombytes":
    solucao_from_bytes()
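The `time` utility also counts interpreter startup and module imports. As an alternative sketch, the reading loop alone can be timed from inside Python with `time.perf_counter`. The function names, the `timed` helper and the small temporary sample file below are assumptions for illustration, not part of teste.py, and `'<i'` is used so both readers decode the same byte order on any machine.

```python
import os
import struct
import tempfile
import time


def read_with_struct(path):
    values = []
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(4), b""):
            values.append(struct.unpack("<i", chunk)[0])
    return values


def read_with_from_bytes(path):
    values = []
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(4), b""):
            values.append(int.from_bytes(chunk, byteorder="little", signed=True))
    return values


def timed(func, path):
    """Return (result, elapsed seconds) for one call of func(path)."""
    start = time.perf_counter()
    result = func(path)
    return result, time.perf_counter() - start


results = {}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")
    with open(path, "wb") as f:
        f.write(os.urandom(4 * 10000))  # 10,000 random 4-byte integers
    for func in (read_with_struct, read_with_from_bytes):
        values, seconds = timed(func, path)
        results[func.__name__] = values
        print(f"{func.__name__}: {seconds:.4f} s for {len(values)} integers")
```

A sample this small only shows the mechanics; meaningful comparisons need a file closer to the 128 MB used here.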
Measuring Solution Performance with Numpy Arrays:
$ time python3 teste.py --np
real 0m3.766s
user 0m2.384s
sys 0m1.362s
Integers Per Second (Numpy):
(128MB / 4Bytes) / 3.766s = 8909833.2
Measuring Solution Performance with Struct:
$ time python3 teste.py --struct
real 0m38.200s
user 0m36.700s
sys 0m1.411s
Integers Per Second (Struct):
(128MB / 4Bytes) / 38.2s = 878388.2
Measuring Solution Performance with int.from_bytes():
$ time python3 teste.py --frombytes
real 2m2.691s
user 2m1.057s
sys 0m1.375s
Integers Per Second (int.from_bytes()):
(128MB / 4Bytes) / 122.691s = 273487.3
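The throughput figures can be recomputed from the `real` values reported by `time`, converting minutes to seconds first (2m2.691s is 122.691 s). In this sketch the helper `real_to_seconds` is a hypothetical name, and it only handles the `NmS.SSSs` form shown in the outputs.

```python
import re


def real_to_seconds(real):
    """Convert a `time` value such as '2m2.691s' to seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", real)
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


TOTAL_INTEGERS = 128 * 1024 * 1024 // 4  # 128 MB file / 4 bytes per integer

measurements = [
    ("numpy", "0m3.766s"),
    ("struct", "0m38.200s"),
    ("int.from_bytes", "2m2.691s"),
]

rates = {}
for name, real in measurements:
    rates[name] = TOTAL_INTEGERS / real_to_seconds(real)
    print(f"{name}: {rates[name]:.1f} integers per second")
```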
Can you post a snippet of the file that contains the numbers? The link you provided did not open here. And why do you say that strange numbers came out?
– Woss