I am stuck looking for a precise and intuitive way to read a 70,000 KB file formed by the concatenation of several files of varied sizes. Starting from several '.txt' files, I converted each one with a routine I wrote that eliminates the 0 values and, for each tab ('\t') found, separates the dataset values with a comma.
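Roughly, that conversion step looked like the sketch below (a simplification: the paths and the exact rule for dropping the 0 values are placeholders, not my real code):

import glob as gb

# hypothetical one-pass conversion: drop the 0 values and turn each
# tab-separated line into a comma-separated one
for txt_path in gb.glob('C:\\Documents\\experimento\\*.txt'):
    with open(txt_path) as fin, open(txt_path.replace('.txt', '.csv'), 'w') as fout:
        for line in fin:
            fields = [v for v in line.rstrip('\n').split('\t') if v != '0']
            fout.write(','.join(fields) + '\n')

Soon after converting everything, I concatenated all the CSVs into a single file with pandas: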
import os
import glob as gb
import pandas as pd

inn = "C:\\Documents\\experimento"
out = "C:\\Documents\\experimento\\full_dataset.csv"
os.chdir(inn)
FullCsv = gb.glob('*.csv')
dfList = list()
for simpleCsv in FullCsv:
    print(simpleCsv)
    # the source files have no header row, so keep header=None
    df = pd.read_csv(simpleCsv, header=None)
    dfList.append(df)
# stack every frame vertically into one DataFrame and write it out
concatDf = pd.concat(dfList, axis=0)
concatDf.to_csv(out, index=None)
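Since I also wondered whether pandas is needed at all here, a plain-file alternative that streams each CSV straight into the output, instead of holding every DataFrame in memory, would look something like this sketch (not something I ran; the output name is deliberately different so it would not match the '*.csv' glob on a re-run):

import glob as gb

# hypothetical streaming concatenation: only one line is in memory at a time
with open("C:\\Documents\\experimento\\full_dataset_streamed.txt", 'w') as fout:
    for csv_path in gb.glob("C:\\Documents\\experimento\\*.csv"):
        with open(csv_path) as fin:
            for line in fin:
                fout.write(line)

Note that, unlike pd.concat, this keeps each row at its original width; it does not pad shorter rows with NaN.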
Soon after, I tried to read this newly created dataset, first in an attempt without pandas (the commented-out excerpt):
import csv
import pandas as pd

with open("C:\\Documents\\experimento\\full_dataset.csv", 'r') as foutput:
    '''reader = csv.reader(foutput)
    listaNova = list()
    for r in reader:
        listaNova.append(r)
    print(listaNova)
    '''
    # read in chunks of 100,000 rows instead of loading everything at once
    reader = pd.read_csv("C:\\Documents\\experimento\\full_dataset.csv", chunksize=100000)
    for read in reader:
        print(read)
But then I got:
IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.
Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)
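(From what I can tell, this error is about the volume of printed output rather than the reading itself: printing the whole list in one go sends more bytes per second than the 1000000.0 bytes/sec limit above allows. Printing a summary instead, e.g. print(len(listaNova)), or starting the server with a larger --NotebookApp.iopub_data_rate_limit, as the message suggests, gets around it.)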
The attempt with pandas, on the other hand, gave this result:
0 1 2 3 4 5 6 \
0 0.17730 0.016505 0.058989 -0.314010 0.079795 0.293890 0.035616
1 -0.68875 -0.340940 -0.647040 0.108130 0.404710 -0.161510 -0.329860
2 1.27170 0.913990 1.389600 0.834080 0.347450 0.705510 0.547070
3 -0.53242 -0.566420 -0.558360 -0.813050 -0.365800 -0.352100 0.106440
4 0.17730 0.016505 0.058989 -0.314010 0.079795 0.293890 0.035616
.. ... ... ... ... ... ... ...
238 117.00000 -0.532420 -0.566420 -0.558360 -0.813050 -0.365800 -0.352100
239 118.00000 0.177300 0.016505 0.058989 -0.314010 0.079795 0.293890
240 119.00000 -0.688750 -0.340940 -0.647040 0.108130 0.404710 -0.161510
241 120.00000 1.271700 0.913990 1.389600 0.834080 0.347450 0.705510
242 121.00000 -0.532420 -0.566420 -0.558360 -0.813050 -0.365800 -0.352100
7 8 9 ... 46611 46612 46613 46614 46615 \
0 0.390770 0.35301 0.425470 ... NaN NaN NaN NaN NaN
1 0.125460 -0.13454 -0.061552 ... NaN NaN NaN NaN NaN
2 0.357910 0.85464 0.346880 ... NaN NaN NaN NaN NaN
3 -0.545210 -0.64630 -0.519490 ... NaN NaN NaN NaN NaN
4 0.390770 0.35301 0.425470 ... NaN NaN NaN NaN NaN
.. ... ... ... ... ... ... ... ... ...
238 0.106440 -0.54521 -0.646300 ... NaN NaN NaN NaN NaN
239 0.035616 0.39077 0.353010 ... NaN NaN NaN NaN NaN
240 -0.329860 0.12546 -0.134540 ... NaN NaN NaN NaN NaN
241 0.547070 0.35791 0.854640 ... NaN NaN NaN NaN NaN
242 0.106440 -0.54521 -0.646300 ... NaN NaN NaN NaN NaN
46616 46617 46618 46619 46620
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
.. ... ... ... ... ...
238 NaN NaN NaN NaN NaN
239 NaN NaN NaN NaN NaN
240 NaN NaN NaN NaN NaN
241 NaN NaN NaN NaN NaN
242 NaN NaN NaN NaN NaN
[243 rows x 46621 columns]
I wonder if there is any way to visualize the entire dataset without it being summarized like this. Also, in your opinion, what is the best method for concatenating and reading a dataset; would the version without pandas be better? My intention is to work with this dataset, comparing it to the pre-converted values, reading it as rows x columns and standardizing the number of columns across all rows; a sketch of what I mean is below. Note: I am a beginner in this area of data science.
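For example, something along these lines is roughly what I mean by standardizing the columns (a sketch; the 46621 comes from the widest row in the output above):

import pandas as pd

# naming the columns up front forces every row to the same width:
# shorter rows are padded with NaN instead of breaking the parser
n_cols = 46621  # width of the widest row, per the output above
df = pd.read_csv("C:\\Documents\\experimento\\full_dataset.csv",
                 header=None, names=range(n_cols))
print(df.shape)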
Do you really need all of this?
– Tmilitino