CSV files are always read line by line. However, unless they are really large, all of the data fits in memory (and if it doesn't fit, a specialized system is required; even the Pandas library relies on holding all of the data in memory).
In particular, I don't see how creating a new file would help in this case: in the example given, you would just be renaming the columns, and you would still have a column for "name", except it would be titled "name" instead of "A". So let's focus on getting the columns into memory, and you can work from there (reading a file of this size takes negligible time, so it is fine to re-read the data every time the program runs).
As a rule, in modern Python, splitting a table into columns to be handled separately is most practical with Pandas itself, which has this essentially ready-made. Since you explicitly said that you do not want to use Pandas or numpy, the simple approach is to read all the data as a CSV file and then, with a "list of lists" in hand, where each element is a row, transpose that data. A practical way to transpose is with the built-in zip function.
Instead of using zip to simply transpose the data, which can be done in a single line, I will write a few lines of code that will:
- create a dictionary that will be your final data structure. Each key in the dictionary is a column title, and its value is a list of the data in that column. For this, the code uses the first line of the CSV file (the header).
- loop through the data rows with a for, and then use zip to match each row's values with the corresponding lists in the dictionary created above.
The zip function does exactly this: given two or more iterable objects, it yields one element from each of them on every iteration. Since a Python for statement allows more than one loop variable, this works very well: in practice, a for using zip traverses the lists in the data structure we created and the values of the current row simultaneously. Each value is appended to its column's list, and zip moves on to the next column. At the end of the row, the outer for repeats, picking up the same lists in the data dictionary, but now with the values of the next row:
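As an aside, the one-line transposition via zip mentioned above can be sketched like this (a minimal illustration, not part of the final solution):

```python
# Each inner list is a row; zip(*tabela) unpacks the rows as
# separate arguments to zip, which then yields one tuple per column.
tabela = [[1, 2, 3], [4, 5, 6]]
colunas = list(zip(*tabela))
print(colunas)  # [(1, 4), (2, 5), (3, 6)]
```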
Before going straight to the CSV file, to make this more didactic, here is an example of the idea in interactive mode:
In [31]: tabela = [[1, 2, 3], [4, 5, 6]]
In [32]: dados = {"a": [], "b": [], "c": []}
In [33]: for linha in tabela:
    ...:     for coluna_dados, valor in zip(dados.items(), linha):
    ...:         print(coluna_dados, valor)
    ...:         coluna_dados[1].append(valor)
    ...:
('a', []) 1
('b', []) 2
('c', []) 3
('a', [1]) 4
('b', [2]) 5
('c', [3]) 6
In [34]: print(dados)
{'a': [1, 4], 'b': [2, 5], 'c': [3, 6]}
And the code to do the same thing, but with the data from the CSV file:
from collections import OrderedDict
import csv

with open('data.csv') as stream:
    reader = csv.reader(stream)
    data = OrderedDict((column_name, []) for column_name in next(reader))
    for row in reader:
        for column, value in zip(data.values(), row):
            column.append(value)
At this point in the code, the variable data is the dictionary described above: each column of the original CSV file has a key with its title, and all of its values in a list.
I used OrderedDict above to ensure that the code works in any version of Python, but from Python 3.7 onward, regular dictionaries preserve insertion order, so a plain dict can be used instead of OrderedDict in this code. (In older versions, a plain dict would not guarantee the column order.)
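For Python 3.7+, here is a sketch of the same code using a plain dict; io.StringIO stands in for a real CSV file here just so the snippet is self-contained (the column names and values are made up for illustration):

```python
import csv
import io

# A small in-memory "file" simulating data.csv
csv_text = "A,B\n1,2\n3,4\n"

with io.StringIO(csv_text) as stream:
    reader = csv.reader(stream)
    # On Python 3.7+, a plain dict preserves insertion order,
    # so OrderedDict is no longer needed
    data = {column_name: [] for column_name in next(reader)}
    for row in reader:
        for column, value in zip(data.values(), row):
            column.append(value)

print(data)  # {'A': ['1', '3'], 'B': ['2', '4']}
```

Note that csv.reader yields all values as strings; converting them to numbers, if needed, is a separate step.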
Pandas
In projects that have no restrictions on the use of Pandas, its native DataFrame structure already provides access by columns naturally: the DataFrame also works as a mapping, where each column title is a key and the corresponding value is a Series with that column's data:
import pandas as pd
data = pd.read_csv("meuarquivo.csv")
print(data["A"])