Split a worksheet into multiple worksheets

Question

Split a worksheet into multiple worksheets

Asked 8 years, 5 months ago

Viewed 1,752 times

1

I have a spreadsheet with 5000 lines that needs to be divided into sheets with 790 lines at most. How can I do this??

Are these spreadsheets pure data? If so, may I suggest converting to CSV and deal with bash/shell script in a reply.

– Jefferson Quesado

2017/05/26 at 02:31
1

These are spreadsheets with import data, nothing very out of standard. I wanted some solution in python if possible

– Bene

2017/05/26 at 02:37
Ok, I can provide the general idea and implementation in bash; I have an honest laziness to use Python right now, but the transformation should be very easy

– Jefferson Quesado

2017/05/26 at 02:40
1

Ok.... how would bash in?

– Bene

2017/05/26 at 02:41
I made the answer in bash =) I think now is the time to mark Gabriel’s answer as the right one, the answer that met your problem, since his is in Python, better suited to your problem

– Jefferson Quesado

2017/05/26 at 03:58
1

Man, thanks for the help and for the bash tmb, it will stay as an alternative to python for sure.

– Bene

2017/05/26 at 10:07

Show 1 more comment

2 answers

Browser other questions tagged python excel vba pandas excel-vba

You are not signed in. Login or sign up in order to post.

by Gabriel Belini • **545** points · Answer 1 · 2017-05-26T03:26:00+00:00

3

Try to use the function readlines for file handling:

file = open('entrada.txt', 'r')
lines = file.readlines()
nfile = 0 # Numero para organizar os arquivos a serem criados, sera incrementado no loop for.
inicio = 0 # Inicio do trecho a ser escrito
fim = 0 # Fim do trecho
nlines = 790 # numero de linhas por arquivo 
rangeloop = len(lines)//nlines # numero de iterações a serem executadas no loop for
if len(lines) % nlines > 0: # Caso a conta nao dê exata, faz uma iteração a mais para escrever o que sobrou
    rangeloop += 1
for x in range(rangeloop):
    nfile += 1 
    fim += nlines
    arq = open("arq"+str(nfile)+".txt", 'w')
    arq.write(''.join(lines[inicio:fim]))
    inicio = fim
    arq.close()

It worked here, try there and tell me, any doubt can ask, it is badly written because I did it very quickly.

I’m gonna test you, buddy!!!

– Bene

2017/05/26 at 03:32
A question: This script will do the splitting until all 5000 lines are distributed in multiple spreadsheets??

– Bene

2017/05/26 at 03:34
1

The result will be several . txt files with 790 lines in each, with the exception of the last which may be less than 790

– Gabriel Belini

2017/05/26 at 03:37
1

I don’t know how your knowledge in Python is, but this is a well formed code, the correct would be you make some changes to allow the user to define the input file name, name of the output files, organize those variable statements, etc... If you don’t know much about Python, this is very simple, just google and you’ll find the answer without difficulty.

– Gabriel Belini

2017/05/26 at 03:41
1

Opa..... I will test yes, qq thing I tell you. Thank you!!!

– Bene

2017/05/26 at 03:41
1

That’s why I didn’t want to answer in Python. I knew I had one import antigravity I just didn’t know which one

– Jefferson Quesado

2017/05/26 at 03:50
1

@gabrielbelini, I gave the syntax tip of its code for Python. For some reason, the highlght identified the entire division as a commentary, which confused me by reading his reply the first time

– Jefferson Quesado

2017/05/26 at 03:56
I’m getting the following error: Unicodeencodeerror: 'charmap' codec can’t Encode Character ' x8b' in position 15: Character maps to <Undefined>. I am opening the xlsx file so: file = open('Resultado_17_03_2017_BR2_02_2017_VINHO.xlsx', 'r', encoding='latin1'). I tried with utf-8 and the error persisted!!!

– Bene

2017/05/26 at 12:34
Try opening with Rb or Rb+ and keep encoding in utf-8

– Gabriel Belini

2017/05/26 at 13:38
Your problem is now with file types, python by default reads .txt. files if the file type you want to open is . csv or . xlsx pe very likely q will need to import a reader library of this file type. From a look at the Openpyxl library and this link here teaches how to work with xlsx files, very simple as well. https://openpyxl.readthedocs.io/en/default/optimized.html

– Gabriel Belini

2017/05/26 at 13:42
with Rb and Rb+ this occurs: Valueerror: Binary mode doesn’t take an encoding argument. I’ll check the library you mentioned!

– Bene

2017/05/26 at 14:31
It worked the fight against encoding?

– Jefferson Quesado

2017/05/27 at 01:54
It’s not about encoding - this code will only work for CSV, as commented above. XLSX files are zipped, and if you open the zip, the content of the lines is in a complex XML structure. If you cannot convert the spreadsheet to . csv for splitting, you’ll need to install a module in Python in order to read and write xlsx. I recommend Rows: https://pypi.python.org/pypi/rows

– jsbueno

2017/05/28 at 00:34
@Bene - By the way, if the solution didn’t work for you (I’m assuming not, because you’re still fighting with the "xlsx" thinking it’s just "encoding" - don’t mark the answer as "accept". This afternoon, I was on time, I saw this question, and I thought "well, you already have the answer accepted, nor will I look). If you like how you’re getting, it’s okay to give "upvote", but only accept when you solve the problem.

– jsbueno

2017/05/28 at 00:37
Oops, hasn’t really worked out the encoding issue yet. But I will turn into CSV and test then. Thanks

– Bene

2017/05/28 at 02:53

Show 10 more comments

by Jefferson Quesado • **22,370** points · Answer 2 · 2017-05-26T03:47:38+00:00

Observing: this question aims at an answer in Python, but I believe this answer in bash can bring new air and, also, serves as curiosity

To make a proper textual treatment in bash, I am assuming that the spreadsheet has been exported to the format CSV, where each worksheet line ends with a \n in the CSV file.

The idea is to play a set of lines in separate files. Starting from line 0, the file will end on line 4999. Thus the lines 0-789 would be in the first file, 790-1579 would be in the second file and so on. To identify from which file the line n belong, just make the whole division and add 1: (n / 790) + 1.

So, come on. We’ll read from the archive planilhao.csv and play in the archives planilhinha-I.csv, where I is the file index. As stated earlier, the line n enters the file with index I = (n / 790) + 1. Given the general directions of the strategy to be followed, let’s build the script.

The first part of the strategy is to identify, for each line, its number. The following script does this reading/assignment number:

# o número da linha é um inteiro
declare -i line_no=0

while read LINHA; do
    # esse echo aqui é só para demonstrar a associação entre a linha lida e o line_no
    echo "linha $line_no, conteúdo ($LINHA)"

    # incremento o número da linha
    line_no+=1
done

To demonstrate the calculation of I, considering n_lines the maximum number of lines per file:

# o número da linha é um inteiro
declare -i line_no=0

# número máximo de linhas por arquivo
n_lines=790

while read LINHA; do
    # cálculo do índice do arquivo
    index=$(( (line_no / n_lines) + 1 ))

    # esse echo aqui é só para demonstrar a associação entre o line_no e o index
    echo "linha $line_no, arquivo índice $index, conteúdo ($LINHA)"

    # incremento o número da linha
    line_no+=1
done

So, knowing the index in which goes each line, we can play the line to the appropriate file:

# o número da linha é um inteiro
declare -i line_no=0

# número máximo de linhas por arquivo
n_lines=790

# prefixo do nome do arquivo
file_prefix=planilhinha

while read LINHA; do
    # cálculo do índice do arquivo
    index=$(( (line_no / n_lines) + 1 ))

    # nome do arquivo
    file_name="${file_prefix}-${index}.csv"

    # anexa a linha ao arqvuio 
    echo "$LINHA" >> "$file_name"

    # incremento o número da linha
    line_no+=1
done

So, that’s it. Put this script in a file (separador_planilha.sh for example) and send it the file you want to separate. The command would look like this:

./separador_planilha.sh < planilhao.csv