Split a worksheet into multiple worksheets

Asked

Viewed 1,752 times

1

I have a spreadsheet with 5000 lines that needs to be divided into sheets with 790 lines at most. How can I do this??

  • Are these spreadsheets pure data? If so, may I suggest converting to CSV and deal with bash/shell script in a reply.

  • 1

    These are spreadsheets with import data, nothing very out of standard. I wanted some solution in python if possible

  • Ok, I can provide the general idea and implementation in bash; I have an honest laziness to use Python right now, but the transformation should be very easy

  • 1

    Ok.... how would bash in?

  • I made the answer in bash =) I think now is the time to mark Gabriel’s answer as the right one, the answer that met your problem, since his is in Python, better suited to your problem

  • 1

    Man, thanks for the help and for the bash tmb, it will stay as an alternative to python for sure.

Show 1 more comment

2 answers

3

Try to use the function readlines for file handling:

file = open('entrada.txt', 'r')
lines = file.readlines()
nfile = 0 # Numero para organizar os arquivos a serem criados, sera incrementado no loop for.
inicio = 0 # Inicio do trecho a ser escrito
fim = 0 # Fim do trecho
nlines = 790 # numero de linhas por arquivo 
rangeloop = len(lines)//nlines # numero de iterações a serem executadas no loop for
if len(lines) % nlines > 0: # Caso a conta nao dê exata, faz uma iteração a mais para escrever o que sobrou
    rangeloop += 1
for x in range(rangeloop):
    nfile += 1 
    fim += nlines
    arq = open("arq"+str(nfile)+".txt", 'w')
    arq.write(''.join(lines[inicio:fim]))
    inicio = fim
    arq.close()

It worked here, try there and tell me, any doubt can ask, it is badly written because I did it very quickly.

  • I’m gonna test you, buddy!!!

  • A question: This script will do the splitting until all 5000 lines are distributed in multiple spreadsheets??

  • 1

    The result will be several . txt files with 790 lines in each, with the exception of the last which may be less than 790

  • 1

    I don’t know how your knowledge in Python is, but this is a well formed code, the correct would be you make some changes to allow the user to define the input file name, name of the output files, organize those variable statements, etc... If you don’t know much about Python, this is very simple, just google and you’ll find the answer without difficulty.

  • 1

    Opa..... I will test yes, qq thing I tell you. Thank you!!!

  • 1

    That’s why I didn’t want to answer in Python. I knew I had one import antigravity I just didn’t know which one

  • 1

    @gabrielbelini, I gave the syntax tip of its code for Python. For some reason, the highlght identified the entire division as a commentary, which confused me by reading his reply the first time

  • I’m getting the following error: Unicodeencodeerror: 'charmap' codec can’t Encode Character ' x8b' in position 15: Character maps to <Undefined>. I am opening the xlsx file so: file = open('Resultado_17_03_2017_BR2_02_2017_VINHO.xlsx', 'r', encoding='latin1'). I tried with utf-8 and the error persisted!!!

  • Try opening with Rb or Rb+ and keep encoding in utf-8

  • Your problem is now with file types, python by default reads .txt. files if the file type you want to open is . csv or . xlsx pe very likely q will need to import a reader library of this file type. From a look at the Openpyxl library and this link here teaches how to work with xlsx files, very simple as well. https://openpyxl.readthedocs.io/en/default/optimized.html

  • with Rb and Rb+ this occurs: Valueerror: Binary mode doesn’t take an encoding argument. I’ll check the library you mentioned!

  • It worked the fight against encoding?

  • It’s not about encoding - this code will only work for CSV, as commented above. XLSX files are zipped, and if you open the zip, the content of the lines is in a complex XML structure. If you cannot convert the spreadsheet to . csv for splitting, you’ll need to install a module in Python in order to read and write xlsx. I recommend Rows: https://pypi.python.org/pypi/rows

  • @Bene - By the way, if the solution didn’t work for you (I’m assuming not, because you’re still fighting with the "xlsx" thinking it’s just "encoding" - don’t mark the answer as "accept". This afternoon, I was on time, I saw this question, and I thought "well, you already have the answer accepted, nor will I look). If you like how you’re getting, it’s okay to give "upvote", but only accept when you solve the problem.

  • Oops, hasn’t really worked out the encoding issue yet. But I will turn into CSV and test then. Thanks

Show 10 more comments

3

Observing: this question aims at an answer in Python, but I believe this answer in bash can bring new air and, also, serves as curiosity

To make a proper textual treatment in bash, I am assuming that the spreadsheet has been exported to the format CSV, where each worksheet line ends with a \n in the CSV file.

The idea is to play a set of lines in separate files. Starting from line 0, the file will end on line 4999. Thus the lines 0-789 would be in the first file, 790-1579 would be in the second file and so on. To identify from which file the line n belong, just make the whole division and add 1: (n / 790) + 1.

So, come on. We’ll read from the archive planilhao.csv and play in the archives planilhinha-I.csv, where I is the file index. As stated earlier, the line n enters the file with index I = (n / 790) + 1. Given the general directions of the strategy to be followed, let’s build the script.

The first part of the strategy is to identify, for each line, its number. The following script does this reading/assignment number:

# o número da linha é um inteiro
declare -i line_no=0

while read LINHA; do
    # esse echo aqui é só para demonstrar a associação entre a linha lida e o line_no
    echo "linha $line_no, conteúdo ($LINHA)"

    # incremento o número da linha
    line_no+=1
done

To demonstrate the calculation of I, considering n_lines the maximum number of lines per file:

# o número da linha é um inteiro
declare -i line_no=0

# número máximo de linhas por arquivo
n_lines=790

while read LINHA; do
    # cálculo do índice do arquivo
    index=$(( (line_no / n_lines) + 1 ))

    # esse echo aqui é só para demonstrar a associação entre o line_no e o index
    echo "linha $line_no, arquivo índice $index, conteúdo ($LINHA)"

    # incremento o número da linha
    line_no+=1
done

So, knowing the index in which goes each line, we can play the line to the appropriate file:

# o número da linha é um inteiro
declare -i line_no=0

# número máximo de linhas por arquivo
n_lines=790

# prefixo do nome do arquivo
file_prefix=planilhinha

while read LINHA; do
    # cálculo do índice do arquivo
    index=$(( (line_no / n_lines) + 1 ))

    # nome do arquivo
    file_name="${file_prefix}-${index}.csv"

    # anexa a linha ao arqvuio 
    echo "$LINHA" >> "$file_name"

    # incremento o número da linha
    line_no+=1
done

So, that’s it. Put this script in a file (separador_planilha.sh for example) and send it the file you want to separate. The command would look like this:

./separador_planilha.sh < planilhao.csv

Browser other questions tagged

You are not signed in. Login or sign up in order to post.