Algorithm for distributing entries from an encyclopedia based on number of characters per day

Asked

Viewed 414 times

12

I managed a list of entries from an encyclopedia and I want to find the best way to separate the entries so that I choose a number of days and I have a schedule of which entries to read per day.

(The list is here: https://pastebin.com/4qNLqCJj)

The first item is the volume, the second entry, the third character number and the fourth is an id (generated from the date the entry was recorded on my pc).

I would like to make a script that generates a schedule, I would enter the number of days and it would divide the total number of characters of the encyclopedia and separate the entries from the number of characters of the average.

Example: I’ll enter the number of days, 120. The script will generate a script so that I can read all the characters of the three volumes of the encyclopedia in 120 days. For example:

The program should have 120 days, and each day will have an X number of entries having their number of characters summed equaling a number as close as possible to the average of characters per day.

Entries can be read in any order and in any volume, what matters is that they are distributed so that an almost equal amount of jails is read each day.


I asked an earlier question related to this and a user suggested the prediction error algorithm, but I’m not able to adapt it to this script.

Behold: What is the algorithm for distributing paragraphs?

How could I do that?

  • If I understand correctly, what you want to do involves these steps: 1) Calculate the average of characters per day needed for the entire encyclopedia to be read in 120 days; 2) Group entries per day, so that each day has entries that make up approximately the same number of characters as the average. If this understanding is correct, you probably know how to do (1) but are having trouble doing it (2).

  • 2

    A "simple" solution is to sort the entries from the smallest to the largest (that is, fewer characters for more characters) and allocate them on days until they exceed the calculated average in (1). Of course there may be entries that alone already exceed the average or that cannot be combined with any other without surpassing it, so in that case you have to use some other heuristic to make the best division. Better division algorithms are more complex and relate to the famous Backpack problem. Maybe this reading will help you.

3 answers

2


I haven’t read the other answers, and I don’t know any "smart" way to resolve your question. Having said that, I found the problem interesting, and I tried to solve "in my own way" (under the nail).

I imagine that you don’t want/can break an entry in half - if not you could make a division "perfect" and would not have opened a question.

I say this because I realized that you have entries with 70 characters - and others with 7000, which makes it difficult (I believe) a good distribution.

What the code I wrote intends to do is this::

  • Fill in the desired number of days with the largest entries found
  • Search for days whose character sum is farthest from average
  • Add new entries to these days
  • And this is it. In my tests, the result seems reasonable, but I confess that I took so much of the programming itself that I could not test very much the possible solutions!

    The program expects a text file, with identical formatting to the one you posted on Pastebin - information on each entry separated by semicolon, and entries separated from each other by a new line:

    vol1;Ageu, Livro de;4629;1494553563.48
    vol1;Agricultura;6593;1494553566.18
    vol1;Água;8947;1494553571.57
    vol1;Águia;10794;1494553577.0
    vol1;Aguilhada;1688;1494553582.39
    vol1;Agulha;580;1494553585.11
    vol1;Agulha, Orifício da;1418;1494553587.82
    

    Upshot: screenshot do programa rodando

    "use strict";
    function mostraDados() {
    	if (window.carregado) {
    		document.getElementById("qtdcaracteres").value = window.qtdcaracteres;
    		document.getElementById("qtdverbetes").value = window.qtdverbetes;
    		if (document.getElementById("qtddias").value > window.qtdverbetes) {
    			document.getElementById("qtddias").value = window.qtdverbetes;
    		}	
    		var qtddias = document.getElementById("qtddias").value;
    		document.getElementById("qtdmedia").value = Math.round(window.qtdcaracteres/qtddias);
    		window.qtdmedia = Math.round(window.qtdcaracteres/qtddias);
    	}
    }
    
    function saida(div, texto) {
    	div.innerHTML = div.innerHTML + texto;	
    }
    
    function limpa(div) {
    	div.innerHTML = "";	
    }
    function criaVerbetes(meuArray) {
    		var obj = new Array();
    		for (var i = 0; i < meuArray.length; i++) {
    			obj[i] = {
    				vol: (meuArray[i].split(";"))[0],
    				titulo: (meuArray[i].split(";"))[1],
    				caracteres: (meuArray[i].split(";"))[2],
    				id: (meuArray[i].split(";"))[3],
    				pegaCaracteres: function() { return parseInt(this.caracteres); },
    			}
    		}
    	return obj;
    }
    
    function contaCaracteres(meuArray) {
    		var qtdCaracteres = 0;
    		for (var i = 0; i < meuArray.length; i++) {
    			qtdCaracteres += meuArray[i].pegaCaracteres();
    		}
    	return qtdCaracteres;
    }
    
    function ordena(meuArray) {
    		meuArray.sort(function(a, b) {
    			return a.pegaCaracteres() - b.pegaCaracteres();
    		});
    }
    
    function ordenaId(meuArray) {
    		meuArray.sort(function(a, b) {
    			return a.id - b.id;
    		});
    }
    
    function listaVerbetes(meuArray) {
    	for (var i = 0; i < meuArray.length; i++) {
    		console.log(meuArray[i].titulo + ": " + meuArray[i].pegaCaracteres());
    	}
    }
    
    function criaArrayDeDias() {
    	var dias = document.getElementById("qtddias").value;
    	var meuArray = new Array();
    	
    	for (var i = 0; i < dias; i++) {
    		meuArray[i] = {
    			verbete: [],
    			pegaSoma: function() {
    				var ct=0;
    				for (var i = 0; i < this.verbete.length; i++) {
    					ct += this.verbete[i].pegaCaracteres();
    				}
    				return ct;
    			}
    		}
    	}
    
    	return meuArray;
    }
    
    function calcula() {
    	if (window.carregado) {
    		window.objDia = criaArrayDeDias();
    		window.colecaoVerbetes = criaVerbetes(arrayVerbetes);
        
                    ordena(colecaoVerbetes);
    
    		for (var i = 0; i < window.objDia.length; i++) {
    			window.objDia[i].verbete[window.objDia[i].verbete.length] = window.colecaoVerbetes.pop();
    		}
    		
    		var longeDaMedia = 0;
    		var index;
    		while(window.colecaoVerbetes.length > 0) {
    			for (var i = 0; i < window.objDia.length; i++) {
    			var provisorio = window.qtdmedia - window.objDia[i].pegaSoma();
    				if (longeDaMedia < provisorio) {
    					longeDaMedia = provisorio;
    					index = i;
    				}			
    			}
    			window.objDia[index].
    				verbete[window.objDia[index].
    					verbete.length] = window.colecaoVerbetes.pop();
    			longeDaMedia = 0;
    		}
    
    		// saiu do while! hora de morfar
    		var div = document.getElementById('saida');
    		var div2 = document.getElementById('saida2');
    		for (var i = 0; i < window.objDia.length; i++) {
    				ordenaId(window.objDia[i].verbete);				
    		}
    
    		limpa(div);
    		for (var z = 0; z < window.objDia.length; z++) {
    			saida(div, "Dia " + z + ":<br>");
    			saida(div, "soma de caracteres: " + window.objDia[z].pegaSoma() + "<br>");
    			saida(div, "numero de verbetes: " + window.objDia[z].verbete.length + "<br><br>");
    		}
    
    
    		var div = document.getElementById('saida');
    		limpa(div2);
    				for (var z = 0; z < window.objDia.length; z++) {
    			saida(div2, "Dia " + z + ":<br>");
    			for (var g = 0; g < window.objDia[z].verbete.length; g++) {
    				saida(div2, window.objDia[z].verbete[g].vol + ", " +
    					window.objDia[z].verbete[g].titulo + "<br>");
    			}
    			saida(div2, "<br>");
    		}
    
    	} else { console.log("nao carregado"); }
    }
    
    function leArquivo(e) {
    	var arquivo = e.target.files[0];
    	var leitor = new FileReader();
    	leitor.onload = function(e) {
    		window.arquivoTexto = e.target.result;
    		window.arrayVerbetes = arquivoTexto.split("\n");
    		window.carregado = true;
    		window.objDia = criaArrayDeDias();
    		window.objVerbetes = criaVerbetes(arrayVerbetes);
    		window.colecaoVerbetes = window.objVerbetes;
    		window.qtdverbetes = window.objVerbetes.length;
    		window.qtdcaracteres = contaCaracteres(objVerbetes);
    		mostraDados();
    	};
    leitor.readAsText(arquivo);
    }
    
    document.getElementById('arquivo').addEventListener('change', leArquivo, false);
    document.getElementById('qtddias').addEventListener('change', mostraDados, false);
    label {display:block;}
    textarea {display:block;}
    
    .container{
        font-family: Tahoma, Verdana, Segoe, sans-serif;
        padding: 10px;
        background-color:#d6d6c2;
        display:flex;
    }
    .container2{
        color: #000;
        height:400px;
        overflow-y:scroll;
    }
    .divide{
        padding: 10px;
        flex-grow: 1;
    }
    <!DOCTYPE html>
    <html>
    <body> 
    <head>
    <meta charset="utf-8"/>
    </head>
    <body>
    <div class="container">
    	<div class="divide">
    			<label for="title">Quantidade de verbetes</label>
    			<textarea id="qtdverbetes" rows="2" cols="15" readonly></textarea>
    			<label for="title">Quantidade de caracteres</label>
    			<textarea id="qtdcaracteres" rows="2" cols="15" readonly></textarea>
    			<label for="title">Media caracteres/dia</label>
    			<textarea id="qtdmedia" rows="2" cols="15" readonly></textarea>
    		</div>
    		<div class="divide">
    			<input type="file" id="arquivo"/><hr>
    			<label for="title">Numero de dias</label>
    			<textarea id="qtddias" rows="2" cols="15"></textarea>
    			<hr>
    			<button onclick="calcula()">Calcular</button>
    		</div>
    		</div>
    	<div class="container">
    		<div class="divide">
    			<p>Contagem de caracteres:</p>
    			<div class="container2" id="saida"></div>
    		</div>
    		<div class="divide">
    			<p>Verbetes:</p>
    			<div class="container2" id="saida2"></div>
    		</div>
    <script src=".\verbetes.js">
    </script>
    </body>
    </html>

    0

    Solving this problem is not difficult, the problem is feasibility due to computational cost.

    A simple way to solve: create an algorithm that generates all possible combinations and then check which varies less than the average, which can be done for example by checking which has the smallest quadratic deviation, which is something easy.

    The difficulty of this problem is due to its complexity that grows exponentially, that is, its computational cost increases greatly by increasing the size of the problem.

    I suggest you watch this video that explains it very well (although it is in Spanish is quite understandable). https://www.youtube.com/watch?v=UR2oDYZ-Sao

    It is obvious that in the solution I gave, you can simplify some things so you do not have to test all combinations and find a better solution, but the solution will remain exponential complexity.

    With 120 days and 4321 entries, the time to find the answer will be absurdly large, but it is the only way to ensure that the best distribution has been found.

    What can be done is to develop an idea for a solution nay ideal, where you will not find the best distribution, but that perhaps the result will be satisfactory for your goal, but it depends on your goal. If it was only the best solution, unfortunately there is not much to do, but if a non-ideal solution is also valid we can try some ideas. The question that remains is what your goal?

    • Thank you for your reply. I don’t mind computational time as it took me more than 6 hours running 3 scripts that extracted the entries from the encyclopedia. As stated in the question, my goal is to make a schedule that allows me to read equally a number of X characters each day. That means I will read some entries today, some tomorrow, but by the end of the number of days I will have read the three volumes. Only the detail is the number of combinations will be astronomical since I could read from 1 to several entries per day. I believe the script should be smarter.

    • 1

      Indeed, "creating an algorithm that generates all possible combinations" is a factor complexity solution that will very quickly become unviable as the number of entries grows. More information here: https://answall.com/questions/56836/defini%C3%A7%C3%a3o-da-nota%C3%A7%C3%a3o-big-o

    0

    Hello,

    Well the entries can be read randomly? example on the same day read entries of volume 1, 2 and 3 randomly?

    if you can’t Voce can do something like the logic below:

    Variaves do script:
    
    somaDeCaracteres = soma dos caracteres de todas os verbetes
    dias = numero de dias desejados
    media = media de caracteres diarios
    numeroDeVerbetes = quantidade de verbetes totais
    caracteresRestantesNoDia = ?
    
    Logica: 
    
    1 - 
    Para cada verbete em verbetes{
        somaDeCaracteres = somaDeCaracteres + verbete.caracteres
        numeroDeVerbetes = numeroDeVerbetes + 1
    }
    
    2 - 
    
    media = somaDeCaracteres  dias
    
    3 - 
    caracteresRestantesNoDia = media
    diaAtual = 1
    Para cada verbete em verbetes{
        se caracteresRestantesNoDia - verbete.caracteres > 0 {
            adicionaVerbeteNaLista(diaAtual)
            caracteresRestantesNoDia = caracteresRestantesNoDia - verbete.caracteres
        }senao{
            diaAtual = diaAtual + 1
            caracteresRestantesNoDia = media
        }
    }
    

    So he respects the media, in case an additional entry ever exceeds the media he does not add... in this way the last day will have characters unless the others.

    • Entries can be read in any order and in any volume, what matters is that they are distributed so that an almost equal amount of jails is read each day.

    • Is it valid to start reading one article in one day and finish the next? because if it is necessary to finish the whole entry the day it started I believe that it is not possible to divide in equal amounts....

    • If I read half an entry today and finished tomorrow it means I would have to count the number of characters before reading, which doesn’t make the task too practical.

    Browser other questions tagged

    You are not signed in. Login or sign up in order to post.