Algorithm for distributing entries from an encyclopedia based on number of characters per day

Question

Algorithm for distributing entries from an encyclopedia based on number of characters per day

Asked 8 years, 3 months ago

Viewed 414 times

12

I managed a list of entries from an encyclopedia and I want to find the best way to separate the entries so that I choose a number of days and I have a schedule of which entries to read per day.

(The list is here: https://pastebin.com/4qNLqCJj)

The first item is the volume, the second entry, the third character number and the fourth is an id (generated from the date the entry was recorded on my pc).

I would like to make a script that generates a schedule, I would enter the number of days and it would divide the total number of characters of the encyclopedia and separate the entries from the number of characters of the average.

Example: I’ll enter the number of days, 120. The script will generate a script so that I can read all the characters of the three volumes of the encyclopedia in 120 days. For example:

The program should have 120 days, and each day will have an X number of entries having their number of characters summed equaling a number as close as possible to the average of characters per day.

Entries can be read in any order and in any volume, what matters is that they are distributed so that an almost equal amount of jails is read each day.

I asked an earlier question related to this and a user suggested the prediction error algorithm, but I’m not able to adapt it to this script.

Behold: What is the algorithm for distributing paragraphs?

How could I do that?

If I understand correctly, what you want to do involves these steps: 1) Calculate the average of characters per day needed for the entire encyclopedia to be read in 120 days; 2) Group entries per day, so that each day has entries that make up approximately the same number of characters as the average. If this understanding is correct, you probably know how to do (1) but are having trouble doing it (2).

– Luiz Vieira

2017/05/20 at 14:41
2

A "simple" solution is to sort the entries from the smallest to the largest (that is, fewer characters for more characters) and allocate them on days until they exceed the calculated average in (1). Of course there may be entries that alone already exceed the average or that cannot be combined with any other without surpassing it, so in that case you have to use some other heuristic to make the best division. Better division algorithms are more complex and relate to the famous Backpack problem. Maybe this reading will help you.

– Luiz Vieira

2017/05/20 at 14:45

3 answers

2

I haven’t read the other answers, and I don’t know any "smart" way to resolve your question. Having said that, I found the problem interesting, and I tried to solve "in my own way" (under the nail).

I imagine that you don’t want/can break an entry in half - if not you could make a division "perfect" and would not have opened a question.

I say this because I realized that you have entries with 70 characters - and others with 7000, which makes it difficult (I believe) a good distribution.

What the code I wrote intends to do is this::

Fill in the desired number of days with the largest entries found

Search for days whose character sum is farthest from average

Add new entries to these days

And this is it. In my tests, the result seems reasonable, but I confess that I took so much of the programming itself that I could not test very much the possible solutions!

The program expects a text file, with identical formatting to the one you posted on Pastebin - information on each entry separated by semicolon, and entries separated from each other by a new line:

vol1;Ageu, Livro de;4629;1494553563.48
vol1;Agricultura;6593;1494553566.18
vol1;Água;8947;1494553571.57
vol1;Águia;10794;1494553577.0
vol1;Aguilhada;1688;1494553582.39
vol1;Agulha;580;1494553585.11
vol1;Agulha, Orifício da;1418;1494553587.82

Upshot:

"use strict";
function mostraDados() {
	if (window.carregado) {
		document.getElementById("qtdcaracteres").value = window.qtdcaracteres;
		document.getElementById("qtdverbetes").value = window.qtdverbetes;
		if (document.getElementById("qtddias").value > window.qtdverbetes) {
			document.getElementById("qtddias").value = window.qtdverbetes;
		}	
		var qtddias = document.getElementById("qtddias").value;
		document.getElementById("qtdmedia").value = Math.round(window.qtdcaracteres/qtddias);
		window.qtdmedia = Math.round(window.qtdcaracteres/qtddias);
	}
}

function saida(div, texto) {
	div.innerHTML = div.innerHTML + texto;	
}

function limpa(div) {
	div.innerHTML = "";	
}
function criaVerbetes(meuArray) {
		var obj = new Array();
		for (var i = 0; i < meuArray.length; i++) {
			obj[i] = {
				vol: (meuArray[i].split(";"))[0],
				titulo: (meuArray[i].split(";"))[1],
				caracteres: (meuArray[i].split(";"))[2],
				id: (meuArray[i].split(";"))[3],
				pegaCaracteres: function() { return parseInt(this.caracteres); },
			}
		}
	return obj;
}

function contaCaracteres(meuArray) {
		var qtdCaracteres = 0;
		for (var i = 0; i < meuArray.length; i++) {
			qtdCaracteres += meuArray[i].pegaCaracteres();
		}
	return qtdCaracteres;
}

function ordena(meuArray) {
		meuArray.sort(function(a, b) {
			return a.pegaCaracteres() - b.pegaCaracteres();
		});
}

function ordenaId(meuArray) {
		meuArray.sort(function(a, b) {
			return a.id - b.id;
		});
}

function listaVerbetes(meuArray) {
	for (var i = 0; i < meuArray.length; i++) {
		console.log(meuArray[i].titulo + ": " + meuArray[i].pegaCaracteres());
	}
}

function criaArrayDeDias() {
	var dias = document.getElementById("qtddias").value;
	var meuArray = new Array();
	
	for (var i = 0; i < dias; i++) {
		meuArray[i] = {
			verbete: [],
			pegaSoma: function() {
				var ct=0;
				for (var i = 0; i < this.verbete.length; i++) {
					ct += this.verbete[i].pegaCaracteres();
				}
				return ct;
			}
		}
	}

	return meuArray;
}

function calcula() {
	if (window.carregado) {
		window.objDia = criaArrayDeDias();
		window.colecaoVerbetes = criaVerbetes(arrayVerbetes);
    
                ordena(colecaoVerbetes);

		for (var i = 0; i < window.objDia.length; i++) {
			window.objDia[i].verbete[window.objDia[i].verbete.length] = window.colecaoVerbetes.pop();
		}
		
		var longeDaMedia = 0;
		var index;
		while(window.colecaoVerbetes.length > 0) {
			for (var i = 0; i < window.objDia.length; i++) {
			var provisorio = window.qtdmedia - window.objDia[i].pegaSoma();
				if (longeDaMedia < provisorio) {
					longeDaMedia = provisorio;
					index = i;
				}			
			}
			window.objDia[index].
				verbete[window.objDia[index].
					verbete.length] = window.colecaoVerbetes.pop();
			longeDaMedia = 0;
		}

		// saiu do while! hora de morfar
		var div = document.getElementById('saida');
		var div2 = document.getElementById('saida2');
		for (var i = 0; i < window.objDia.length; i++) {
				ordenaId(window.objDia[i].verbete);				
		}

		limpa(div);
		for (var z = 0; z < window.objDia.length; z++) {
			saida(div, "Dia " + z + ":<br>");
			saida(div, "soma de caracteres: " + window.objDia[z].pegaSoma() + "<br>");
			saida(div, "numero de verbetes: " + window.objDia[z].verbete.length + "<br><br>");
		}


		var div = document.getElementById('saida');
		limpa(div2);
				for (var z = 0; z < window.objDia.length; z++) {
			saida(div2, "Dia " + z + ":<br>");
			for (var g = 0; g < window.objDia[z].verbete.length; g++) {
				saida(div2, window.objDia[z].verbete[g].vol + ", " +
					window.objDia[z].verbete[g].titulo + "<br>");
			}
			saida(div2, "<br>");
		}

	} else { console.log("nao carregado"); }
}

function leArquivo(e) {
	var arquivo = e.target.files[0];
	var leitor = new FileReader();
	leitor.onload = function(e) {
		window.arquivoTexto = e.target.result;
		window.arrayVerbetes = arquivoTexto.split("\n");
		window.carregado = true;
		window.objDia = criaArrayDeDias();
		window.objVerbetes = criaVerbetes(arrayVerbetes);
		window.colecaoVerbetes = window.objVerbetes;
		window.qtdverbetes = window.objVerbetes.length;
		window.qtdcaracteres = contaCaracteres(objVerbetes);
		mostraDados();
	};
leitor.readAsText(arquivo);
}

document.getElementById('arquivo').addEventListener('change', leArquivo, false);
document.getElementById('qtddias').addEventListener('change', mostraDados, false);

label {display:block;}
textarea {display:block;}

.container{
    font-family: Tahoma, Verdana, Segoe, sans-serif;
    padding: 10px;
    background-color:#d6d6c2;
    display:flex;
}
.container2{
    color: #000;
    height:400px;
    overflow-y:scroll;
}
.divide{
    padding: 10px;
    flex-grow: 1;
}

<!DOCTYPE html>
<html>
<body> 
<head>
<meta charset="utf-8"/>
</head>
<body>
<div class="container">
	<div class="divide">
			<label for="title">Quantidade de verbetes</label>
			<textarea id="qtdverbetes" rows="2" cols="15" readonly></textarea>
			<label for="title">Quantidade de caracteres</label>
			<textarea id="qtdcaracteres" rows="2" cols="15" readonly></textarea>
			<label for="title">Media caracteres/dia</label>
			<textarea id="qtdmedia" rows="2" cols="15" readonly></textarea>
		</div>
		<div class="divide">
			<input type="file" id="arquivo"/><hr>
			<label for="title">Numero de dias</label>
			<textarea id="qtddias" rows="2" cols="15"></textarea>
			<hr>
			<button onclick="calcula()">Calcular</button>
		</div>
		</div>
	<div class="container">
		<div class="divide">
			<p>Contagem de caracteres:</p>
			<div class="container2" id="saida"></div>
		</div>
		<div class="divide">
			<p>Verbetes:</p>
			<div class="container2" id="saida2"></div>
		</div>
<script src=".\verbetes.js">
</script>
</body>
</html>

Browser other questions tagged algorithm

You are not signed in. Login or sign up in order to post.

by Lucas Trigueiro • **664** points · Answer 1 · 2017-05-19T18:13:58+00:00

Solving this problem is not difficult, the problem is feasibility due to computational cost.

A simple way to solve: create an algorithm that generates all possible combinations and then check which varies less than the average, which can be done for example by checking which has the smallest quadratic deviation, which is something easy.

The difficulty of this problem is due to its complexity that grows exponentially, that is, its computational cost increases greatly by increasing the size of the problem.

I suggest you watch this video that explains it very well (although it is in Spanish is quite understandable). https://www.youtube.com/watch?v=UR2oDYZ-Sao

It is obvious that in the solution I gave, you can simplify some things so you do not have to test all combinations and find a better solution, but the solution will remain exponential complexity.

With 120 days and 4321 entries, the time to find the answer will be absurdly large, but it is the only way to ensure that the best distribution has been found.

What can be done is to develop an idea for a solution nay ideal, where you will not find the best distribution, but that perhaps the result will be satisfactory for your goal, but it depends on your goal. If it was only the best solution, unfortunately there is not much to do, but if a non-ideal solution is also valid we can try some ideas. The question that remains is what your goal?

by João • **164** points · Answer 2 · 2017-05-17T01:59:37+00:00

Hello,

Well the entries can be read randomly? example on the same day read entries of volume 1, 2 and 3 randomly?

if you can’t Voce can do something like the logic below:

Variaves do script:

somaDeCaracteres = soma dos caracteres de todas os verbetes
dias = numero de dias desejados
media = media de caracteres diarios
numeroDeVerbetes = quantidade de verbetes totais
caracteresRestantesNoDia = ?

Logica: 

1 - 
Para cada verbete em verbetes{
    somaDeCaracteres = somaDeCaracteres + verbete.caracteres
    numeroDeVerbetes = numeroDeVerbetes + 1
}

2 - 

media = somaDeCaracteres  dias

3 - 
caracteresRestantesNoDia = media
diaAtual = 1
Para cada verbete em verbetes{
    se caracteresRestantesNoDia - verbete.caracteres > 0 {
        adicionaVerbeteNaLista(diaAtual)
        caracteresRestantesNoDia = caracteresRestantesNoDia - verbete.caracteres
    }senao{
        diaAtual = diaAtual + 1
        caracteresRestantesNoDia = media
    }
}

So he respects the media, in case an additional entry ever exceeds the media he does not add... in this way the last day will have characters unless the others.