I have two .txt files. One of them is a correct list of cities (it contains every city in the country, spelled correctly). The other is also a list of cities, but with some wrong data: it was filled in by users, so it has spelling mistakes, etc.
To speed up correcting the second list, I thought of checking whether each of its cities appears in the first list. If it does, the city is spelled correctly; if it doesn't, I keep that city, because it is presumably a wrong entry.
My problem is the logic. I wrote the following code, but it seems to process only the first line of file 2 (the one with wrong data). I'm also unsure how to do the comparison, since I need to know all the values of file 1 to tell whether the city in the current loop iteration is in that file or not.
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Scanner;

public class Aehoo {
    public static void main(String[] args) throws IOException {
        Scanner biCities = new Scanner(new FileReader("C:\\LISTA_CIDADES_BI.txt"));
        Scanner billCities = new Scanner(new FileReader("C:\\LISTA_CIDADES_BILL_ADDR.txt"));
        ArrayList<String> array = new ArrayList<String>();
        // Walk through the list of cities with wrong data
        while (billCities.hasNextLine()) {
            String cityBill = billCities.nextLine();
            // Walk through the correct list for each line of the other list,
            // to check whether cityBill is in it
            while (biCities.hasNextLine()) {
                String cityBi = biCities.nextLine();
                // Comparison logic problem here
            }
        }
        for (String s : array) {
            System.out.println(s);
        }
        biCities.close();
        billCities.close();
    }
}
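The inner loop only does any work for the first line of file 2 because biCities is a single Scanner: the inner while reads it to the end during the first outer iteration, and from then on biCities.hasNextLine() is always false. One way around this is to read the correct list once into a HashSet and then check each line of the other file against it. This is a sketch, not the only fix; the helper names readLines and findWrongCities are hypothetical, and the comparison is exact (case- and whitespace-sensitive).

```java
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Scanner;
import java.util.Set;

public class Aehoo {

    // Hypothetical helper: reads every remaining line of a Scanner into a list.
    static List<String> readLines(Scanner scanner) {
        List<String> lines = new ArrayList<>();
        while (scanner.hasNextLine()) {
            lines.add(scanner.nextLine());
        }
        return lines;
    }

    // Hypothetical helper: keeps the lines of billLines that are not in the correct list.
    static List<String> findWrongCities(List<String> biLines, List<String> billLines) {
        Set<String> correct = new HashSet<>(biLines); // built once, O(1) average lookups
        List<String> wrong = new ArrayList<>();
        for (String cityBill : billLines) {
            if (!correct.contains(cityBill)) {
                wrong.add(cityBill); // exact match failed: presumably a wrong entry
            }
        }
        return wrong;
    }

    public static void main(String[] args) throws IOException {
        // try-with-resources closes both scanners even if reading fails
        try (Scanner biCities = new Scanner(new FileReader("C:\\LISTA_CIDADES_BI.txt"));
             Scanner billCities = new Scanner(new FileReader("C:\\LISTA_CIDADES_BILL_ADDR.txt"))) {
            List<String> wrong = findWrongCities(readLines(biCities), readLines(billCities));
            for (String s : wrong) {
                System.out.println(s);
            }
        }
    }
}
```

Each file is now read exactly once, so there is no exhausted Scanner to worry about.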
Cities are stored in the format CIDADE;ESTADO (city;state), as in the example below.
LISTA_CIDADES_BILL_ADDR     LISTA_CIDADES_BI
(list with wrong data)      (list with correct data)
=- LAURO DE FREITAS;BA      ABADIA DE GOIAS;GO
; VILAS DO ATLANTICO;BA     ABADIA DOS DOURADOS;MG
ABADIA DE GOIAS;GO          ABADIANIA;GO
ABADIA DOS DOURADOS;MG      ABAETE;MG
ABADIANIA;GO                ABAETETUBA;PA
ABAETE;MG                   ABAIARA;CE
ABAETE DOS MENDES;MG        ABAIRA;BA
ABAETETUBA;PA               ABARE;BA
ABAIARA;CE                  ABATIA;PR
ABAIBA;MG                   ABDON BATISTA;SC
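Since each line has the form CIDADE;ESTADO, it can help to split it into its two parts before comparing. A minimal sketch, assuming a Cidade class with getCidade() and getEstado() like the one used in the snippet further down; the question doesn't show that class, so this version and its parse method are hypothetical. It splits at the last ';' so that malformed entries such as "; VILAS DO ATLANTICO;BA" keep their junk in the city name and still fail the comparison.

```java
import java.util.Objects;

// Hypothetical stand-in for the question's Cidade class.
public class Cidade {
    private final String cidade;
    private final String estado;

    public Cidade(String cidade, String estado) {
        this.cidade = Objects.requireNonNull(cidade);
        this.estado = Objects.requireNonNull(estado);
    }

    public String getCidade() { return cidade; }
    public String getEstado() { return estado; }

    // Hypothetical parser for one "CIDADE;ESTADO" line. The state is whatever follows
    // the LAST ';', so a leading ';' or other junk stays in the city name.
    // Returns null for lines with no ';' at all.
    public static Cidade parse(String line) {
        int sep = line.lastIndexOf(';');
        if (sep < 0) {
            return null;
        }
        return new Cidade(line.substring(0, sep).trim(), line.substring(sep + 1).trim());
    }
}
```

Lines that return null (no state at all) can go straight into the list of wrong entries.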
For the record, I managed to come up with logic that works: instead of re-reading the files on every pass of my while loop, I loaded the two lists into two ArrayLists and used the condition below:
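ArrayList<Cidade> cidadesDiferentes = new ArrayList<Cidade>();
// For each city in the list with wrong data...
for (Cidade cidadeIncorreta : listaCidadesIncorretas) {
    boolean encontrou = false;
    // ...look for the same city/state pair (ignoring case) in the correct list
    for (Cidade cidadeCorreta : listaCidadesCorretas) {
        if (cidadeIncorreta.getCidade().equalsIgnoreCase(cidadeCorreta.getCidade())
                && cidadeIncorreta.getEstado().equalsIgnoreCase(cidadeCorreta.getEstado())) {
            encontrou = true;
            break; // a match was found, no need to check the rest
        }
    }
    // Not found in the correct list: presumably a wrong entry
    if (!encontrou) {
        cidadesDiferentes.add(cidadeIncorreta);
    }
}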
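The nested loop above compares each wrong entry with the correct entries one by one, so in the worst case it does (size of one list) × (size of the other) comparisons. A sketch of the same check with a HashSet of case-insensitive keys, which needs only one pass over each list. The Cidade class here is a hypothetical minimal stand-in (the question doesn't show its own) and chave() is a hypothetical helper; uppercasing with Locale.ROOT behaves like equalsIgnoreCase for names like these, though the two are not identical for every Unicode character.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ComparaCidades {

    // Hypothetical minimal version of the question's Cidade class.
    static class Cidade {
        private final String cidade;
        private final String estado;
        Cidade(String cidade, String estado) { this.cidade = cidade; this.estado = estado; }
        String getCidade() { return cidade; }
        String getEstado() { return estado; }
        @Override public String toString() { return cidade + ";" + estado; }
    }

    // Hypothetical helper: case-insensitive key for a city/state pair.
    static String chave(Cidade c) {
        return c.getCidade().toUpperCase(Locale.ROOT) + ";" + c.getEstado().toUpperCase(Locale.ROOT);
    }

    static List<Cidade> cidadesDiferentes(List<Cidade> listaCidadesCorretas,
                                          List<Cidade> listaCidadesIncorretas) {
        // One pass over the correct list builds the set of valid keys...
        Set<String> validas = new HashSet<>();
        for (Cidade c : listaCidadesCorretas) {
            validas.add(chave(c));
        }
        // ...and one pass over the other list replaces the inner loop with a hash lookup.
        List<Cidade> diferentes = new ArrayList<>();
        for (Cidade c : listaCidadesIncorretas) {
            if (!validas.contains(chave(c))) {
                diferentes.add(c);
            }
        }
        return diferentes;
    }
}
```

The result keeps the order of the list with wrong data, just like the nested-loop version.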
Does this have to be done in code? It would be easier and simpler to import both files into a database and use a SELECT that joins the two tables.
– Jothaz
Yes. The two tables come from different databases, and I no longer have access to the database with the correct data.
– Eduardo Silva
Just a tip: with this logic you get quadratic time in expensive disk-read operations. If the list of cities is not too large, it is worth reading each of the two files only once into a Set (e.g., a LinkedHashSet). To find the wrong entries, just call setErrado.removeAll(setCorreto);
– Anthony Accioly
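A sketch of the removeAll approach from the comment above, working directly on the raw lines. setErrado and setCorreto are the comment's names; the entradasErradas method and the sample data are illustrative. LinkedHashSet keeps the entries in file order and collapses duplicate entries in the user list.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class RemoveAllExemplo {

    // Illustrative helper: distinct entries of the wrong list that are absent from
    // the correct list, in the order they first appear in the wrong list.
    static Set<String> entradasErradas(List<String> linhasCorretas, List<String> linhasErradas) {
        Set<String> setCorreto = new LinkedHashSet<>(linhasCorretas);
        Set<String> setErrado = new LinkedHashSet<>(linhasErradas); // duplicates collapse here
        setErrado.removeAll(setCorreto); // set difference: setErrado minus setCorreto
        return setErrado;
    }

    public static void main(String[] args) {
        List<String> corretas = Arrays.asList("ABADIA DE GOIAS;GO", "ABAETE;MG", "ABAIARA;CE");
        List<String> erradas = Arrays.asList("=- LAURO DE FREITAS;BA", "ABAETE;MG", "ABAIBA;MG", "ABAIBA;MG");
        System.out.println(entradasErradas(corretas, erradas)); // [=- LAURO DE FREITAS;BA, ABAIBA;MG]
    }
}
```

Passing a Set (not a List) as the argument keeps removeAll fast: when the receiving set is not larger than the argument, AbstractSet.removeAll calls contains() on the argument for each element, and contains() on a List is linear.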
The cost isn't very high either way, reading from disk or working in memory, because there are only about 5,500 municipalities. Doing it in memory is much simpler, though. Besides, this doesn't look like an operation that will run 10 times a day, so you don't need to worry much about the cost. Load each file into a list and improve the logic that compares the items of the two lists.
– Caffé
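One way to improve the comparison logic, as the comment suggests, is to compare normalized strings instead of the raw lines. The sketch below is an assumption about what should count as equivalent: it trims, removes spaces around ';', collapses repeated whitespace, strips accents with java.text.Normalizer and uppercases. Stripping accents makes an accented spelling match the unaccented ABADIANIA;GO; if accent differences should also be flagged as errors, drop that step. It does not remove junk prefixes such as "=- ". The class and method names are hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

public class NormalizaCidade {

    // Hypothetical normalization applied to both lists before comparing them.
    static String normaliza(String linha) {
        // NFD splits accented letters into base letter + combining mark; \p{M} removes the marks
        String semAcento = Normalizer.normalize(linha, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "");
        return semAcento.trim()
                .replaceAll("\\s*;\\s*", ";") // "ABAETE ; MG" -> "ABAETE;MG"
                .replaceAll("\\s+", " ")      // collapse runs of spaces/tabs
                .toUpperCase(Locale.ROOT);
    }
}
```

Normalizing both lists the same way and then using any of the set-based checks keeps the comparison itself trivial.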
Right: one list has 5,570 records and the other 8,866. I will run this process only once, to catch the misspelled cities and check them in the database.
– Eduardo Silva
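For scale, with the sizes above: a nested loop that scans the whole correct list for an entry it never finds does the full product of the two sizes in the worst case, while the set-based approaches touch each record once.

```latex
\begin{aligned}
\text{nested loops (worst case): } & 8866 \times 5570 = 49\,383\,620 \text{ comparisons} \\
\text{hash set: } & 5570 \text{ insertions} + 8866 \text{ lookups} = 14\,436 \text{ operations, each expected } O(1)
\end{aligned}
```

Either is affordable for a one-off run, which matches Caffé's point; the set-based version simply scales linearly.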