How to assemble logic to read 2 files, compare them and extract values not found

3

I have two .txt files. One is a correct list of cities (it contains every city in the country, spelled correctly). The other is also a list of cities, but with some bad data (it was filled in by users, so it has spelling errors and so on).

To speed up the correction of the second list, I thought of checking whether each of its cities appears in the first list: if it does, the city is typed correctly; if it does not, I store that city, because it is presumably a wrong entry.

My problem is the logic. I wrote the following code, but it seems to go through only the first line of file 2 (the one with wrong data). I am also unsure how to do the comparison, since I need to check against every value in file 1 to know whether the city in the current loop iteration is in that file or not.

import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Scanner;

public class Aehoo {
    public static void main(String[] args) throws IOException {
        Scanner biCities = new Scanner(new FileReader("C:\\LISTA_CIDADES_BI.txt"));
        Scanner billCities = new Scanner(new FileReader("C:\\LISTA_CIDADES_BILL_ADDR.txt"));
        ArrayList<String> array = new ArrayList<String>();

        // Iterates over the list of cities with wrong data
        while (billCities.hasNextLine()) {
            String cityBill = billCities.nextLine();

            // Iterates over the correct list of cities for each line of the other list
            // to check whether cityBill is in it
            while (biCities.hasNextLine()) {
                String cityBi = biCities.nextLine();

                // Comparison logic problem here
            }
        }

        for (String s : array) {
            System.out.println(s);
        }

        biCities.close();
        billCities.close();
    }
}

Cities are stored in the format CIDADE;ESTADO (city;state), as in the example below.

LISTA_CIDADES_BILL_ADDR                 LISTA_CIDADES_BI
(LIST WITH WRONG DATA)                  (LIST WITH CORRECT DATA)
=- LAURO DE FREITAS;BA                  ABADIA DE GOIAS;GO
; VILAS DO ATLANTICO;BA                 ABADIA DOS DOURADOS;MG
ABADIA DE GOIAS;GO                      ABADIANIA;GO
ABADIA DOS DOURADOS;MG                  ABAETE;MG
ABADIANIA;GO                            ABAETETUBA;PA
ABAETE;MG                               ABAIARA;CE
ABAETE DOS MENDES;MG                    ABAIRA;BA
ABAETETUBA;PA                           ABARE;BA
ABAIARA;CE                              ABATIA;PR
ABAIBA;MG                               ABDON BATISTA;SC

Just for the record, I managed to come up with logic that works. Instead of re-reading the files on every pass of my while loop, I stored both lists in memory and used the condition below:

ArrayList<Cidade> cidadesDiferentes = new ArrayList<Cidade>();

for (Cidade cidadeIncorreta : listaCidadesIncorretas) {
    int encontrou = 0;

    for (Cidade cidadeCorreta : listaCidadesCorretas) {
        if ((cidadeIncorreta.getCidade().equalsIgnoreCase(cidadeCorreta.getCidade())) && (cidadeIncorreta.getEstado().equalsIgnoreCase(cidadeCorreta.getEstado()))) {
            encontrou = 1;
        }
    }

    if (encontrou == 0) {
        cidadesDiferentes.add(cidadeIncorreta);
    }
}
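
The nested loop above compares every suspect city with every correct city. As a minimal sketch of the same check with a single lookup per city, the correct entries can be indexed in a HashSet first. The class CityMatcher, its methods, and the choice to normalize with trim() and toUpperCase() are assumptions for illustration (they approximate equalsIgnoreCase but are not identical to it), and the lines are assumed to be plain "CITY;STATE" strings rather than Cidade objects.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class CityMatcher {
    // Hypothetical helper: builds a case-insensitive key from a "CITY;STATE" line.
    static String key(String line) {
        return line.trim().toUpperCase(Locale.ROOT);
    }

    // Returns the suspect lines whose key is absent from the reference list,
    // in their original order.
    static List<String> findUnmatched(List<String> suspect, List<String> reference) {
        Set<String> known = new HashSet<>();
        for (String line : reference) {
            known.add(key(line));
        }
        List<String> unmatched = new ArrayList<>();
        for (String line : suspect) {
            if (!known.contains(key(line))) {
                unmatched.add(line);
            }
        }
        return unmatched;
    }
}
```

With lists of 8866 and 5570 entries either version finishes quickly; the set mainly removes the inner loop and the found flag.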
  • Does this have to be done in code? It would be simpler to import both lists into a database and use a SELECT that joins the tables.

  • Yes, Jay. The two tables come from different databases, and I no longer have access to the database with the correct data.

  • Just a tip: with this logic you get quadratic time on expensive disk reads. If the list of cities is not too large, it is worth reading each file only once, each into its own Set (e.g., a LinkedHashSet). To find the wrong entries, just call setErrado.removeAll(setCorreto);

  • The cost is not very high either way, reading from disk or working in memory, because there are only about 5,500 municipalities. Doing it in memory is much simpler, though. Besides, this does not look like an operation that will run 10 times a day, so the cost hardly matters. Load each file into a list and improve the comparison logic between the list items.

  • Right: one list has 5570 records and the other 8866. I will run this process only once, to catch the cities with errors and check them in the database.

2 answers

3

The inner loop only runs once because the Scanner over the correct cities (biCities) has already reached the end of the file after the first pass. You have to reopen it.

Do something like this inside the first (outer) while, right after the second (inner) while:

<second while>
biCities.close();
biCities = new Scanner(new FileReader("C:\\LISTA_CIDADES_BI.txt"));
<end of the first while>
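
Written out in full, the reopening approach could look like the sketch below. The class NestedScan, the method findMissing, the UTF-8 charset, and the equalsIgnoreCase comparison are assumptions for illustration; the point is only that a fresh Scanner over the correct list is created for every line of the wrong list.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class NestedScan {
    // Returns the lines of wrongFile that do not occur in correctFile.
    static List<String> findMissing(Path wrongFile, Path correctFile) throws IOException {
        List<String> missing = new ArrayList<>();
        try (Scanner billCities = new Scanner(
                Files.newBufferedReader(wrongFile, StandardCharsets.UTF_8))) {
            while (billCities.hasNextLine()) {
                String cityBill = billCities.nextLine();
                boolean found = false;
                // Reopen the correct list for every outer iteration so the
                // inner loop always starts from the first line.
                try (Scanner biCities = new Scanner(
                        Files.newBufferedReader(correctFile, StandardCharsets.UTF_8))) {
                    while (!found && biCities.hasNextLine()) {
                        found = cityBill.equalsIgnoreCase(biCities.nextLine());
                    }
                }
                if (!found) {
                    missing.add(cityBill);
                }
            }
        }
        return missing;
    }
}
```

This reopens the correct file once per line of the wrong file, which is the quadratic disk reading mentioned in the comments; it is acceptable for a one-off run on a few thousand lines.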

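Note that the reset() method of the Scanner class is not an alternative here: it only restores the scanner's default delimiter, locale and radix and does not go back to the start of the file, so the Scanner still has to be reopened:

biCities.reset(); // Only undoes Scanner.useDelimiter(), Scanner.useLocale()
                  // and Scanner.useRadix(); it does not rewind the input.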
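
A quick way to check what Scanner.reset() does to an exhausted Scanner is the small sketch below, which reads from an in-memory String instead of a file. The class ScannerResetCheck and the helper remainingLines are hypothetical names used only for this check.

```java
import java.util.Scanner;

final class ScannerResetCheck {
    // Consumes the scanner and returns how many lines were still readable.
    static int remainingLines(Scanner scanner) {
        int count = 0;
        while (scanner.hasNextLine()) {
            scanner.nextLine();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Scanner scanner = new Scanner("ABAETE;MG\nABAIARA;CE\n");
        System.out.println(remainingLines(scanner)); // first pass reads both lines: 2
        scanner.reset(); // restores default delimiter, locale and radix only
        System.out.println(remainingLines(scanner)); // input is still exhausted: 0
    }
}
```

Since the second pass finds nothing to read, creating a new Scanner is the way to start over from the first line.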

2


As requested, here is a minimal sketch of the in-memory solution using Set.

Read a file into a Set:

public Set<String> leMunicipios(Path path, int linhasParaPular, Charset charset) 
        throws IOException {
    final List<String> contents = Files.readAllLines(path, charset);
    return new LinkedHashSet<>(contents.subList(linhasParaPular, contents.size()));
}

List errors:

final Charset charset = StandardCharsets.UTF_8;
try {
    final Path pathMunicipios = Paths.get("C:\\LISTA_CIDADES_BILL_ADDR.txt");
    final Path pathGabarito = Paths.get("C:\\LISTA_CIDADES_BI.txt");
    final Set<String> municipios = leMunicipios(pathMunicipios, 4, charset);
    final Set<String> gabarito = leMunicipios(pathGabarito, 2, charset);
    municipios.removeAll(gabarito);
    municipios.forEach(System.out::println);
} catch (IOException e) {
    e.printStackTrace();
}

If the order is not important, you can improve performance a little by replacing the LinkedHashSet with a HashSet (not that it will be very relevant in this case).
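
To see the set difference on concrete data without touching the disk, here is a small sketch that applies removeAll to the ten sample lines of each list shown in the question. The class SetDifferenceDemo and the method wrongEntries are hypothetical names; the comparison is exact (case-sensitive), as in the code above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class SetDifferenceDemo {
    // Returns the suspect entries that are absent from the reference list,
    // keeping the order in which they appeared.
    static Set<String> wrongEntries(List<String> suspect, List<String> reference) {
        Set<String> result = new LinkedHashSet<>(suspect);
        // Wrapping the reference in a HashSet keeps each lookup cheap.
        result.removeAll(new HashSet<>(reference));
        return result;
    }

    public static void main(String[] args) {
        List<String> suspect = Arrays.asList(
                "=- LAURO DE FREITAS;BA", "; VILAS DO ATLANTICO;BA",
                "ABADIA DE GOIAS;GO", "ABADIA DOS DOURADOS;MG", "ABADIANIA;GO",
                "ABAETE;MG", "ABAETE DOS MENDES;MG", "ABAETETUBA;PA",
                "ABAIARA;CE", "ABAIBA;MG");
        List<String> reference = Arrays.asList(
                "ABADIA DE GOIAS;GO", "ABADIA DOS DOURADOS;MG", "ABADIANIA;GO",
                "ABAETE;MG", "ABAETETUBA;PA", "ABAIARA;CE", "ABAIRA;BA",
                "ABARE;BA", "ABATIA;PR", "ABDON BATISTA;SC");
        wrongEntries(suspect, reference).forEach(System.out::println);
    }
}
```

For the sample data this leaves four entries, in file order: "=- LAURO DE FREITAS;BA", "; VILAS DO ATLANTICO;BA", "ABAETE DOS MENDES;MG" and "ABAIBA;MG". With a plain HashSet as the result, the same four entries come back, but their order is not guaranteed.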
