How do I build a word index in Java and generate an HTML file from it?

Viewed 420 times

2

I need to create a reference index for a book that is passed as a parameter (File, BufferedReader).

So far I have not had good results. All I have is code that builds a TreeSet with every word of the text passed as a parameter. I have been trying for 3 weeks to write the code that takes those words, records the lines where they appear, and generates the HTML file for the index.

read is a LineNumberReader and palavras (words) is a TreeSet.

I run into problems when I go through the array returned by the split method and compare it with the text word by word (this is the part I cannot work out).

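    while((line = read.readLine()) != null){
        // Replace everything that is not a letter with a space, then lowercase
        line = line.replaceAll("[^a-zA-Z]", " ").toLowerCase();
        split = line.split(" ");

        for(String s : split){
            // split(" ") yields empty strings between consecutive spaces; skip them.
            // A Set ignores duplicates, so no contains() check is needed.
            if(!s.isEmpty()){
                palavras.add(s);
            }
        }
    }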

    path.close();
    read.close();

    }catch(FileNotFoundException e){
        // getStackTrace() only returns the trace; printStackTrace() actually prints it
        e.printStackTrace();
        System.out.println("Invalid file path!");

    }catch(IOException ex){
        ex.printStackTrace();
    }

    return palavras;  
}
  • Did you find a solution? Post it as an answer to help other people.

1 answer

2

Your code is almost there. To get what you want, I think it would help a lot to change the data structure palavras ("words") from a java.util.Set to a java.util.Map. The point is that you don't want to keep just the words, but the relationship between each word and the set of lines where it appears (i.e., a set of integers). So I redefined palavras as follows:

    Map<String, Set<Integer>> palavras = new HashMap<String, Set<Integer>>();

With this structure you can save relationships like:

  • "bla" -> [1,3]
  • "ble" -> [2]

That is, the word "bla" was found on lines 1 and 3, while the word "ble" was found on line 2. With this, I changed your for loop to add a new entry to the map if the word is not there yet, and to add only the line number if the word already exists:

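    for(String s : split){
        if(s.isEmpty()){
            continue; // split(" ") yields empty strings between consecutive spaces
        }
        if(!palavras.containsKey(s)){
            // First occurrence: create the set of lines for this word
            Set<Integer> linhas = new TreeSet<Integer>();
            linhas.add(read.getLineNumber());
            palavras.put(s, linhas);
        } else {
            // Word already indexed: just record the current line
            palavras.get(s).add(read.getLineNumber());
        }
    }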
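
To show how the pieces could fit together, including the HTML step the question asks about, here is a minimal self-contained sketch. The class name IndexSketch and the methods buildIndex and toHtml are hypothetical names chosen for illustration, not part of your code. It also assumes a TreeMap instead of the HashMap above, only so that the words come out in alphabetical order, and it uses a simple ul list as the output format, which is one possible layout among many.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical class and method names, used only for this sketch.
class IndexSketch {

    // Maps each word to the sorted set of line numbers where it occurs.
    static Map<String, Set<Integer>> buildIndex(Reader source) {
        // Assumption: TreeMap rather than HashMap, so keys iterate alphabetically.
        Map<String, Set<Integer>> palavras = new TreeMap<>();
        try (LineNumberReader read = new LineNumberReader(source)) {
            String line;
            while ((line = read.readLine()) != null) {
                line = line.replaceAll("[^a-zA-Z]", " ").toLowerCase();
                for (String s : line.split(" ")) {
                    if (s.isEmpty()) {
                        continue; // consecutive separators produce empty tokens
                    }
                    // After readLine() returns the first line, getLineNumber() is 1.
                    palavras.computeIfAbsent(s, k -> new TreeSet<>())
                            .add(read.getLineNumber());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return palavras;
    }

    // Renders the index as a very simple HTML page (one list item per word).
    // Words contain only a-z here, so no HTML escaping is needed.
    static String toHtml(Map<String, Set<Integer>> index) {
        StringBuilder html = new StringBuilder();
        html.append("<!DOCTYPE html>\n<html>\n<head><title>Index</title></head>\n<body>\n<ul>\n");
        for (Map.Entry<String, Set<Integer>> e : index.entrySet()) {
            StringBuilder lines = new StringBuilder();
            for (int n : e.getValue()) {
                if (lines.length() > 0) {
                    lines.append(", ");
                }
                lines.append(n);
            }
            html.append("  <li>").append(e.getKey()).append(": ")
                .append(lines).append("</li>\n");
        }
        html.append("</ul>\n</body>\n</html>\n");
        return html.toString();
    }

    public static void main(String[] args) {
        // In-memory text stands in for the book; a FileReader would work the same way.
        Map<String, Set<Integer>> index = buildIndex(new StringReader("Bla\nble\nbla"));
        System.out.print(toHtml(index));
    }
}
```

With the three-line sample text in main, the map holds the same relationships as the example above ("bla" on lines 1 and 3, "ble" on line 2). To produce an actual file, the string returned by toHtml could be written out with a FileWriter or Files.write.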

Does this help? If you need further clarification, just ask in the comments.
