Save a page from an HtmlPage (Java)

2

I have a method that generates an HtmlPage, and I would like to save that page to disk.

public void gerarPaginaIndex() {
    try {
        final HtmlPage paginaIndex = WebClientFactory.getInstance().getPage(URL_INICIAL);
        this.criarPaginaEmDisco(paginaIndex, new File(PATH + "paginaIndex.html"));
    } catch (IOException e) {
        e.printStackTrace();
    }
}

The method criarPaginaEmDisco() receives the HtmlPage and the destination file as parameters.

2 answers

1

The HtmlPage class has a save method that you can use in this situation:

public void gerarPaginaIndex() {
    try {
        final HtmlPage paginaIndex = WebClientFactory.getInstance().getPage(URL_INICIAL);
        // Saves the page
        paginaIndex.save(new File(PATH + "paginaIndex.html"));
    } catch (IOException e) {
        e.printStackTrace();
    }
}

If you want to keep your own method, just wrap the call to save in it:

public void criarPaginaEmDisco(HtmlPage pagina, File arquivo) throws IOException {
    pagina.save(arquivo);
}
  • That’s cool! But is there a way to avoid saving the images and other files?

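  • To save only the HTML, it is better to write the page markup to the file yourself, as text or as bytes, as described in the other answer. Use asXml() for the text; toString() on an HtmlPage returns only the class name and the URL.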
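
As a rough sketch of the "HTML only" idea from the comment above: the helper below writes just a markup string to disk, so no images, stylesheets or scripts are saved. The names HtmlOnlySaver and saveHtmlOnly are hypothetical, the sample string merely stands in for what paginaIndex.asXml() would return, and UTF-8 is an assumption that may not match the real page.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class HtmlOnlySaver {

    // Hypothetical helper: writes only the markup text to the target file,
    // encoded with the charset the caller chooses.
    static void saveHtmlOnly(String html, Path target, Charset charset) throws IOException {
        Files.write(target, html.getBytes(charset));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the string returned by paginaIndex.asXml() (assumption).
        String html = "<html><body><p>Informa\u00e7\u00f5es gerais</p></body></html>";

        Path target = Files.createTempFile("paginaIndex", ".html");
        saveHtmlOnly(html, target, StandardCharsets.UTF_8);

        // Read the file back with the same charset to confirm nothing was lost.
        String readBack = new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        System.out.println("Saved to " + target + ", round trip intact: " + readBack.equals(html));
    }
}
```

In the real method, the call would presumably look like saveHtmlOnly(paginaIndex.asXml(), Paths.get(PATH + "paginaIndex.html"), charset), with the charset chosen to match the one the saved page declares.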

1


The simplest way to save a file using Java 8 is as follows:

Files.write(Paths.get(PATH + "paginaIndex.html"), paginaIndex.asXml().getBytes(Charset.forName("ISO-8859-1")));
  • I tested it here, and when I opened the file it did not contain the site itself. I inspected it and the body was just HtmlPage and the URL. I then tried this: Files.write(Paths.get(PATH + "paginaIndex.html"), paginaIndex.asXml().toString().getBytes()); It worked, BUT it came with some special characters. P.S.: sorry for any mistakes, I’m still learning.

  • @Laryssa welcome to Stack Overflow. Can you specify which special characters appear?

  • Examples: General Information ... Â Â Â Â Â Â Â Â Â Â Â Â Â Â

  • I was able to save the page, but some words, accented or not, came out with characters like the ones above.

  • @Laryssa I made a small change to the code that should handle the accents. Please check.

  • Thanks @Sorack, but it didn’t work. The characters are still there.

  • @Laryssa I made one more change. Please check.

  • This appeared, in red: Unknown encoding: 'ISO 8859-1'.

  • @Laryssa a hyphen was missing. I corrected the answer.

  • It worked, thank you very much! Only asXml() needed to be added: Files.write(Paths.get(PATH + "paginaIndex.html"), paginaIndex.asXml().toString().getBytes(Charset.forName("ISO-8859-1")));
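
One plausible explanation for the stray Â characters in the comments above is a charset mismatch: the bytes were written in one encoding (getBytes() with no argument uses the platform default, often UTF-8) and the file was then read as another. The sketch below reproduces that effect with the standard library only. The class MojibakeDemo and the method reinterpret are hypothetical names, and the sample text is invented for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {

    // Encodes the text with one charset and decodes the resulting bytes with
    // another, which is what happens when a file is written in one encoding
    // and opened as if it were in a different one.
    static String reinterpret(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        // Sample text with accents and a non-breaking space (U+00A0).
        String original = "Informa\u00e7\u00f5es\u00a0gerais";

        // Written as UTF-8 but read as ISO-8859-1: each non-ASCII character
        // becomes two characters, and U+00A0 turns into "Â" plus U+00A0.
        String garbled = reinterpret(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.contains("\u00c2\u00a0")); // prints "true"

        // Written and read with the same charset: the text survives intact.
        String intact = reinterpret(original, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);
        System.out.println(intact.equals(original)); // prints "true"
    }
}
```

This would be consistent with the fix in the answer: passing an explicit charset to getBytes() that matches the one the saved page is read with, rather than relying on the platform default.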
