In summary, HtmlUnit provides an API that lets Java applications perform the same actions a user would perform in a browser, such as loading a web page, clicking buttons and links, and filling in forms.
Roughly speaking, it is a browser without a graphical interface, which is how the people responsible for the project describe it. Features and other information can be found on the project page.
Example
Consider a page at http://meusiteficticio.com that has a form with this structure:
<form id='form-login' action='/login' method='post'>
<input name='user' type='text' placeholder='Nome de usuário'/>
<input name='pass' type='password' placeholder='Senha'/>
<input type='submit' value='entrar'/>
</form>
Through the browser, the user would enter a username and password in the appropriate fields, then click the button to submit the form. We will do the same but within the application.
HtmlUnit implemented (v2.8) and later made public (v2.11) the methods querySelector and querySelectorAll, which work much like the JavaScript functions of the same name. With these methods, the login code can look like this:
// Load the login page.
HtmlPage paginaDeLogin = new WebClient(BrowserVersion.BEST_SUPPORTED)
.getPage("http://meusiteficticio.com");
// Get the form elements.
HtmlTextInput inputNomeDeUsuario = paginaDeLogin.querySelector("input[name='user']");
HtmlPasswordInput inputSenha = paginaDeLogin.querySelector("input[name='pass']");
HtmlSubmitInput botaoEnviar = paginaDeLogin.querySelector("#form-login > input[type='submit']");
// Set the 'value' attribute of the inputs.
inputNomeDeUsuario.setValueAttribute("joao");
inputSenha.setValueAttribute("joao1234");
// Simulate the click on the submit button and wait for the response
HtmlPage paginaAposOLogin = botaoEnviar.click();
// Print the page's HTML
System.out.println(paginaAposOLogin.asXml());
If you are using an older version that does not support querySelector, you first have to get the form and then fetch the inputs with the getInputByName method:
// Simulate a Chrome browser.
WebClient client = new WebClient(BrowserVersion.CHROME);
// Load the page.
HtmlPage paginaDeLogin = client.getPage("http://meusiteficticio.com");
// Get the login form by its "id" attribute in the HTML.
// The second argument controls case sensitivity: with true the id must
// match exactly; with false, "FoRm-LoGiN" would also find the form.
HtmlForm formularioDeLogin = paginaDeLogin.getElementById("form-login", true);
// Get the form's inputs by their "name" attribute:
HtmlTextInput inputNomeDeUsuario = formularioDeLogin.getInputByName("user");
HtmlPasswordInput inputSenha = formularioDeLogin.getInputByName("pass");
// The submit "button" has no name, id, class, etc.,
// so one way to get it is by its value='entrar'.
HtmlSubmitInput botaoEnviar = formularioDeLogin.getInputByValue("entrar");
// Fill in the username and password fields
// (as if typing them in the browser)
inputNomeDeUsuario.setValueAttribute("joao");
inputSenha.setValueAttribute("joao1234");
// Simulate the click on the submit button and wait for the response
HtmlPage paginaAposOLogin = botaoEnviar.click();
// Print the page's HTML
System.out.println(paginaAposOLogin.getWebResponse().getContentAsString());
Be nice and handle the exceptions. Trying to set (or otherwise manipulate) a value on an input that does not exist will throw a NullPointerException.
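One way to avoid a bare NullPointerException is to check each lookup result before using it. The sketch below is only an illustration: LookupGuard and requireFound are made-up names, not part of HtmlUnit, and the strings in main stand in for real lookup results. It relies on querySelector returning null when nothing matches the selector.

```java
/** Hypothetical helper (not part of HtmlUnit) for guarding element lookups. */
final class LookupGuard {

    private LookupGuard() { }

    /**
     * Returns the element unchanged, or throws an exception that names the
     * selector when the lookup came back null.
     */
    static <T> T requireFound(T element, String selector) {
        if (element == null) {
            throw new IllegalStateException("No element matches selector: " + selector);
        }
        return element;
    }

    public static void main(String[] args) {
        // Stand-in for a successful lookup; in real code the first argument
        // would be the result of paginaDeLogin.querySelector(...).
        String found = requireFound("<input name='user'>", "input[name='user']");
        System.out.println("found: " + found);

        try {
            // Stand-in for a lookup that matched nothing.
            requireFound(null, "input[name='missing']");
        } catch (IllegalStateException e) {
            System.out.println("lookup failed: " + e.getMessage());
        }
    }
}
```

With a guard like this around each lookup, a missing input fails with a message that says which selector was wrong, instead of a NullPointerException a few lines later.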
Keeping the cookies
If you need to keep the cookies for use in later requests, you must define a CookieManager for your "browser", the WebClient.
WebClient client = new WebClient(BrowserVersion.FIREFOX_24);
CookieManager cookieManager = client.getCookieManager();
cookieManager.setCookiesEnabled(true);
client.setCookieManager(cookieManager);
HtmlPage fb = client.getPage("https://facebook.com");
Disabling warnings
HtmlUnit logs a warning for everything it considers invalid in the HTML document, for example obsolete attributes and errors in JavaScript code and CSS, as seen in this image:
You can turn off these alerts by setting the HtmlUnit logger level to OFF:
Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(Level.OFF);
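A slightly fuller sketch of the same idea, using only java.util.logging. It assumes HtmlUnit's log output ends up in java.util.logging, which depends on the logging backend on your classpath. The class name QuietHtmlUnit is made up for illustration. The logger is kept in a static field because java.util.logging holds loggers only weakly, so a level set on an otherwise unreferenced logger can be lost when it is garbage-collected.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical holder class; the name is an assumption for this example. */
final class QuietHtmlUnit {

    // Strong reference so the configured level survives garbage collection.
    static final Logger HTMLUNIT_LOGGER =
            Logger.getLogger("com.gargoylesoftware.htmlunit");

    private QuietHtmlUnit() { }

    /** Turns off every message logged under the HtmlUnit package. */
    static void silence() {
        HTMLUNIT_LOGGER.setLevel(Level.OFF);
    }

    public static void main(String[] args) {
        silence();
        // Child loggers without their own level inherit OFF from the parent.
        Logger child = Logger.getLogger("com.gargoylesoftware.htmlunit.WebClient");
        System.out.println(child.isLoggable(Level.SEVERE)); // prints false
    }
}
```

Call silence() once, before creating the WebClient, so that the level is already in place when the first page is loaded.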