-1
Since last night I have been trying to write an algorithm that searches line by line and returns only the links, but it's complicated: either I'm dumb or the String class methods that I know don't help much.
2
With the Jsoup library this is quite simple.
The documentation itself provides the following example:
File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");
Elements links = doc.select("a[href]"); // a with href
The third argument of the parse method, here with the value http://example.com/, defines the base URL used to resolve relative URLs.
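To make the idea of resolving relative URLs against a base more concrete, here is a small sketch that uses only java.net.URI from the standard library. It is not how Jsoup works internally; the class name BaseUrlSketch and the helper resolve are made up for this illustration, and the sample paths are assumptions.

```java
import java.net.URI;

public class BaseUrlSketch {
    // Hypothetical helper: resolves a possibly relative href against a base URL.
    // An href that is already absolute is returned unchanged.
    static String resolve(String baseUrl, String href) {
        return URI.create(baseUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.com/";
        System.out.println(resolve(base, "/docs/page.html"));    // root-relative link
        System.out.println(resolve(base, "img/logo.png"));       // path-relative link
        System.out.println(resolve(base, "http://other.org/x")); // already absolute
    }
}
```

With Jsoup itself, the resolved form of a link is what you get when you read an attribute with the abs: prefix, for example link.attr("abs:href").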
From this example you can use any selector to search through the links.
There is also a more complete example that lists the various types of URL found on the page, including scripts, stylesheets and images:
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.helper.Validate;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ListLinks {
    public static void main(String[] args) throws IOException {
        Validate.isTrue(args.length == 1, "usage: supply url to fetch");
        String url = args[0];
        print("Fetching %s...", url);

        // Download and parse the page; the fetched URL becomes the base URL.
        Document doc = Jsoup.connect(url).get();
        Elements links = doc.select("a[href]");      // anchors
        Elements media = doc.select("[src]");        // images, scripts, frames...
        Elements imports = doc.select("link[href]"); // stylesheets, icons...

        print("\nMedia: (%d)", media.size());
        for (Element src : media) {
            if (src.tagName().equals("img"))
                print(" * %s: <%s> %sx%s (%s)",
                        src.tagName(), src.attr("abs:src"), src.attr("width"), src.attr("height"),
                        trim(src.attr("alt"), 20));
            else
                print(" * %s: <%s>", src.tagName(), src.attr("abs:src"));
        }

        print("\nImports: (%d)", imports.size());
        for (Element link : imports) {
            print(" * %s <%s> (%s)", link.tagName(), link.attr("abs:href"), link.attr("rel"));
        }

        print("\nLinks: (%d)", links.size());
        for (Element link : links) {
            // "abs:href" returns the link resolved against the base URL.
            print(" * a: <%s> (%s)", link.attr("abs:href"), trim(link.text(), 35));
        }
    }

    private static void print(String msg, Object... args) {
        System.out.println(String.format(msg, args));
    }

    // Shortens s to at most width characters, marking the cut with a dot.
    private static String trim(String s, int width) {
        if (s.length() > width)
            return s.substring(0, width - 1) + ".";
        else
            return s;
    }
}
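If adding a library is not an option, the line-by-line approach from the question can be roughly sketched with plain String methods. This is only an illustration under stated assumptions: the class LineLinkSketch, the helpers linksIn and isDelimiter, and the choice of delimiters (quotes, angle brackets, whitespace) are all made up here, and it does not understand HTML the way Jsoup does, so relative links and unusual markup are missed.

```java
import java.util.ArrayList;
import java.util.List;

public class LineLinkSketch {
    // Hypothetical helper: collects every substring of the line that starts with
    // "http://" or "https://" and runs until an assumed delimiter is reached.
    static List<String> linksIn(String line) {
        List<String> found = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = line.indexOf("http", from);
            if (start < 0) break; // no more candidates on this line
            if (line.startsWith("http://", start) || line.startsWith("https://", start)) {
                int end = start;
                while (end < line.length() && !isDelimiter(line.charAt(end))) end++;
                found.add(line.substring(start, end));
                from = end;       // continue after the link just found
            } else {
                from = start + 4; // "http" was part of another word; skip it
            }
        }
        return found;
    }

    // Assumed set of characters that end a URL inside HTML or plain text.
    static boolean isDelimiter(char c) {
        return c == '"' || c == '\'' || c == '<' || c == '>' || Character.isWhitespace(c);
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"http://example.com/a\">A</a>\n"
                    + "text https://example.org/b?x=1 more</p>";
        for (String line : html.split("\n")) {
            for (String link : linksIn(line)) System.out.println(link);
        }
    }
}
```

For real pages the Jsoup selectors above are the safer choice, since they parse the document instead of guessing where a URL ends.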
Sorry for the delay, I lost my internet connection and forgot about the question. I had solved it another way, but thanks, guys. I'll use this too. The tricky thing is that even after learning Java you have to keep looking at the documentation to understand it. Thanks a lot.
@D3ll4ry It's worse than that. If it were only a matter of looking at the documentation, that would be good practice. The problem is having to know zillions of libraries.
Add part of the code you tried and also show what the file you are working with looks like.
– user28595
Are you creating a Java class to read an HTML file and search for <a href? It was not clear what you intended to do; post the code to illustrate your problem.
– Gedson Silva
Line by line of what? Source code of what? Give input and output examples to make your question clearer.
– Pablo Almeida
I don't remember exactly how I did it, because I deleted it... but here's the thing: I was using the charAt() and indexOf() methods to go through the line and find http://... I just wanted it to show every http://... on any web page
– Marcelo
Why the "java-web" tag?
– Pablo Almeida
I thought it might be related to the question
– Marcelo
Please edit your question to include an example of the input text and the expected output for that example. It's still a bit confusing.
– Pablo Almeida
As people have pointed out, this question is unclear. Do you want to extract URLs? Anchors (<a>)? Is it an HTML file, a text file, or what? Try editing the question to answer these points and include the code you have already written, so you can get a more accurate answer. (:
– Felipe Avelar