How do I extract only the links from a page's source code?


Viewed 1,818 times

-1

Since last night I have been trying to write an algorithm that searches line by line and returns only the links, but it is complicated. Either I'm dumb, or the String class methods that I know don't help much.

  • Add part of the code you tried, and show what the file you are reading looks like.

  • Are you creating a Java class that reads an HTML file and searches for <a href? It is not clear what you intend to do. Add code that exemplifies your problem.

  • Line by line of what? Source code of what? Give input and output examples to make your question clearer.

  • I don’t remember exactly how I did it, because I deleted it... but here’s the thing: I was using the charAt() and index methods to go through the line and find http://... I just wanted it to show every http://... of any web page.

  • Why the "java-web" tag?

  • I thought it might be related to the question

  • Please edit your question to include an example of the input text and the expected output for that example. It’s still a bit confusing.

  • As people have pointed out, this question is unclear. Do you want to extract URLs? Anchors (<a>)? Is it an HTML file, a TXT file, or something else? Try editing the question to answer these points and include the code you have already written, so you can get a more accurate answer. (:


1 answer

2


Using the Jsoup library this is quite simple.

The documentation itself provides the following example:

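// Parse a local HTML file; the third argument is the base URL for relative links
File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");
// Select every <a> element that has an href attribute
Elements links = doc.select("a[href]");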

The third argument of the parse method, http://example.com/, sets the base URL used to resolve relative URLs.
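
To make "resolving relative URLs" concrete, here is a minimal sketch that uses java.net.URI from the JDK rather than Jsoup itself, so Jsoup's own handling may differ in edge cases. The class name BaseUrlDemo and the helper resolve are hypothetical, chosen only for this illustration.

```java
import java.net.URI;

public class BaseUrlDemo {
    // Hypothetical helper: resolves an href against a base URL using the JDK,
    // which illustrates the idea behind the base URL passed to Jsoup.parse.
    static String resolve(String base, String href) {
        return URI.create(base).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.com/";
        // A path relative to the base URL
        System.out.println(resolve(base, "docs/page.html"));
        // A path relative to the site root
        System.out.println(resolve(base, "/img/logo.png"));
        // An href that is already absolute is left unchanged
        System.out.println(resolve(base, "http://other.org/x"));
    }
}
```

Without a base URL, an href such as docs/page.html cannot be turned into a full address, which is why the argument matters when parsing a local file.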

Starting from this example, you can use any selector to search for links.
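
As a rough sketch of what using a different selector could look like, the snippet below parses a small inline HTML string instead of a file. The HTML, the class name SelectorDemo and the helper absoluteUrls are assumptions made up for illustration; the Jsoup calls used are Jsoup.parse(String, String), select and absUrl.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class SelectorDemo {
    // Hypothetical helper: collects the absolute URL held in the given
    // attribute of every element matched by the CSS selector.
    static List<String> absoluteUrls(Document doc, String selector, String attribute) {
        List<String> urls = new ArrayList<>();
        for (Element element : doc.select(selector)) {
            urls.add(element.absUrl(attribute));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up input, used only for this sketch
        String html = "<a href='/about'>About</a>"
                + "<a href='http://other.org/manual.pdf'>Manual</a>"
                + "<img src='logo.png'>";
        Document doc = Jsoup.parse(html, "http://example.com/");

        // All anchors that have an href attribute
        System.out.println(absoluteUrls(doc, "a[href]", "href"));
        // Only anchors whose href ends in .pdf
        System.out.println(absoluteUrls(doc, "a[href$=.pdf]", "href"));
        // Images, read from the src attribute instead of href
        System.out.println(absoluteUrls(doc, "img[src]", "src"));
    }
}
```

Returning absolute URLs through absUrl relies on the base URL given to parse, so relative links such as /about come back as full addresses.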

There is also a more complete example that lists the various kinds of URL found on the page, including scripts, stylesheets and images:

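import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.helper.Validate;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ListLinks {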
    public static void main(String[] args) throws IOException {
        Validate.isTrue(args.length == 1, "usage: supply url to fetch");
        String url = args[0];
        print("Fetching %s...", url);

        Document doc = Jsoup.connect(url).get();
        Elements links = doc.select("a[href]");
        Elements media = doc.select("[src]");
        Elements imports = doc.select("link[href]");

        print("\nMedia: (%d)", media.size());
        for (Element src : media) {
            if (src.tagName().equals("img"))
                print(" * %s: <%s> %sx%s (%s)",
                        src.tagName(), src.attr("abs:src"), src.attr("width"), src.attr("height"),
                        trim(src.attr("alt"), 20));
            else
                print(" * %s: <%s>", src.tagName(), src.attr("abs:src"));
        }

        print("\nImports: (%d)", imports.size());
        for (Element link : imports) {
            print(" * %s <%s> (%s)", link.tagName(),link.attr("abs:href"), link.attr("rel"));
        }

        print("\nLinks: (%d)", links.size());
        for (Element link : links) {
            print(" * a: <%s>  (%s)", link.attr("abs:href"), trim(link.text(), 35));
        }
    }

    private static void print(String msg, Object... args) {
        System.out.println(String.format(msg, args));
    }

    private static String trim(String s, int width) {
        if (s.length() > width)
            return s.substring(0, width-1) + ".";
        else
            return s;
    }
}
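
If adding a library is not an option, the approach mentioned in the question's comments (going through each line looking for http://...) can be approximated with java.util.regex from the JDK. This is only a sketch under stated assumptions: the class name PlainLinkScanner, the helper findLinks and the pattern are hypothetical, and the pattern is deliberately simple rather than a full URL grammar, so it will miss relative links and may accept malformed ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlainLinkScanner {
    // Assumed pattern: "http://" or "https://" followed by any run of characters
    // that are not whitespace, quotes or angle brackets.
    private static final Pattern URL = Pattern.compile("https?://[^\\s\"'<>]+");

    // Scans the text line by line and returns every match of the pattern.
    static List<String> findLinks(String text) {
        List<String> found = new ArrayList<>();
        for (String line : text.split("\\R")) {
            Matcher matcher = URL.matcher(line);
            while (matcher.find()) {
                found.add(matcher.group());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up input, used only for this sketch
        String source = "<a href=\"http://example.com/a\">first</a>\n"
                + "see https://other.org/b for details";
        for (String link : findLinks(source)) {
            System.out.println(link);
        }
    }
}
```

Unlike the Jsoup version, this treats the page as plain text, so it also picks up URLs in comments and scripts and knows nothing about tags or attributes.
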
  • Sorry for the delay, my internet went down and I forgot about the question. I had solved it another way, but thanks, guys. I’ll use this too. The tricky thing is that even after learning Java you have to keep reading the documentation to understand things. Thanks a lot.

  • @D3ll4ry It's worse than that. If it were only a matter of reading the documentation, that would be good practice. The problem is having to know zillions of libraries.
