How to remove a link from a string in JavaScript?

7

How can I remove a link from a JavaScript string?

str = "hey olha isso http://google.com, legal né?"
var test = str.description.replace(/.*?:///g, "");

Expected result:

hey olha isso, legal né?

3 answers

4


You can use this regular expression:

var urlPattern = /(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?/g;
var textoComUrl = "hey olha isso http://google.com/, legal né?";
var textoSemUrl = textoComUrl.replace(urlPattern, "");
console.log(textoSemUrl);

3

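// The slashes must be escaped inside the regex literal. Note that this pattern
// strips everything up to and including "://", not the whole link.
var test = str.description.replace(/.*?:\/\//g, "");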

1

For the URL in the question, the other answers work well, but remember that a URL can be much more complicated than the examples cited.

For example, it can be something like https://www.abc.com/xyz/a,b?x=aa%20b&y=1,2 or http://www.abc.com/xyz#teste-bla (notice that a comma may appear in some parts of the URL). And trying to write a regex that matches any valid URL is pretty complicated.
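To see how much structure these URLs carry, here is a small sketch that parses the two examples above with the standard URL interface and prints their parts. The variable names are only illustrative.

```javascript
// Parse the two example URLs with the built-in URL interface.
const withCommas = new URL("https://www.abc.com/xyz/a,b?x=aa%20b&y=1,2");
const withHash = new URL("http://www.abc.com/xyz#teste-bla");

console.log(withCommas.pathname);              // "/xyz/a,b"
console.log(withCommas.search);                // "?x=aa%20b&y=1,2"
console.log(withCommas.searchParams.get("y")); // "1,2"
console.log(withHash.hash);                    // "#teste-bla"
```

The comma shows up both in the path and in a query string value, which is why a regex that stops at every comma cuts these URLs short.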

Instead of trying to write a monstrous regex (like the one in the link already indicated), I suggest a simpler one that extracts the passages that look like a URL, and then validating each of them with the URL interface (which is made for this). If a passage is valid, you remove it. Something like this:

var str = "hey olha isso http://google.com, essa é inválida: http:///// - essa é válida: https://www.abc.com/xyz/a,b?x=aa%20b&y=1,2, essa também http://www.abc.com/xyz#teste-bla e http://127.0.0.1:8080/app legal né?";
var semURL = str.replace(/https?:\/\/([^ ,]|,(?! ))+/g, function (match) {
    try {
        // if it is valid, execution does not reach the catch
        new URL(match);
        return ''; // remove the URL from the string
    } catch {
        // not a valid URL, so leave the match unchanged
        return match;
    }
});
console.log(semURL);

With that, the result will be:

hey olha isso , essa é inválida: http:///// - essa é válida: , essa também  e  legal né?

I used a simpler regex to extract snippets that look like URLs. In this case, a snippet is anything that starts with http or https (the s? indicates that the letter "s" is optional). Then comes ://, and then the most complicated part: [^ ,]|,(?! ), repeated one or more times by the + quantifier.

The | indicates alternation, with two options:

  • [^ ,] - any character other than space or comma, or
  • ,(?! ) - a comma, as long as it is not followed by a space (a negative lookahead)
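To check what the alternation picks up before any validation happens, a quick sketch with match can help. The sample sentence and the variable names are made up for illustration.

```javascript
// Same extraction regex as above: "http" or "https", then "://", then one or
// more characters that are not a space or comma, or a comma not followed by a space.
const candidatePattern = /https?:\/\/([^ ,]|,(?! ))+/g;

// Made-up sample text with a comma right after each URL.
const sampleText = "see http://google.com, then https://www.abc.com/xyz/a,b?x=1,2, ok";

const candidates = sampleText.match(candidatePattern);
console.log(candidates);
// [ "http://google.com", "https://www.abc.com/xyz/a,b?x=1,2" ]
```

The commas inside the second URL are kept because a non-space character follows them, while the comma before each space is left out of the match.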

Thus, in the passage http://google.com, the trailing comma is not part of the match and so is not removed, and in the other URLs the match runs up to the next space (or up to a comma that is followed by a space). Then, in replace, I use a function that analyzes each chunk found and decides which replacement will be made. If it is a valid URL, I replace it with '' (empty string), which is the same as removing it. If it is not a valid URL, I return the match itself, that is, it is not replaced and remains in the original text.
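If you need this in more than one place, the same logic can be split into two small functions. This is only a sketch: isValidUrl and removeUrls are names invented here, not part of any library.

```javascript
// Hypothetical helper: true when the URL constructor accepts the text.
function isValidUrl(candidate) {
    try {
        new URL(candidate);
        return true;
    } catch {
        return false;
    }
}

// Hypothetical helper: removes only the candidates that pass validation.
function removeUrls(text) {
    return text.replace(/https?:\/\/([^ ,]|,(?! ))+/g, function (match) {
        return isValidUrl(match) ? '' : match;
    });
}

console.log(removeUrls("valid: http://google.com, invalid: http://///"));
// "valid: , invalid: http://///"
```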

The regex in the other answer does not handle the part of the query string that comes after the comma (which, remember, is valid), and it does not remove the hash #teste-bla from http://www.abc.com/xyz#teste-bla. It also does not remove http://127.0.0.1:8080/app.
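A short sketch to reproduce this comparison, applying the regex from the other answer to a made-up sentence containing the three URLs mentioned (the variable names are illustrative):

```javascript
// Regex copied from the other answer.
const otherAnswerPattern = /(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?/g;

// Made-up sentence containing the three URLs discussed above.
const trickyText = "links: https://www.abc.com/xyz/a,b?x=aa%20b&y=1,2 and " +
    "http://www.abc.com/xyz#teste-bla and http://127.0.0.1:8080/app";

const leftovers = trickyText.replace(otherAnswerPattern, "");
console.log(leftovers);
// The output still contains ",b?x=aa%20b&y=1,2", "#teste-bla"
// and the whole "http://127.0.0.1:8080/app".
```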

Of course, this regex can still fail. Since the comma is a valid character in URLs, it may actually be the last character of one, in which case it would have to be removed too (but there is no way to know whether it is part of the link or not). Anyway, it all depends on the cases you have.
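A minimal sketch of that ambiguity, using a made-up example.com address whose path is assumed to really end with a comma:

```javascript
const extractionPattern = /https?:\/\/([^ ,]|,(?! ))+/g;

// Assumption for this example: the intended URL is "http://example.com/list/a,b,"
// (with the final comma), but the regex cannot tell that from punctuation.
const ambiguousText = "open http://example.com/list/a,b, and then close";

const strippedText = ambiguousText.replace(extractionPattern, "");
console.log(strippedText);
// "open , and then close"
```

The final comma is followed by a space, so it is treated as punctuation and stays in the text even though, in this assumed case, it belonged to the link.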
