Regular expression to remove URL links from a tweet

Does anyone know an expression to remove the links present in a .CSV file, using Python?

Example text:

Joao was in the market http://scikit-learn.org/stable/modules/genera I want that text to appear

From that, I want the following output:

Joao was in the market I want that text to appear

I have this code. It removes the link, but it also removes the text that comes after it:

import re

URLless_string = re.sub(r'\w+:\/{2}[\d\w-]+(\.[\d\w-]+)*(?:(?:\/[^\s/]*))*', '', str(linha))
print(URLless_string)
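A quick check of the pattern above against the example sentence (hardcoded here as an assumption; in the real program linha comes from the CSV file) suggests the expression itself stops at the space after the URL, because [^\s/]* cannot cross whitespace. It only leaves two spaces where the URL used to be:

```python
import re

# The pattern from the question, unchanged
pattern = r'\w+:\/{2}[\d\w-]+(\.[\d\w-]+)*(?:(?:\/[^\s/]*))*'

# Example sentence from the question (assumed input for this check)
linha = "Joao was in the market http://scikit-learn.org/stable/modules/genera I want that text to appear"

URLless_string = re.sub(pattern, '', linha)
print(repr(URLless_string))
# 'Joao was in the market  I want that text to appear'
```

If the trailing text disappears in the real program, the cause may lie in how each line is read from the file rather than in the regex itself.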
  • Why was the code removed from the question? It apparently worked.

1 answer

Do this:

import re

linha = input("Enter the tweet: ")  # use raw_input() on Python 2
URLless_string = re.sub(r"http\S+", "", linha)
print(URLless_string)
  • http matches those literal characters
  • \S+ matches every non-whitespace character that follows (up to the end of the URL)
  • the whole match is replaced with the empty string
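A minimal sketch of the same substitution applied to the example tweet. The helper name remove_urls is hypothetical, and the whitespace clean-up step is an addition not in the answer: removing the URL leaves two adjacent spaces, so the result is split on whitespace and re-joined with single spaces to match the desired output exactly.

```python
import re


def remove_urls(text: str) -> str:
    """Remove URLs starting with http (hypothetical helper) and tidy spaces."""
    # "http" matches literally; \S+ then consumes every non-whitespace
    # character, so the match ends at the first space after the URL.
    without_urls = re.sub(r"http\S+", "", text)
    # The removal leaves two adjacent spaces; collapse any run of
    # whitespace into a single space and trim both ends.
    return " ".join(without_urls.split())


tweet = "Joao was in the market http://scikit-learn.org/stable/modules/genera I want that text to appear"
print(remove_urls(tweet))
# Joao was in the market I want that text to appear
```

Note that " ".join(text.split()) also turns tabs and newlines into single spaces, and http\S+ will remove any word that merely begins with "http", so both choices assume plain one-line tweets.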

  • What if the URL does not use the HTTP(S) scheme? For example, FTP.

  • @Andersoncarloswoss I think the OP should tell us in the question whether there will be links that start with something other than http.

  • The links will always start with http.

  • @thiagoxavier then the code I provided works

  • It partly worked! It does pretty much the same thing as the code in my question: it removes the URL but also removes the text that comes after it. I think the expression has to take the whitespace into account and know that the text after that space is normal text that should not be removed.

  • @thiagoxavier I updated the answer.

  • It worked!!! Thank you
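Since the question is about a .CSV file, here is a minimal sketch that applies the answer's http\S+ pattern to every cell of a CSV using Python's standard csv module. The function names and the sample data are hypothetical, and ANY_SCHEME_URL is an assumed broader pattern prompted by the FTP comment; the OP said only http links occur, so it is optional.

```python
import csv
import io
import re

# Pattern from the answer: anything starting with "http" (covers https too).
HTTP_URL = re.compile(r"http\S+")
# Hypothetical broader pattern for any "scheme://" link, e.g. ftp://host/file
ANY_SCHEME_URL = re.compile(r"\b[A-Za-z][A-Za-z0-9+.-]*://\S+")


def strip_urls(text, pattern=HTTP_URL):
    """Remove URLs matched by pattern and collapse leftover whitespace."""
    return " ".join(pattern.sub("", text).split())


def strip_urls_from_csv(src, dst, pattern=HTTP_URL):
    """Copy CSV rows from file object src to dst, removing URLs from every cell."""
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:
        writer.writerow([strip_urls(cell, pattern) for cell in row])


if __name__ == "__main__":
    # In-memory sample standing in for the real CSV file
    sample = io.StringIO(
        "id,tweet\n"
        "1,Joao was in the market http://scikit-learn.org/stable/modules/genera I want that text to appear\n"
    )
    out = io.StringIO()
    strip_urls_from_csv(sample, out)
    print(out.getvalue())
```

With real files, open both with newline="" as the csv module documentation recommends, for example open("tweets.csv", newline="", encoding="utf-8") (file name hypothetical), and pass the file objects to strip_urls_from_csv.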
