How do I write this regular expression in Python 3.6?

I need to write a regular expression to extract the links from this string:

links = ('href=http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/lista-de-espera-sisu-3/?id_curso=01GV&id_grupo=70>ADMINISTRAÇÃO - GOVERNADOR VALADARES - DIURNO - SISU - GRUPO A</a></li><li><a href=http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/lista-de-espera-sisu-3/?id_curso=01GV&id_grupo=71>ADMINISTRAÇÃO - GOVERNADOR VALADARES - DIURNO - SISU - GRUPO B</a></li>')

The string is much longer. I only included one part because the rest repeats the same structure. Here is what I have tried:

campus1 = re.findall("href", links)
campus2 = re.findall("http", links)
campus3 = re.findall("href=http", links)
campus4 = re.findall("hre", links)
campus5 = re.findall("a", links)
campus6 = re.findall("<a> <\a>", links)

When I print the results, either the letters come out separately or the link comes out together with the names (later I will also need an expression to get only these college names). Any ideas? For example, when I run campus1 = re.findall("href", links), the output is: 'href', 'href', 'href', 'href', 'href', 'href', 'href', 'href', 'href', 'href', 'href', 'href'... That is, it returns every "href" in the string. I would like to extract only the links, for example:

http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/lista-de-espera-sisu-3/?id_curso=01GV&id_grupo=70

That is, all the links contained in this string.

  • Your string is incorrect: the first li and a are missing their opening tags. And what exactly do you want to extract? If possible, edit the question with the fixes and an example of the output you want.

1 answer


Do it like this:

import re
s = "<li><a>href=http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/lista-de-espera-sisu-3/?id_curso=01GV&id_grupo=70>ADMINISTRAÇÃO - GOVERNADOR VALADARES - DIURNO - SISU - GRUPO A</a></li><li><a href=http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/lista-de-espera-sisu-3/?id_curso=01GV&id_grupo=71>ADMINISTRAÇÃO - GOVERNADOR VALADARES - DIURNO - SISU - GRUPO B</a></li>"
# Capture everything after href= (and an optional quote) up to the next quote, space, or '>'
print(re.findall(r'href=[\'"]?([^\'" >]+)', s))
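
As a rough sketch of why this works: re.findall returns the whole match when the pattern has no group (which is why "href" only gave back 'href'), the contents of the group when there is exactly one, and tuples when there are several. The names LINK, LINK_AND_NAME, and the shortened html sample below are mine, not from the question, and the second pattern assumes every link text is followed directly by </a>, as in the excerpt shown.

```python
import re

# Shortened sample in the same shape as the question's string
# (assumption: href values are unquoted and end at '>').
html = (
    "<li><a href=http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/"
    "lista-de-espera-sisu-3/?id_curso=01GV&id_grupo=70>"
    "ADMINISTRAÇÃO - GOVERNADOR VALADARES - DIURNO - SISU - GRUPO A</a></li>"
    "<li><a href=http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/"
    "lista-de-espera-sisu-3/?id_curso=01GV&id_grupo=71>"
    "ADMINISTRAÇÃO - GOVERNADOR VALADARES - DIURNO - SISU - GRUPO B</a></li>"
)

# No capturing group: findall returns the matched text itself.
only_href = re.findall("href", html)

# The answer's pattern, spelled out with re.VERBOSE so each part can be commented.
LINK = re.compile(r"""
    href=          # the literal attribute name and equals sign
    ['"]?          # an optional opening quote
    ([^'" >]+)     # group 1: the URL, up to a quote, a space, or '>'
""", re.VERBOSE)

# One capturing group: findall returns only what the group matched.
links = LINK.findall(html)

# Hypothetical extension: a second group for the link text (the course name).
LINK_AND_NAME = re.compile(r"""href=['"]?([^'" >]+)['"]?>([^<]+)</a>""")

# Two capturing groups: findall returns (url, name) tuples.
pairs = LINK_AND_NAME.findall(html)

print(only_href)
print(links)
for url, name in pairs:
    print(name, "->", url)
```

For markup that is less regular than this excerpt, a real HTML parser is usually a safer choice than a regular expression.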

  • I know we should not use comments to say thanks, but I will do it anyway: if we do not report that the solution worked, neither the person who answered nor anyone who comes to this question looking for answers will know that it worked! So here is my thank you. It worked. Thanks!

  • The best way to do this is to accept the answer by clicking the green icon below the voting arrows of the chosen answer. Thanks.
