How to write this regular expression in Python 3.6

Question

Asked 8 years, 5 months ago

I need to write a regular expression to extract the links from this string:

links = ('href=http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/lista-de-espera-sisu-3/?id_curso=01GV&id_grupo=70>ADMINISTRAÇÃO - GOVERNADOR VALADARES - DIURNO - SISU - GRUPO A</a></li><li><a href=http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/lista-de-espera-sisu-3/?id_curso=01GV&id_grupo=71>ADMINISTRAÇÃO - GOVERNADOR VALADARES - DIURNO - SISU - GRUPO B</a></li>')

The actual string is much longer. I only included part of it because the rest repeats the same pattern. Here is what I have tried:

campus1 = re.findall("href", links)
campus2 = re.findall("http", links)
campus3 = re.findall("href=http", links)
campus4 = re.findall("hre", links)
campus5 = re.findall("a", links)
campus6 = re.findall("<a> <\a>", links)

When I print the results, either the letters come out separately or I get the link together with the names (later I will also need an expression to extract only these course names). Any ideas? For example, when I run campus1 = re.findall("href", links), the output is: 'href', 'href', 'href', 'href', 'href', 'href', 'href', 'href', 'href', 'href', 'href', 'href'... That is, it returns every "href" in the string. I would like to extract only the links, for example:

http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/lista-de-espera-sisu-3/?id_curso=01GV&id_grupo=70

I want all the links that are in this string.

Your string is incorrect: the first li and a are missing their opening tags. And what exactly do you want to extract? If possible, edit the question with the fixes and an example of the output you want.

– MagicHat

2017/02/26 at 14:40

1 answer

by MagicHat • **12,262** points · Answer 1 · 2017-02-26T15:45:40+00:00

Do it like this:

import re
s = "<li><a href=http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/lista-de-espera-sisu-3/?id_curso=01GV&id_grupo=70>ADMINISTRAÇÃO - GOVERNADOR VALADARES - DIURNO - SISU - GRUPO A</a></li><li><a href=http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/lista-de-espera-sisu-3/?id_curso=01GV&id_grupo=71>ADMINISTRAÇÃO - GOVERNADOR VALADARES - DIURNO - SISU - GRUPO B</a></li>"
# The capture group keeps only the URL that follows href=, up to the first quote, space or '>'
print(re.findall(r'href=[\'"]?([^\'" >]+)', s))

See on Ideone

Explanation of the regex (in English)
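
To make the pattern easier to follow, here is a rough sketch that spells it out with re.VERBOSE and then extends it to also capture the course name that follows each link, since the question mentions needing those later. The names html, LINK_RE and PAIR_RE are made up for this illustration, the sample string is a cleaned-up copy of the one in the question, and the second pattern assumes every link text is immediately followed by </a>, as in the sample.

```python
import re

# Sample in the same shape as the question's string (unquoted href values).
html = (
    "<li><a href=http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/"
    "lista-de-espera-sisu-3/?id_curso=01GV&id_grupo=70>"
    "ADMINISTRAÇÃO - GOVERNADOR VALADARES - DIURNO - SISU - GRUPO A</a></li>"
    "<li><a href=http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/"
    "lista-de-espera-sisu-3/?id_curso=01GV&id_grupo=71>"
    "ADMINISTRAÇÃO - GOVERNADOR VALADARES - DIURNO - SISU - GRUPO B</a></li>"
)

# The same pattern as in the answer, split up so each part can be commented.
# In VERBOSE mode whitespace is ignored except inside a character class.
LINK_RE = re.compile(
    r"""
    href=         # the literal attribute name and equals sign
    ['"]?         # an optional opening quote, single or double
    ([^'" >]+)    # group 1: the URL, i.e. anything up to a quote, space or >
    """,
    re.VERBOSE,
)

# With exactly one capture group, findall returns only what the group matched,
# which is why searching for plain "href" returned nothing but 'href'.
links = LINK_RE.findall(html)
for link in links:
    print(link)

# Hypothetical extension: a second group captures the text between > and </a>.
PAIR_RE = re.compile(r'href=[\'"]?([^\'" >]+)[\'"]?>([^<]+)</a>')

# With two capture groups, findall returns a list of (url, name) tuples.
pairs = PAIR_RE.findall(html)
for url, name in pairs:
    print(name, "->", url)
```

This works for regular, repetitive markup like the sample. For messier real-world pages, an HTML parser such as the standard library's html.parser module is likely to be more robust than a regular expression.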