I need to write a regular expression to extract the links from this string:
links = ('href=http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/lista-de-espera-sisu-3/?id_curso=01GV&id_grupo=70>ADMINISTRAÇÃO - GOVERNADOR VALADARES - DIURNO - SISU - GRUPO A</a></li><li><a href=http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/lista-de-espera-sisu-3/?id_curso=01GV&id_grupo=71>ADMINISTRAÇÃO - GOVERNADOR VALADARES - DIURNO - SISU - GRUPO B</a></li>')
The actual string is much longer; I only included part of it because the rest follows the same pattern. Here's what I've tried:
import re
campus1 = re.findall("href", links)
campus2 = re.findall("http", links)
campus3 = re.findall("href=http", links)
campus4 = re.findall("hre", links)
campus5 = re.findall("a", links)
campus6 = re.findall("<a> <\a>", links)
When I print the results, either the letters come out separately, or I get the link together with the course names (later I'll also need an expression that gets only those names). Any ideas? For example, when I run campus1 = re.findall("href", links), the output is: 'href', 'href', 'href', 'href', 'href', 'href', 'href', 'href', 'href', 'href', 'href', 'href'... That is, it returns every "href" in the string. What I want is to extract only the links, that is, every link that appears in this string.
Your string is malformed: the first li and a elements are missing their opening tags. Also, what exactly do you want to extract? If possible, edit the question with the fixes and an example of the output you want. – MagicHat
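One possible approach, as a minimal sketch: re.findall returns only the matched text, so a literal pattern like "href" can only ever give back 'href'. Putting the part you want inside a capture group makes findall return that part instead. The sketch below assumes that every href value is unquoted and ends at the next ">", and that the course name is the text between that ">" and the closing </a>, as in the sample above. The names pattern, pairs, urls and names are illustrative, not from the question.

```python
import re

# Shortened sample from the question. Assumption: href values are unquoted
# and each one ends at the next '>' character.
links = ('href=http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/lista-de-espera-sisu-3/?id_curso=01GV&id_grupo=70>'
         'ADMINISTRAÇÃO - GOVERNADOR VALADARES - DIURNO - SISU - GRUPO A</a></li>'
         '<li><a href=http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/lista-de-espera-sisu-3/?id_curso=01GV&id_grupo=71>'
         'ADMINISTRAÇÃO - GOVERNADOR VALADARES - DIURNO - SISU - GRUPO B</a></li>')

# Group 1: the URL, i.e. everything after 'href=' up to '>' or whitespace.
# Group 2: the link text, i.e. everything up to the closing </a>.
pattern = re.compile(r'href=([^>\s]+)>([^<]*)</a>')

# With two capture groups, findall returns a list of (url, name) tuples.
pairs = pattern.findall(links)
urls = [url for url, _ in pairs]
names = [name.strip() for _, name in pairs]

for url, name in zip(urls, names):
    print(name, '->', url)
```

If the real page is full HTML with quoted attributes or nested tags, a regular expression like this is likely to be fragile, and an HTML parser such as the standard library's html.parser module would probably be a safer choice.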