I need to extract only the phrases like ADMINISTRAÇÃO - JUIZ DE FORA - NOTURNO - SISU - GRUPO B, for example. That is, I need to get only the course name, the city, the shift, SISU, and the group name from the following string:
string = </li><li><a href=http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/lista-de-espera-sisu-3/?id_curso=46A&id_grupo=70>ADMINISTRAÇÃO - JUIZ DE FORA - NOTURNO - SISU - GRUPO A</a></li><li><a href=http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/lista-de-espera-sisu-3/?id_curso=46A&id_grupo=71>ADMINISTRAÇÃO - JUIZ DE FORA - NOTURNO - SISU - GRUPO B</a></li><li><a href=http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/lista-de-espera-sisu-3/?id_curso=46A&id_grupo=72>
The string is huge; that's just a piece of it. I managed to write an expression, but it returns strange results, and it also doesn't pick up accented letters, for example the accented "O" in HISTÓRIA. The expression I wrote was:
cursos = re.findall(([A-Z])\w+g)
I need it to return this:
ADMINISTRAÇÃO - JUIZ DE FORA - NOTURNO - SISU - GRUPO A
But it returns this:
GEOGRAFIA - JUIZ DE FORA - DIURNO - SISU - GRUPO (it is not picking up which group it is)
and in HISTÓRIA, for example, it does not pick up the accented "O".
Can you also tell us which URL you're fetching the HTML from? It would be easier for us to help you.
– Miguel
In this case I don't need the URLs, just the phrases. I've already extracted the URLs with another expression because I need them in a separate place. I only need the phrases themselves.
– SasukeUchiha
Do you NEED to use Python 2 for this? It's much better to use Python 3. To start with, you won't have problems with accented characters.
– jsbueno
(Does your keyboard have no ' " '? The quotes are missing in both the HTML snippet and the Python code.)
– jsbueno
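Putting the comments above together, one possible approach is to capture the text between each opening <a ...> tag and its closing </a>, instead of trying to match the uppercase words directly. The following is only a minimal sketch, assuming Python 3 (where str is Unicode, so accented letters need no special handling) and assuming the real page follows the same structure as the snippet in the question. The variable names html and campos are illustrative and not from the original code.

```python
import re

# Shortened sample of the HTML from the question, written as a quoted
# Python string (assumption: the real page has the same structure).
base = ('http://www.ufjf.br/cdara/sisu-2/sisu-2017-1a-edicao/'
        'lista-de-espera-sisu-3/?id_curso=46A&id_grupo=')
html = (
    '</li><li><a href=' + base + '70>'
    'ADMINISTRAÇÃO - JUIZ DE FORA - NOTURNO - SISU - GRUPO A</a></li>'
    '<li><a href=' + base + '71>'
    'ADMINISTRAÇÃO - JUIZ DE FORA - NOTURNO - SISU - GRUPO B</a></li>'
)

# Match an opening <a ...> tag, then capture everything up to the next "<",
# which must be the start of the closing </a>. Because the capture group is
# "anything except <", accented letters and the trailing group letter are kept.
cursos = re.findall(r'<a\s[^>]*>([^<]+)</a>', html)

for curso in cursos:
    print(curso)

# Optionally split each phrase into its parts:
# course, city, shift, SISU, group.
campos = [curso.split(' - ') for curso in cursos]
```

Matching on the tag boundaries rather than on [A-Z] avoids the accent problem entirely, since a character class like [A-Z] only covers unaccented ASCII letters. For a large or irregular page, an HTML parser would likely be more robust than a regular expression.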