I'm working on a project that scrapes certain information from an HTML page, parses it with Beautiful Soup, and returns the values as a dictionary, which another method then turns into a JSON object.
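Keeping the parsing method, which returns a dictionary, separate from the method that produces JSON is already a sensible split: the parser's output can be checked as plain Python data, and serialization stays a one-liner. A minimal Python 3 sketch of the serialization side, assuming the parser returns a dictionary shaped like {'campus': ..., 'dia-cardapio': [...]}; the name to_json is hypothetical:

```python
import json


def to_json(result):
    """Serialize the dictionary returned by the parser (hypothetical helper)."""
    # ensure_ascii=False keeps accented Portuguese text readable in the output
    return json.dumps(result, ensure_ascii=False, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(to_json({"campus": "exemplo", "dia-cardapio": []}))
```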
The problem is that the page is very poorly written, with far too many tags, and the information itself is badly organized, so I have to handle everything with lots of loops and conditionals. The current method has 91 lines.
I can't logically separate these blocks of code into other methods; everything seems to me to be "part of the same operation". It gets even harder because the pieces don't seem useful in any other situation either.
Does anyone have suggestions on when and how I can split my code?
As an example, here is a method I wrote just for fun that has the same problem (to make it less strange, some context: it pulls the information from the menu page of my university's restaurant):
```python
# -*- coding: utf-8 -*-
# Python 2. tratar_chave() and tratar_valor() are the author's own helpers
# (not shown) that turn a tag into a cleaned-up string.
import urllib2

from bs4 import BeautifulSoup


def parse_cardapios(self):
    """Parses the menu tables on the restaurant's website."""
    pag = urllib2.urlopen(self.url + '/' + self.campus).read()
    soup = BeautifulSoup(pag, 'html.parser')
    resultado = []
    # Iterate over the meals and their respective menu tables
    nomes_ref = soup.find('section', id='post-content').find_all('h2')
    tabelas_card = soup.find('section', id='post-content').find_all('table')
    for ref, tab in zip(nomes_ref, tabelas_card):
        refeicao = tratar_chave(ref)
        # Iterate over all available days
        nome_colunas = tab.find_all('th')
        linhas = tab.find_all('tr', class_=True)
        for lin in linhas:  # Each row is a different day
            dia_repetido = False  # True if this date is already in resultado
            obj_refeicoes = {refeicao: {}}
            obj_temp = {'data': '', 'refeicoes': {}}
            # Iterate over each cell of the row
            celulas = lin.find_all('td')
            for meta, dado in zip(nome_colunas, celulas):
                meta = tratar_chave(meta)
                dado = tratar_valor(dado)
                if meta == 'data':
                    # Python 2: delete letters, '-', ' ' and ',' from the date
                    dado = dado.translate(None, 'aábcçdefghijklmnopqrstuvzwxyz- ,')
                    if not resultado:
                        obj_temp['data'] = dado
                    else:
                        for r in resultado:
                            if r['data'] == dado:
                                dia_repetido = True
                                r['refeicoes'].update(obj_refeicoes)
                                break
                        else:  # for-else: no existing day matched
                            obj_temp['data'] = dado
                else:
                    obj_refeicoes[refeicao].update({meta: dado})
            obj_temp['refeicoes'].update(obj_refeicoes)
            if not dia_repetido:
                resultado.append(obj_temp)
    # Merge the vegetarian meals into the same menu as the others
    for r in resultado:
        for s in r['refeicoes'].keys():
            if '-vegetariano' in s:
                veg = {}
                for t in r['refeicoes'][s].keys():
                    if '-vegetariano' not in t:
                        veg.update({t + '-vegetariano': r['refeicoes'][s][t]})
                    else:
                        veg.update({t: r['refeicoes'][s][t]})
                sem_sufixo = s.replace('-vegetariano', '')
                r['refeicoes'][sem_sufixo].update(veg)
        # keys() returns a list in Python 2, so deleting while looping is safe
        for u in r['refeicoes'].keys():
            if '-vegetariano' in u:
                del r['refeicoes'][u]
    return {'campus': self.campus, 'dia-cardapio': resultado}
```
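The vegetarian-merge loop at the end of the method is a self-contained step that only touches one day's 'refeicoes' dictionary, so it can be pulled out as a pure function and tested without downloading anything. A Python 3 sketch, where the name merge_vegetarian is hypothetical; unlike the original, it uses setdefault, so a vegetarian meal with no matching regular meal creates one instead of raising KeyError:

```python
SUFFIX = "-vegetariano"


def merge_vegetarian(refeicoes):
    """Return a new dict in which each '<meal>-vegetariano' entry is folded
    into '<meal>', with its fields renamed to '<field>-vegetariano'.

    The input dict is not modified."""
    merged = {}
    # Copy the regular meals first (a fresh dict per meal)
    for meal, fields in refeicoes.items():
        if SUFFIX not in meal:
            merged.setdefault(meal, {}).update(fields)
    # Then fold every vegetarian meal into its base meal
    for meal, fields in refeicoes.items():
        if SUFFIX in meal:
            target = merged.setdefault(meal.replace(SUFFIX, ""), {})
            for field, value in fields.items():
                key = field if SUFFIX in field else field + SUFFIX
                target[key] = value
    return merged


if __name__ == "__main__":
    day = {"almoco": {"prato": "Frango"}, "almoco-vegetariano": {"prato": "Tofu"}}
    print(merge_vegetarian(day))
    # {'almoco': {'prato': 'Frango', 'prato-vegetariano': 'Tofu'}}
```

With a helper like this, the last block of parse_cardapios would shrink to `for r in resultado: r['refeicoes'] = merge_vegetarian(r['refeicoes'])`.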
Post your code, @Matheus.
– Emerson Rocha
Maybe divide it by what is being extracted in each part: extractProductInfo(), extractDescriptionBody(), extractComments(), etc. It may also be possible to write what you are doing in fewer lines. Hard to say without seeing the code.
– Guilherme Bernal
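Following the idea of splitting by what is extracted, the per-row work in the question's inner loops could become two small pure functions: one that turns a row into a (date, meal) pair, and one that groups rows by date. Using a dictionary keyed by date replaces the linear search over resultado and the dia_repetido flag. A Python 3 sketch, assuming the headers and cells have already been cleaned by the question's tratar_chave/tratar_valor helpers; the names parse_row and group_by_date are hypothetical:

```python
def parse_row(meal, headers, cells):
    """Turn one table row into (date, {meal: {column: value}}).

    headers and cells are assumed to be already-cleaned strings."""
    date = None
    fields = {}
    for header, value in zip(headers, cells):
        if header == "data":
            date = value
        else:
            fields[header] = value
    return date, {meal: fields}


def group_by_date(rows):
    """Merge (date, meals) pairs that share a date into one day entry."""
    by_date = {}  # dicts keep insertion order (Python 3.7+)
    for date, meals in rows:
        day = by_date.setdefault(date, {"data": date, "refeicoes": {}})
        day["refeicoes"].update(meals)
    return list(by_date.values())


if __name__ == "__main__":
    rows = [
        parse_row("almoco", ["data", "prato"], ["01/03", "Frango"]),
        parse_row("jantar", ["data", "prato"], ["01/03", "Sopa"]),
    ]
    print(group_by_date(rows))
```

Each function now does one thing and can be tested with plain lists, while the BeautifulSoup-specific code stays in the method that walks the tables.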
If there is no code duplication (that is, if the operations performed inside the method are not repeated anywhere else), there is no great urgency to break it into smaller methods. If it were 1000 lines, maybe...
– epx
I agree that it would be good for you to post at least part of your code. Your question is good and relevant, but it is very broad and likely to attract opinions rather than answers. If you post the code, you may get more direct answers about ways to improve it. :)
– Luiz Vieira
I added some code as an example (:
– Matheus
When it's doing more than one thing.
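One of the "things" the question's method does is normalizing the date cell, which relies on Python 2's str.translate(None, deletechars). As its own small function, that step can be named and tested in isolation. A Python 3 sketch of the same deletion using str.maketrans (the names DATE_JUNK and normalize_date are hypothetical; the character list is copied from the original translate() call):

```python
# Characters the question's code strips from the date cell
DATE_JUNK = "aábcçdefghijklmnopqrstuvzwxyz- ,"

# Third argument of str.maketrans: characters mapped to None, i.e. deleted
_DELETE_TABLE = str.maketrans("", "", DATE_JUNK)


def normalize_date(text):
    """Python 3 equivalent of Python 2's text.translate(None, DATE_JUNK):
    delete every listed character and keep everything else."""
    return text.translate(_DELETE_TABLE)


if __name__ == "__main__":
    print(normalize_date("segunda-feira, 01/03"))  # 01/03
```

Uppercase letters are not in the list, so they survive, just as in the original; lowercasing first would be a separate, deliberate change.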
– Renato Dinhani
@Renatodinhaniconceição, that's more or less the consensus, right? But as I said: "I can't logically separate these blocks of code into other methods; everything seems to me to be 'part of the same operation'. It gets even harder because the pieces don't seem useful in any other situation either."
– Matheus