Refactoring: When is a method "too big"?

Asked

Viewed 436 times

8

I’m with a project that "recovers" certain information from an HTML page, makes a parse with the help of Beautiful Soup and return the values in the form of Dictionary, so that in another method I generate a JSON object.

The problem is that due to the peculiarity of the page, very poorly written and with too many tags, in addition to problems with the organization of the information itself, I need to treat everything with a lot of use of ties and conditional. The current method has 91 lines.

I cannot logically separate these blocks of code into other methods, everything seems to me "part of the same operation". It gets even harder because they don’t seem to be useful in another situation either.

Does anyone have any suggestions of when and how I can split my code?

As an example, a method I made to "play", which shares the same problem (to make it less strange, I explain: it takes the information from a menu page of the UK of my university):

def parse_cardapios(self):

    """Interpreta as tabelas de cardápio no site do restaurante"""

    pag = urllib2.urlopen(self.url + '/' + self.campus).read();
    soup = BeautifulSoup(pag)
    resultado = []

    # Percorre as refeições e suas respectivas tabelas de cardápio

    nomes_ref = soup.find('section', id='post-content').find_all('h2')
    tabelas_card = soup.find('section', id='post-content').find_all('table')

    for ref, tab in zip(nomes_ref, tabelas_card):

        refeicao = tratar_chave(ref)

        # Percorre todos os dias disponíveis

        nome_colunas = tab.find_all('th')
        linhas = tab.find_all('tr', class_=True)

        for lin in linhas: # Cada linha é um dia diferente

            dia_repetido = False # Para controlar a repetição

            obj_refeicoes = {refeicao: {}}
            obj_temp = {'data': '', 'refeicoes': {}}

            # Percorre cada dado

            celulas = lin.find_all('td')

            for meta, dado in zip(nome_colunas, celulas):

                meta = tratar_chave(meta)
                dado = tratar_valor(dado)

                if meta == 'data':

                    dado = dado.translate(None, 'aábcçdefghijklmnopqrstuvzwxyz- ,')

                    if not resultado:
                        obj_temp['data'] = dado

                    else:
                        for r in resultado:
                            if r['data'] == dado:
                                dia_repetido = True
                                r['refeicoes'].update(obj_refeicoes)
                                break

                        else:
                            obj_temp['data'] = dado

                else:
                    obj_refeicoes[refeicao].update({meta: dado})
                    obj_temp['refeicoes'].update(obj_refeicoes)

            if not dia_repetido:
                resultado.append(obj_temp)

    # Junta as refeições vegetarianas no mesmo cardápio que as outras

    for r in resultado:
        for s in r['refeicoes'].keys():
            if '-vegetariano' in s:
                veg = {}
                for t in r['refeicoes'][s].keys():
                    if not '-vegetariano' in t:
                        veg.update({t + '-vegetariano': r['refeicoes'][s][t]})

                    else:
                        veg.update({t: r['refeicoes'][s][t]})

                sem_sufixo = s.replace('-vegetariano', '')
                r['refeicoes'][sem_sufixo].update(veg)

        for u in r['refeicoes'].keys():
            if '-vegetariano' in u:
                del r['refeicoes'][u]

    return dict({'campus': self.campus, 'dia-cardapio': resultado})
  • 2

    Post your code @Matheus

  • Maybe divide by what is being extracted in each part. extractProductInfo(), extractDescriptionBody(), extractComments(), etc. It may also be possible to write what you are doing with fewer lines. Hard to say without seeing the code.

  • 2

    If there is no code duplication (that is, if the operations performed within the method are not repeated anywhere else) it is not so pressurous to break into smaller methods. Were 1000 lines, maybe...

  • I agree that it would be good for you to post at least a portion of your code. Your question is very nice and relevant, but it is very broad and prone to receiving feedback as an answer. If you post the code you may get more direct answers about ways to improve it. :)

  • I added code for example (:

  • 1

    When he’s doing more than one thing.

  • 1

    @Renatodinhaniconceição, this is kind of consensus, right? : Also: "I cannot logically separate these blocks of code into other methods, everything seems to me "part of the same operation". It gets even harder because they don’t seem to be useful in another situation either."

Show 2 more comments

5 answers

9


This question is very relative. There are authors who recommend 20 lines, but tolerate up to 100. Others claim that some methods can indeed consume up to 100~200 lines and yet they are as easily understandable as those with fewer lines.

In practice, they all say the same: the smaller the number of lines per method, the better, but it is not worth doing it at a level where a part of the code cannot be diminished.

Most of my methods are between 30~80 lines, but a few come close to 200 or more and I do not object if part of this larger method can not be reused in other methods.

Update (after adding code to the question)

I’ve removed blanks and documentation, and your code has given you 54 lines of operations that make sense together. Definitely in your example I see no strong reasons to divide into more methods if this will not allow the divided codes to be reused in other areas.

Remember when authors suggest a maximum line limit, they disregard blank lines and comments, and yet the lines are separated by logical lines. If the guy nests several ifs in one line that would count as several lines.

  • Certainly an enlightening answer. I tried to reduce the method to the logic of information extraction and even simple tasks, such as removing accents, I made sure to separate in another method. Which brings me to another question... that I think falls outside the scope of this one, so I’ll ask you another time.

4

I have not read the method, but class methods, in the context of object-oriented programming, should always offer a single functionality. Thus, its application becomes more uncoupled, the methods become more independent and the components are more likely to be reused.

So I don’t think there is an "ideal limit" for class sizes and methods. The ideal is that your development methodology follows the "first five principles" -also known as SOLID- which are the five basic principles for the design of an object-oriented application, namely:

  • S - Single Responsibility principle - a class should have only a single Responsibility
  • O - Open/closed principle - "software entities ... should be open for Extension, but closed for modification".
  • L - Liskov substitution principle - "Objects in a program should be replaceable with instances of their subtypes without Altering the correctness of that program". See also design by Contract.
  • I - Interface segregation principle - "Many client-specific interfaces are Better than one general-purpose interface."
  • D - Dependency inversion principle - one should "Depend upon Abstractions. Do not Depend upon concretions."

Source: http://en.wikipedia.org/wiki/Solid_(Object-oriented_design)

  • It was a good thing to mention SOLID, summarizing some concepts that may go unnoticed.

  • @rodrigorigotti very good your answer! I focused on LLOC, however the staff should consider yours as well

3

It has at least two strong reasons for breaking one method into other, smaller methods: reusability and readability. In your case, you mention that you will not reuse this code elsewhere. This argument can also change as you break into smaller blocks, reusing certain algorithms.

Especially in your case, the readability could be greatly improved. Within each for practically you have a comment saying what it does. Comments are too fragile for that. If you create an accessory method for each for, for example, you would delete the comments, making the code self-descriptive.

For example, your main method could be:

def parse_cardapios(self):

    nomes_ref, tabelas_card = carrega_refeicoes()

    return percorre_refeicoes(nomes_ref, tabelas_card)

Much more readable, in my opinion.

2

A general rule is that if a refactoring can make the code simpler, less complex, easier to understand and maintain, it should be done. The ideal is to create good abstractions, cohesive, with low coupling. Within this context it is necessary to evaluate whether a method should be refactored or not. Sometimes the solution is not trivial. Methods that have long stretches involving conditionals, for example, require a paradigm shift to refactoring (e.g., substitution for strategies that use polymorphism, transferring the responsibility of blocks to individual objects) There is also the possibility to analyze the methods due to their cyclomatic complexity. Tools like PMD, Checkstyle, Sonar, do this and are useful for discovering methods that should be reviewed.

(This answer was somewhat subjective due to the question also being. Send snippets of your code identifying your questions more objectively so we can discuss more practical solutions)

  • "A general rule is that if a refactoring can make the code simpler, less complex, easier to understand and maintain, it must be done. (...) Sometimes the solution isn’t trivial." I’ll print it on a label and paste it on my monitor.

2

This question is really very relative. It depends a lot on the opinion of each one.
In the code you posted above I would separate by steps, you yourself already separated, using comments, ie would make a function to go through the meals, one to go through the days, etc.
It’s really hard to say when a method is too big, even these definitions of number of lines may not be so correct, I think only experience will help you judge it correctly.

  • Good answer. I also do this, if the block needs explanation it does something "apart" if it does something apart it can be removed for a method.

  • @Selma Really complicated :/ But tell me, you would separate creating new methods that will only be used in processing this method?

  • Yes, in this case the goal is to make the code more readable and not necessarily reusable. This is one of the goals of breaking code into smaller pieces, it facilitates future maintenance.

Browser other questions tagged

You are not signed in. Login or sign up in order to post.