Large function or small function?

Viewed 1,673 times

27

Why is creating a large function or method with many lines of code considered a "bad practice"? What are the drawbacks of this?

What do I gain by dividing it into smaller functions or methods?

What factors should I take into account to determine when to subdivide a function?

  • 1

    Related question: http://answall.com/questions/31485/o-tamanho-de-uma-fun%C3%A7%C3%a3o-afeta-a-performance-e-consumo-de-mem%C3%b3ria

  • The bad practice is not the number of lines but the number of responsibilities; each function should have a single responsibility. The line count is only a warning sign that the method may have too many responsibilities, some of which could be moved to other functions. One of the main reasons it is bad practice is that our brain can only focus on a limited number of things at once. When we look at a block of code we are usually interested in only one of the things it does, so the other things in it compete for our attention and make the work harder.

6 answers

17

Ever since I started developing software (a long time ago) I have seen people propose ways to judge when a function or method is too big. For a while I bought into all of these formulas. That is normal; it is what inexperienced and still naive developers do. But one day I discovered that there is no magic solution and, above all, no magic number.

As far as I know there are no reliable studies that indicate the ideal size of a function. Any attempt to find one seems doomed to failure. What we can do is observe what other people think, take advantage of their experience and try to make each individual case easier to decide.

For problems stated in general terms, generic answers do not help much, and specific answers just try to sell as general something that only applies to one specific case. Many well-known books do this. They need to satisfy readers' desire for a magic formula.

In the '80s I heard that programmers were going to disappear. There would be formulas that determined everything about how a program could behave; anyone would tell the computer what they needed and the program would be ready. There are doctoral theses on this! Today everyone knows how ridiculous that is. The programmer will always be needed precisely to decide, case by case, what is good or not and what does or does not solve a problem. You can’t generalize. AI will help you code better, but not model.

When someone tries to explain how to make the right decision, they can tell you what makes the process easier, but no advice will guarantee that you make the right decision.

Some definitions and tips

Let’s look at some attempts, in English, on Software Engineering.SE to reach a "definitive conclusion".

What should be the Maximum length of a Function? That question was closed because it cannot be answered without resorting to opinion, and we even find a joke about it in a comment from Thomasx. Does anyone doubt it’s a joke?

Anyone who posts the number of lines a function should have must be joking.

But there are answers there that help us work out the ideal size of each function individually. One even presents a study showing how hard it is for people to understand problems and explanations: too much information and it gets scattered; too little and it does not help, or worse, it helps in the wrong way.

Some say it depends on the language being used. Of course it does. It also depends on the technology used, the problem being solved, the requirements, the team working on the problem and probably other things.

Others say a function should be as small as possible. That seems like good advice. The problem is knowing what "as small as possible" means. It is often interpreted as the fewest lines possible. People have an incredible tendency to take recommendations literally and, worse, to think they apply to everything. That is where so-called "good practices" come from, and suddenly you can no longer do something differently, even when it is more correct, because it violates the "good practice".

There is even the claim that you should keep reducing what a function does until it cannot be reduced any further. A teacher once told me to do this. I handed him Pascal code that was worse than Assembly. Exaggerating only a little, each function did just what a single Assembly instruction can do.

Another good piece of advice: if you can’t come up with a good name for the function, it probably does more than it should. That is on the right track: a function must not do more than it should. But it still doesn’t make clear what "more than it should" is.

Someone asks what the ideal size of a building is. Why aren’t all buildings the same size?

This brings us to cyclomatic complexity. It is really relevant: it genuinely gets in the way of understanding code. One of the reasons we divide functions into smaller parts is precisely to make the code easier to understand, which tends to make maintenance easier as well. There is no fixed number at which it becomes a problem. There are studies indicating that it matters and that it is related to the number of defects in software. And large functions tend to have higher cyclomatic complexity.

There are answers making it clear that a function should do only one thing. There seems to be no dispute about this. It is possible but unlikely that a function with a single responsibility has many tens or hundreds of lines. All experienced developers know that most functions will have very few lines. But it is also clear that dividing too much, just because someone said functions should be small, does not bring good results. Over-segmenting can make code harder to follow. You trade spaghetti code for lasagna code.

Hence the question of whether we should create a function for something that will not be used anywhere else. By the single responsibility principle, yes, we should. But we must not exaggerate. You have to evaluate the case so that you don’t make the code harder to understand, create a more complicated situation to deal with, make performance unacceptable or make future maintenance harder.

So if something is repeated, should it be put into a function? Not necessarily. It is likely, but again it depends on the case. Remember that DRY does not necessarily mean eliminating code repetition. And even in a case that is ideal for DRY, there are other factors to take into account. In any case, code repetition can be one reason a function is large, and repetition is a problem in itself. I’ve seen cases where repetition makes the code smaller. That’s why compilers tend to optimize code better than humans: they evaluate the specific case objectively and know for sure what will be faster or slower. They make no assumptions.

There are good observations that size can affect performance, for better or worse. This was even the original concern of the person asking, but that was not clear to anyone. Calling a function has a cost; making more calls than necessary (because the code was divided into more functions than it should have been) hurts performance, including by forcing unnecessary memory manipulation. Large functions can affect the cache or make life harder for the JIT compiler (the Jitter) and for memory layout. But large functions usually hurt more through a side effect: since the function can be more complex than it should be, it is easier to make mistakes that affect performance. The opposite is also true: large classes or modules full of chopped-up methods or functions can have the same effect. I discuss this in more depth in a more specific question on resource consumption.

I like the phrase that a function must have the size it needs to have. Yes, that doesn’t mean anything, but it’s the only undisputed truth.

Some people talk about functions that deal with huge switch statements. Those who limit the number of lines probably make an exception for this. How many other exceptions are made? What if all of this was unnecessary? Does the exception let it through anyway? It seems that looking at the number of lines is, at the very least, looking at the wrong problem.

And note that each person gives their own interpretation of what they have read somewhere. It could hardly be otherwise: each has a unique history and works in unique situations.

Then I found another question, also closed. This is getting long, so I will not go on. It is often said that line count is not a good measure and that a function should do just one thing. Some say large functions are difficult to test. That is true, but complicating a design just to make testing easier does not seem like one of the best ideas to me either (and there are holy wars over this).

And finally, that question shows how setting limits makes no sense.

If you look at the books Clean Code and Code Complete you will see a huge discrepancy between their recommendations for the ideal number of lines. This shows that these books should have their credibility questioned on this point (not that either book is bad as a whole), since it shows that these numbers are just meaningless opinions. But if you want someone to give you a number, which do you believe? The 20 of Clean Code or the 200 of Code Complete? Or the 2 or 12 that someone else might say?

In that answer I talk about some of the things that matter most for keeping a function "clean".

Conclusion

If you want a number of lines that should draw the programmer’s attention to whether a function is too large, choose the number one. A single line may already be doing more than it should. And that’s the important thing. Unless you have a reason to do otherwise, choose the size that best organizes the purpose. Make it no bigger and no smaller than that.

14

I work with OOP and learned to divide code into methods so that they can be reused whenever necessary... **DRY**

In my opinion, large functions produce dirty code. The chance that you end up with very similar functions is high.


A procedural example:
1) function criarLink()

A function to create a link will check the current protocol (HTTP or HTTPS), the domain and other relevant things.

2) function criarImagem()

A function to create an image will repeat some of the same checks as criarLink.


To reduce the redundancy, the simplest approach is to split the code into functions.
In short, divide the responsibilities so that the system as a whole can make better use of them.

Unpacking the simple example above: instead of two redundant functions filled with the same code, it would be enough to have criarImagem and criarLink compose the HTML elements (for example), plus functions such as isSecure, getSubdomain, etc. that are always accessible on their own, without depending on either of them.

Taking those simple examples, isSecure and getSubdomain would be two methods (in OOP) that Response, Request, View, Controller and so on can all use. At several steps you need access to the request type, the output protocol and similar information.
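The split described above can be sketched in JavaScript. This is only an illustration: the names criarLink, criarImagem, isSecure and getSubdomain come from the answer, while the shape of the `request` object and the helper `baseUrl` are assumptions made for this example.

```javascript
// Assumption: `request` is a plain object like { protocol: "https", host: "blog.example.com" }.

// One small check, reusable by any caller.
function isSecure(request) {
  return request.protocol === "https";
}

// Simplified on purpose: treats the first label of a three-part host as the
// subdomain, so it would be wrong for hosts such as "example.co.uk".
function getSubdomain(request) {
  const parts = request.host.split(".");
  return parts.length > 2 ? parts[0] : "";
}

// Hypothetical helper that holds the check both builders used to repeat.
function baseUrl(request) {
  const scheme = isSecure(request) ? "https" : "http";
  return `${scheme}://${request.host}`;
}

// The builders now only compose HTML (no escaping, to keep the sketch short).
function criarLink(request, path, text) {
  return `<a href="${baseUrl(request)}${path}">${text}</a>`;
}

function criarImagem(request, path, alt) {
  return `<img src="${baseUrl(request)}${path}" alt="${alt}">`;
}
```

With this layout, a change to how the protocol is detected touches only isSecure, and neither builder needs to know about it.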

A well-structured system is easy to maintain, adapts easily to patterns and reuses code, which makes production more agile...

  • 3

    I agree, but I think you should be careful not to over-divide, for example by creating a new function for just one or two lines. That makes the code more laborious for other people to read.

  • No doubt. Avoiding that depends on the developer’s common sense.

  • 6

    Basically you are explaining DRY, which is neither a cause nor a consequence of method size. @Earendul there is no problem with having any number of one-line methods if what each one needs to do takes only one line. What a method "cannot" do is more than it should, taking on multiple responsibilities. Nor "can" a single task be split across several methods. It is rare, but if a method really performs a single task and needs dozens of lines, its size is fine; further divisions would be artificial and unnecessary.

  • 1

    I cited DRY precisely because of the excessive repetition of code... I think that, OOP or procedural, not repeating code is fundamental.

  • 1

    Code size and repetition are two separate things. Repetition can even make a method large, but size itself is not the problem there; the repetition is. If a method has repetitions (which is not good) but contains only what it needs to do, it is not too large. You’re talking about a side effect of repetition. I love DRY, and you’re right in what you say, but it’s not the central point of this question. It doesn’t say what the problem with a big method is. It doesn’t say when a method is bigger than it should be.

  • 1

    The method is greater than it should - in my view - when it does more than its responsibility. I tried to address this in the question by raising 2 fictional examples that have common verifications that should be abstracted. Create a link for example, has no responsibility to identify the protocol.

11

I believe that function size is not the criterion, but rather an indicator.

For example, suppose we have a function ImprimeRelatorio that itself renders the report, adjusts the margins, reads the user settings for report details, calls the print configuration screen and then prints. It does what its name says, but rules about user preferences have no place inside it.

If the function does not implement all of these features itself but only calls other functions that do, it is no longer so bad, as for example in:

// Coordinates the printing steps; each step is implemented elsewhere.
function ImprimeRelatorio(relatorio) {
  relatorio.render(relatorio.getMargins());
  relatorio.applyUserPrefs(relatorio.getUserPrefs());
  var cfgImpressao = getCfgImpressao();
  relatorio.print(cfgImpressao);
}

In general, though, the rule comes down to common sense and to notions such as cohesion and coupling, dependency injection and more.

  • 1

    The answer is on the right track but it lacks a little grounding of why this is good.

  • 1

    Thanks for the tip! I’ll try to finish it, but I have a big delivery for the other teams today. At lunch time I can expand it with a better theoretical background.

11

What I try to do

I try to follow the principle of giving each class and function a single responsibility.

Programming by this principle, you will naturally produce smaller functions and classes.

The biggest benefit I noticed from programming this way was in the maintenance and design of the source code.

For example, you have two functions, one that searches the data and the other that renders the data.

If you change the search function, you may introduce a bug in it, but the rendering code remains intact in its own function.

If you put rendering and searching in the same function, their logic gets mixed together and any change risks breaking both features.

In short:

If you implement only one feature per function, then when you maintain that feature you can only break that feature; the others remain intact.
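A minimal sketch of that separation, in JavaScript. The function names `searchData` and `renderData` and the record shape `{ name }` are made up for illustration; they are not from the answer.

```javascript
// Search only: knows how to filter records, nothing about presentation.
function searchData(records, term) {
  const wanted = term.toLowerCase();
  return records.filter((r) => r.name.toLowerCase().includes(wanted));
}

// Render only: knows how to format records, nothing about where they came from.
function renderData(records) {
  return records.map((r) => `- ${r.name}`).join("\n");
}

// The caller composes the two steps.
const students = [{ name: "Ana" }, { name: "Bruno" }, { name: "Mariana" }];
console.log(renderData(searchData(students, "ana")));
```

A bug introduced while changing the matching rule in `searchData` cannot alter how a list is formatted, and each function can be tested with plain arrays.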

What I actually do

I write software commercially, and most of the time I program with the aim of delivering on time.

Depending on the deadline, my code ends up closer to or further from the model I originally conceived.

Exceptions

There are reasons to write monstrous functions; one I know of concerns performance in batch processing.

It’s faster for the computer to run a piece of code that’s right there in memory than to jump to a distant region, over and over again.

Depending on your project’s needs, one day you will have to optimize it by writing "ugly" but fast code.

Another case where I don’t particularly worry about this is simple code, small scripts and so on. Sometimes the project is so simple that doing these things only complicates it.

Most important

Take all the opinions and books you’ve read on the topic and treat them as tools, using them as each project requires.

Treating one source or another as absolute truth will only get in the way.

Update 2021

I recommend reading @Maniero’s answer.

It says everything I said and more, in much greater depth and with sources.

8

The important thing is that a method does one thing and one thing only; this reduces complexity and makes testing and maintenance easier.

Example:

public function salvar(){
    // saves your model

    // logic that does not belong in this method
    header('Location: /');
}

Now suppose you want to save but do not want to perform a redirect. It gets complicated, and at least one hack (gambiarra) will follow. Also think about maintenance: you call the method and an unexpected redirect happens. All of this is because the method does more than what it proposes to do.
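One way to separate the two concerns, sketched here in JavaScript rather than PHP. Everything except the name `salvar` is an assumption for the example: `store` is a plain array standing in for persistence, and `response` is any object with a `redirect` method, not a specific framework's API.

```javascript
// Saves the model and nothing else, so it is safe to call from anywhere.
function salvar(store, modelo) {
  store.push(modelo);
  return modelo;
}

// The caller that actually wants a redirect asks for it explicitly.
function salvarERedirecionar(store, modelo, response) {
  salvar(store, modelo);
  response.redirect("/");
}
```

Code that only needs to save, such as an import script, calls `salvar` and never triggers a redirect by accident.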

6

I believe this varies greatly according to your needs...

Often we can break up large functions using recursion and/or small functions that together do what the big one would do (roughly speaking; I much prefer breaking into several functions). This can decrease or increase the number of steps an algorithm goes through. I agree with much of what was said above: this has more to do with the programmer’s logic than with the language used... In general, it is a more complex problem of algorithms.

We often have to analyze the environment we are going to work in. I’ll give a classic example from low-level programming:

When we program microcontrollers directly (an environment where we work in binary or hexadecimal most of the time), we cannot use the '/' (division) operator because it is not native to the language. To use it we have to include a library, which generates many accesses to the processor’s registers and makes the algorithm much slower. A gambiarra (hack) that works in this environment is a kind of "push the bits to the right" function. To illustrate:

The binary number 11101 is 29 in decimal. If we shift it once to the right we get 01110, which is 14, that is, 29/2 rounded down; if we shift again we get 00111, which is 7, that is, 29/4 rounded down.
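The same arithmetic can be checked with JavaScript's right-shift operator. The function name `divideByPowerOfTwo` is made up for this sketch, and it only covers non-negative integers that fit in 32 bits, which is what `>>` operates on.

```javascript
// Shifting right by `exponent` bits divides by 2^exponent and drops the remainder.
function divideByPowerOfTwo(value, exponent) {
  return value >> exponent;
}

console.log((29).toString(2));          // "11101"
console.log(divideByPowerOfTwo(29, 1)); // 14, binary 1110
console.log(divideByPowerOfTwo(29, 2)); // 7, binary 111
```

The trick only replaces division by powers of two; dividing by 3 or 10 still needs a real division routine.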

With the library that provides the division method we would have only 1 line of code, and with this "push to the right" algorithm we will have around 4 to 5 lines. Actually this is not a gambiarra per se but a trick we use, and the processor is grateful, because this way it uses far fewer processing resources.

In high-level languages we have other kinds of abstractions to take care of. For example: a while ago I had to build a sorted list inside another list that also had to be sorted; there were several schools, and several students within each school.

I racked my brain for a few days to develop an algorithm for this, and it came to about 20 lines. Then I showed my code to a colleague, who gave me a suggestion that turned out to be much easier and better: sort the whole list by school and compare whether the school of the previous item is the same as the current one. If it is, keep listing; if it is not, start a new line for the school and continue listing. With this I saved 8 to 10 lines, and I didn’t even have to break the code into little pieces.
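A rough JavaScript sketch of the colleague's suggestion. The record shape `{ school, name }` and the function names are assumptions; the original code was not shown.

```javascript
// Plain string comparison, to keep the ordering independent of locale settings.
function compareStrings(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

function listBySchool(students) {
  // Sort once by school, then by student name within each school.
  const sorted = [...students].sort(
    (a, b) => compareStrings(a.school, b.school) || compareStrings(a.name, b.name)
  );

  const lines = [];
  let lastSchool = null;
  for (const student of sorted) {
    // A change of school starts a new heading line.
    if (student.school !== lastSchool) {
      lines.push(student.school);
      lastSchool = student.school;
    }
    lines.push(`  ${student.name}`);
  }
  return lines;
}
```

A single sort plus one "did the school change?" comparison replaces the nested list handling, which is where the saved lines come from.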

For these reasons and others, it depends very much on your needs. There is no good-practice rule about the number of lines in a function, only about performance: if you see that the algorithm is getting too slow, stop, think, and ultimately seek help and alternatives.
