While we made Drupal 7 easier to use and more feature-rich for site builders, we also added complexity for the core developers. We shouldn't be surprised though. As a software application evolves, its complexity increases unless work is done to maintain or reduce it.

When that happens, it can be frustrating: software complexity is an obstacle to introducing change, can be a source of bugs, and makes it harder for new contributors to get involved. The general sentiment among the core developers is that steps must be taken to reduce Drupal's complexity. I wholeheartedly agree.

Many people in the community have put forward a lot of good suggestions to reduce the complexity of Drupal core; from removing unnecessary functionality, to decoupling systems, to improving our APIs and abstractions. All things we should consider doing. In fact, we have already removed some unnecessary modules and features from the Drupal 8 development branch. Last but not least, I'm looking to appoint a Drupal 8 co-maintainer who has the technical skill set to help manage Drupal core's complexity.

I also had a thought I wanted to run by you. It would be good if we could measure the evolution of Drupal's complexity over time. That would allow us to say things like: "Drupal 7 is 30% more complex compared to Drupal 6", "This Drupal site build has a complexity score of 420", or "This patch reduces the complexity by 12 points".

I'd like to see if we can come up with a "Drupal complexity score". It would obviously be important to combine different metrics such as (1) number of calls per function, (2) number of inter-module calls per function, (3) mean function size, (4) number of input arguments for API functions, (5) number of comments per function, (6) number of references to global variables, (7) number of different code paths, etc.
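
To make the idea concrete, here is a minimal sketch of how such a score could be computed, assuming the per-function metrics are already available from some static-analysis pass. The metric names, values, and weights below are purely illustrative assumptions, not an agreed-upon standard:

```python
# Hypothetical per-function metrics, assumed to come from a static-analysis pass.
metrics = {
    "calls_per_function": 12,
    "inter_module_calls": 4,
    "mean_function_size": 85,   # lines
    "api_arguments": 6,
    "global_references": 2,
    "code_paths": 9,            # e.g. cyclomatic complexity
}

# Illustrative weights; agreeing on these would be the hard part.
weights = {
    "calls_per_function": 1.0,
    "inter_module_calls": 2.0,
    "mean_function_size": 0.5,
    "api_arguments": 1.5,
    "global_references": 3.0,
    "code_paths": 2.0,
}

# Weighted sum of the individual metrics.
score = sum(weights[name] * value for name, value in metrics.items())
print(f"Complexity score: {score:.1f}")
```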

A "Drupal complexity score" is not the panacea, and neither will we ever have a perfect scoring system. However, I still believe that even a basic "Drupal complexity score" integrated in the patch review workflow (including our testbots and DrEditor) would be a big win. It is hard to manage what you can't measure. At a minimum, it would put reducing complexity at the front and center of every reviewer and maintainer.

Thoughts?

Comments

Eaton (not verified):

I think this is a good start towards measuring things -- one of the challenges, though, is the "mystery meat" quality of one of our heavily used APIs -- the FAPI/RenderAPI system. Our habit of using keys in arrays to store what amounts to flow-control information puts automated tools at a disadvantage.

I think it's definitely worth pursuing, but we should also beware of the kind of "invisible complexity" that the tools are blind to...

cweagans (not verified):

I think we'd spend a lot of time coming up with a useful metric that could otherwise be spent on getting patches ready to go. Seems like a lot of unneeded meta-work.

Also, I really think that Drupal 8 does not need a comaintainer. I think Drupal 8 needs many comaintainers. Basically, make everybody in maintainers.txt a core committer with the stipulation that they only commit to their designated "area". To cover the gray area, add one more committer that can deal with the patches that don't really relate to just one subsystem - things like the configuration management initiative and wscci patches (when they come to fruition).

cpliakas (not verified):

I definitely understand the sentiment of having more core committers, however this approach is like riding in a car that is speeding towards a brick wall with no brakes. Adding more people to the mix is like saying "let's speed up and see if we can crash through" rather than taking your foot off the gas and re-evaluating things.

More people committing to core will mean increasingly de-normalized APIs and a lot more code, which are the exact problems we are trying to solve here. Investing in testing tools has also proved very useful in the past. Look at the Coder module and Simpletest as examples. Although both are imperfect solutions, they facilitate more people working together more efficiently. Code complexity is difficult to gauge, but if it at least raises awareness of introducing unmaintainable code then it is time well spent.

beejeebus (not verified):

i think a big part of the gain of trying to come up with useful complexity metrics, apart from the metrics themselves, is the process.

working out what to measure, what parts are not measurable and why, etc, should help us avoid complexity as we go. so +1 from me.

Lukas Fischer (not verified):

Very interesting approach. Just some thoughts from my side:

Perhaps we should also think about what complexity actually means. Is it really the number of arguments? The number of calls per function? I think the complexity of code is how hard it is to understand what the code is for. This "complexity" is very personal.

There are some metric-based approaches to measuring this, e.g. cyclomatic complexity (https://en.wikipedia.org/wiki/Cyclomatic_complexity). I have no experience with this.
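
For reference, cyclomatic complexity essentially counts the independent paths through a function, which can be approximated by counting decision points. A rough, purely illustrative sketch of that idea (real analyzers parse the code properly rather than pattern-matching):

```python
import re

# Rough illustration only: estimate the cyclomatic complexity of a PHP
# function by counting decision points (branches) plus one.
DECISION_POINTS = r"\b(if|elseif|else if|for|foreach|while|case|catch)\b|\?|&&|\|\|"

def approximate_cyclomatic_complexity(php_source: str) -> int:
    return 1 + len(re.findall(DECISION_POINTS, php_source))

example = """
function example_access($op, $node, $account) {
  if ($op == 'view' && $node->status) {
    return TRUE;
  }
  elseif ($op == 'update' || $op == 'delete') {
    return $node->uid == $account->uid;
  }
  return FALSE;
}
"""
print(approximate_cyclomatic_complexity(example))  # 5: if, &&, elseif, ||, plus 1
```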

The more interesting questions, from my point of view, are:
How do we overcome the personal complexity?
How can we improve Drupal to solve the "personal complexity problem"?

The answer lies in asking questions like:
- Does the function follow a software design pattern?
- Is there documentation?
- Is there good documentation?
- Is there up to date documentation?
- Are there examples that help me to understand the function?
- Is there software design decision documentation available? E.g. "Why did we create this function?" instead of "How does it work?"

E.g. Readme.txt, comments on api.drupal.org, and code snippets are usually the first places to start when you use "unknown code". If they're not available, the source feels very complex. We might need to think about extended documentation in code, e.g. per-function, per-module, and per-subsystem context documentation. If it's available, we reduce complexity. If it's not available, we raise complexity.

Anonymous (not verified):

I'd offer an additional metric to be considered: the length and complexity of the documentation needed for the function/module in question.

cpliakas (not verified):

Just be aware that cyclomatic complexity doesn't understand the forms API, as I think eaton alluded to above. Any form-generating function yields scores that are off the charts, although the code is very maintainable and easy to understand. Just thought I would throw it out there; this may just be an exercise in deciding which pieces of code don't need to be analyzed.

Anon (not verified):

+1. I think we don't need a formal metric so much as to raise consciousness of this issue (something that is already happening).

Matt Farina (not verified):

I completely agree with Larry. We should use common measurements for complexity.

The unfortunate problem we'll run into is we cannot get an accurate measure of complexity from existing tools. They are designed for more traditional patterns. Our hook based system has served us well but isn't a typical pattern. We need a tool that can evaluate our hooks and alter calls.

This is one of the few cases where I would suggest we roll our own tool because of our fairly unique design patterns.

Larry Garfield (not verified):

For some things, yes. I doubt there is any existing tool or measuring system that can tell us that FAPI D7 is 14% more complex than FAPI D6 (or whatever it would be); that analysis we'd have to do ourselves.

Other existing metrics are still perfectly applicable as is. "How many separate files do you have to touch" is a valid metric, if used properly. "How deep is the call stack" is another that an existing tool can handle. "How many branches does your code have" is similarly a perfectly valid existing metric, and if existing tools can tell us that, great! If we need to extend them a bit to account for some of Drupal's peculiarities, that's fine. We're still building on an established base. (And I do mean extend, not "start using, decide we don't like, then gut".)
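
As a trivial illustration of the "files touched" metric, here is a minimal sketch that counts the distinct files a patch in unified diff format modifies. The script and its invocation are hypothetical, not an existing testbot feature:

```python
import sys

# Minimal sketch: count how many distinct files a unified diff touches,
# based on the "+++ b/path" headers (deleted files show "+++ /dev/null").
def files_touched(patch_text: str) -> int:
    files = set()
    for line in patch_text.splitlines():
        if line.startswith("+++ ") and not line.endswith("/dev/null"):
            files.add(line.split("\t")[0][len("+++ "):])
    return len(files)

if __name__ == "__main__":
    # Hypothetical usage: git diff | python files_touched.py
    print(files_touched(sys.stdin.read()))
```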

Berend de Boer (not verified):

Larry, the problem with these metrics is that they are based on things you can measure, but they tell you little about complexity.

In Drupal we are probably more interested in the number of APIs and calling depth (how many layers); I'm guessing that is a sufficient proxy.

Grace (not verified):

Real artists ship; they don't play with yet another set of statistics. Do we have too much time or too many developers? And yes, you can't measure taste. It is not about numbers. Let's get it done. Ship.

catch (not verified):

As well as balancing ease of use for end users vs. complexity in core, we're also balancing ease of use of APIs for contrib and custom code authors vs. complexity in core. A lot of Drupal 7 bugs have been caused by trying to automate certain things on behalf of contrib modules then running into trouble when they don't do what we expect.

The metric of lines of code changed over time that Crell pointed to looks like a useful one to have - however, it can only be applied by looking over code retrospectively. Refactoring code to make it less interdependent will mean more changes to that code during that process, so we'd need a way to differentiate a bit (adding features vs. refactoring vs. bug fixing). Either way, that is going to be useful information to have.

While it's not about complexity as such (many of our tests are hard to maintain too), getting https://www.drupal.org/project/code_coverage actually used as part of the development process wouldn't hurt. When we initially added core tests, a couple of people worked on automating and publishing the results of this, but that tailed off (although the module has had some recent activity, which is good).

As well as automating code review more, we could also improve our overall knowledge of what's happening in the issue queue. Many issues end up with no commit, or only very small changes, precisely because the code they're touching is complex and fragile. I opened https://www.drupal.org/node/1260490 to discuss automating some issue queue metrics - we could then add to that with reviewing historical issues against components (and/or issue queue tags).

For example, I did a very quick calculation yesterday: over the past half year, an average of 2.2 major bugs and 0.5 critical bugs have been posted each week against core - and that's only the ones that survive at that status and don't get triaged elsewhere.

Owen Barton (not verified):

I agree that complexity is a really critical area - the early discussion (back in Drupalcon Vancouver etc.) on managing complexity and aiming for simple and elegant solutions was one of the things that drew me into Drupal. We have kept certain aspects of it (e.g. consistent coding standards) very strong in our culture, but I feel like overall we have not emphasized it as much as we could have recently. Of course, given the kind of functionality we are trying to provide (versus Vancouver), the big gains in simplicity need to come primarily from well-thought-out architectural decisions - however, there is plenty of low-hanging fruit that could use some refactoring, and I suspect working on those will uncover bigger architectural changes too.

In terms of metrics - Sonar (http://www.sonarsource.org/) is a pretty nice FOSS tool that does just what you describe. I don't think its PHP support is as good as its Java support, but it looks like a good start (it uses pdepend.org for complexity analysis, which includes cyclomatic complexity and several other metrics). The nice thing about it is that it makes the reports pretty easy to use, with nice bubble charts showing you how different areas of code progress over time, and against each other - basically turning it into a bit of a game (I can see maintainers one-upping each other already!). Even if we don't use this specific tool, it's worth playing with to get a feel for how this can support decisions and drive overall energy.

Of course, we do need to keep an eye on the complexity that automated tools can't find - our many arrays are an excellent example, the upgrade process is probably another. Generally if it is not easy to exhaustively document or unit-test something, or the number of people who understand something can be counted on one hand - that suggests we might want to take a closer look :)

nicl (not verified):

I like the idea of trying to limit the complexity of core.

One thing not mentioned yet (perhaps because most people don't consider it such an issue?) is simply the size of core - perhaps measured in the number of lines of code or file size (excluding images etc.).

Even if code is simple, Drupal is becoming sufficiently big now that familiarising oneself with all of core is just plain difficult for smaller developers. Obviously complexity is the more important metric, but I think size matters too. (Hence, in part, the attempts to jettison various modules for Drupal 8.)
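
A back-of-the-envelope sketch of that kind of size measurement, counting lines in source files while skipping binary assets; the extension list and checkout path are illustrative assumptions:

```python
import os

# Illustrative only: count lines of source in a Drupal checkout, skipping
# binary assets such as images. The extension list is an assumption.
SOURCE_EXTENSIONS = {".php", ".module", ".inc", ".install", ".test", ".js", ".css"}

def count_source_lines(root: str) -> int:
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1] in SOURCE_EXTENSIONS:
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    total += sum(1 for _ in handle)
    return total

print(count_source_lines("drupal"))  # path to a core checkout; hypothetical
```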

FGM (not verified):

Beyond cyclomatic complexity, which Crell mentioned, I think the most widely accepted measure of complexity is the use of function points, with its derivatives like (MKII | Nesma | OO | weighted) Function Points.

Are we looking towards establishing (some portion of) the Drupal community as a CMMI-certified organization? :-)

heyrocker (not verified):

Let's also remember that not all complexity is bad. Some in the community have argued that, for instance, the new database layer introduced too much complexity and raised the bar for programmers who want to start with Drupal. It is true that it introduced a new syntax and conceptual framework for people who are used to writing SQL, and certainly tracing the code execution path is more involved than it used to be. On the other hand, it also improved security, improved testability, reduced bugs, allowed for a wider variety of supported databases, and made the code far more readable (in my opinion, anyway). Sometimes 'complexity' comes with advantages and they shouldn't be ignored.

I do like some of the measures above, especially the ones that target how tightly coupled a piece of code is with other pieces of code, a huge problem we currently face.

parasolx (not verified):

For me, measuring complexity does not really give opportunities to developers, since they each have their own coding standards combined with the standard Drupal API. Measuring this metric would lead to more additional code that only benefits one side, not everyone who uses Drupal.

My suggestion, if it is possible, is to measure performance based on contributing factors instead: for example, measure the mean time to generate one page or the whole site, produce an analysis of the number of contributed modules, and take into account server-side variables such as the server, PHP, and the database.

Through this analysis, either a developer or a site admin can determine what they can do to improve performance - maybe change hosting, or reduce the number of queries/modules.

This whole systemic check could be given a name like Drupal System Health or Drupal Performance Status. That seems more reasonable than measuring complexity alone.

Larry Garfield (not verified):

"Sometimes 'complexity' comes with advantages and they shouldn't be ignored." --heyrocker

"Generally if it is not easy to exhaustively document or unit-test something, or the number of people who understand something can be counted on one hand - that suggests we might want to take a closer look :)" --Owen Barton

I just wanted to call out those lines from the comments above. They bear repeating, many times over.

effulgentsia (not verified):

I like this presentation from the author of Clojure. The whole hour is thoroughly enjoyable, but one part relevant to this discussion is what he states in the first few minutes: that there's a difference between complex (the opposite of simple, an objective measure, at least in principle) and hard (the opposite of easy, a subjective measure). I think we should strive towards both greater simplicity and ease, but recognize that they are two different things.