This is it. After months of tweeting memes about mvn clean install, I decided to get some numbers to see how performance would be affected when switching from one command to the other. But first, a bit of history. Like many developers, when faced with building a Maven project my first reaction was to invoke mvn clean install; after all, you find that instruction in almost every README or BUILD file, so why question it? It does not matter whether the project has a single module or multiple modules, the instruction is the same. As a matter of fact, when the build contains multiple projects, invoking install is a hard requirement, isn't it? Otherwise sibling projects won't be able to find inter-module dependencies. Or so I thought.
Back at JCrete 2018 I had the opportunity to talk to Robert Scholte (@rfscholte) about Maven. I was facing an issue with building a subset of the Reactor and he showed me the -am -pl command flags (--also-make and --projects). Part of what transpired during our sessions was captured in this post. He also pointed out that I could use verify instead of clean install (or just install). I was baffled at first because I had no idea that verify existed, let alone what its purpose was. My first reaction to switching was along the lines of "hold on, the other projects won't compile as they need to resolve dependencies from their siblings". With a smirk on his face, Robert patiently proceeded to explain the concepts behind Maven lifecycles, goals, plugin bindings, and execution within the Reactor. Everyone in the session had a great time and the insights shared have been very valuable. If I can condense what was said back then, it would be something like this:
- Maven plugins expose behavior via goals, each one implemented by a Mojo. A goal is thus the entry point to perform specific behavior.
- Maven runs a set of lifecycle phases in sequence; invoking a phase runs every phase that precedes it in the same lifecycle.
- Plugins can bind their goals to specific phases, so that when a phase becomes active the goals bound to it are executed.
- The lifecycle sequence is predetermined (if you want to be nitpicky, it can be changed by means of extensions, but it usually stays the same).
- In the default (and most common) lifecycle we've got the following phases: validate, initialize, generate-sources, process-sources, generate-resources, process-resources, compile, process-classes, generate-test-sources, process-test-sources, generate-test-resources, process-test-resources, test-compile, process-test-classes, test, prepare-package, package, pre-integration-test, integration-test, post-integration-test, verify, install, deploy. Maven also defines two more lifecycles: clean (pre-clean, clean, post-clean) and site (pre-site, site, post-site, site-deploy).
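To make the phase ordering concrete, here's a minimal Python sketch that models the three built-in lifecycles as ordered lists and computes which phases run for a given request. It's an illustration, not Maven's implementation; the names `LIFECYCLES` and `phases_to_run` are hypothetical, while the phase names are the ones listed above.

```python
# Toy model of Maven's built-in lifecycles (not Maven's code): requesting a
# phase runs every earlier phase of the lifecycle that phase belongs to.
LIFECYCLES = {
    "default": [
        "validate", "initialize", "generate-sources", "process-sources",
        "generate-resources", "process-resources", "compile", "process-classes",
        "generate-test-sources", "process-test-sources",
        "generate-test-resources", "process-test-resources", "test-compile",
        "process-test-classes", "test", "prepare-package", "package",
        "pre-integration-test", "integration-test", "post-integration-test",
        "verify", "install", "deploy",
    ],
    "clean": ["pre-clean", "clean", "post-clean"],
    "site": ["pre-site", "site", "post-site", "site-deploy"],
}


def phases_to_run(requested):
    """Return the phases executed, in order, for a list of requested phases."""
    executed = []
    for phase in requested:
        for phases in LIFECYCLES.values():
            if phase in phases:
                # Everything up to and including the requested phase runs.
                executed.extend(phases[: phases.index(phase) + 1])
                break
        else:
            raise ValueError(f"unknown phase: {phase}")
    return executed


if __name__ == "__main__":
    print(phases_to_run(["install"])[-2:])  # the last two phases: verify, install
```

For example, phases_to_run(["install"]) ends with verify followed by install, and phases_to_run(["clean", "install"]) starts with pre-clean and clean before walking the default lifecycle from validate.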
Now note that install is the next phase after verify. This means every time you invoke install you're also invoking verify; remember that fact. Fast forward to November 2019: Robert presented about Maven at Devoxx BE (video), and the gist of the talk was adapting to the tool's new behavior. Of course, he mentioned clean install vs. verify. For historical reasons (back in the Maven 2 days) we were forced to invoke install all the time if a module in a multi-project build had a dependency on another module, because at the time multi-project support in Maven was in its early stages. As a matter of fact, it didn't begin as part of core but as the maven-reactor-plugin. The lessons learned with this project were later added into core for Maven 3. The Reactor now being part of core means that some of its features can be deeply linked with the rest of the plumbing, such as attaching the computed artifacts to the current session (the Reactor) so that other modules can fetch artifacts from it instead of fetching them from repositories. And that particular piece of behavior is already in place by the time the verify phase is reached, no install required.
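The lookup order described above can be pictured with a toy model. The Python sketch below is not Maven's real resolution code, which is considerably more involved; `ToyReactor`, its methods, and the coordinates used are hypothetical. The point is only that an artifact produced earlier in the same session is found without touching any repository.

```python
class ToyReactor:
    """Toy model of reactor resolution: artifacts built earlier in the same
    session win, and the local repository is only used as a fallback."""

    def __init__(self, local_repository):
        # Hypothetical mapping of "groupId:artifactId:version" -> file path.
        self.local_repository = dict(local_repository)
        self.session_artifacts = {}

    def attach(self, coordinates, artifact_file):
        """Record an artifact produced by a module earlier in this session."""
        self.session_artifacts[coordinates] = artifact_file

    def resolve(self, coordinates):
        """Look in the current session first, then in the local repository."""
        if coordinates in self.session_artifacts:
            return self.session_artifacts[coordinates]
        if coordinates in self.local_repository:
            return self.local_repository[coordinates]
        raise LookupError(f"cannot resolve {coordinates}")


if __name__ == "__main__":
    reactor = ToyReactor(local_repository={})
    # project1 was built earlier in this session; nothing was installed.
    reactor.attach("com.example:project1:1.0", "project1/target/project1-1.0.jar")
    print(reactor.resolve("com.example:project1:1.0"))
```

Under this model, install only matters for consumers that run outside the current session, such as a later build of a single submodule.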
Another thing that may not be obvious at first is the invocation order of goals inside a Reactor. When we invoke a number of goals, all of them are invoked in order for one module before Maven moves on to the next module. That is, when clean install is invoked in a multi-project structure such as
```
.
├── pom.xml
├── project1
│   └── pom.xml
├── project2
│   └── pom.xml
├── project3
│   └── pom.xml
└── project4
    └── pom.xml
```
then the goals are invoked in this order:
```
:clean
:install
:project1:clean
:project1:install
:project2:clean
:project2:install
:project3:clean
:project3:install
:project4:clean
:project4:install
```
rather than one goal at a time across all projects, like this:
```
:clean
:project1:clean
:project2:clean
:project3:clean
:project4:clean
:install
:project1:install
:project2:install
:project3:install
:project4:install
```
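The difference between both orderings boils down to which loop is on the outside. In the Python sketch below (a model, not Maven's implementation), `reactor_order` puts modules in the outer loop, which is what the Reactor does, while `goal_major_order` produces the ordering it does not use. Both function names are hypothetical, the module list is assumed to already be sorted in build order, and the root module is written as an empty string so the labels match the notation above.

```python
def reactor_order(modules, goals):
    """Order used by the Reactor: every goal for one module, then the next module."""
    return [f"{module}:{goal}" for module in modules for goal in goals]


def goal_major_order(modules, goals):
    """Order Maven does NOT use: one goal across all modules, then the next goal."""
    return [f"{module}:{goal}" for goal in goals for module in modules]


if __name__ == "__main__":
    # The root module is "" so labels read ":clean", ":project1:clean", ...
    modules = ["", ":project1", ":project2", ":project3", ":project4"]
    for step in reactor_order(modules, ["clean", "install"]):
        print(step)
```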
It's because of this behavior that invoking install (or rather, verify) on this particular build makes the artifacts produced by project1 available to all the other projects that need them, without requesting said artifacts from a repository. The other aspect of avoiding clean install is that Maven 3 added support for incremental builds, and invoking clean defeats this feature. It's worth noting that even though Maven offers incremental build support, it's up to the plugins to implement it. Thus, if you encounter a problem when relying on incremental builds, please let the Maven team or the plugin authors know that something is not working as it should.
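The following Python sketch illustrates why clean and incremental builds don't mix. It's a toy, not what any real plugin does: `compile_if_stale` and `clean` are hypothetical helpers, and comparing file modification times is just one simple way to decide whether an output is out of date.

```python
import os
import shutil
import tempfile


def compile_if_stale(source, output):
    """Toy build step: regenerate `output` only when it is missing or older
    than `source`. Returns True when work was actually done."""
    if os.path.exists(output) and os.path.getmtime(output) >= os.path.getmtime(source):
        return False  # up to date: skip the work, as an incremental build would
    os.makedirs(os.path.dirname(output), exist_ok=True)
    shutil.copyfile(source, output)  # stand-in for compiling, processing, etc.
    return True


def clean(target_dir):
    """Toy 'clean': delete the whole output directory."""
    shutil.rmtree(target_dir, ignore_errors=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        source = os.path.join(root, "Foo.java")
        target = os.path.join(root, "target")
        output = os.path.join(target, "Foo.class")
        with open(source, "w") as f:
            f.write("class Foo {}")
        print(compile_if_stale(source, output))  # True: first build does the work
        print(compile_if_stale(source, output))  # False: nothing changed
        clean(target)
        print(compile_if_stale(source, output))  # True: clean forced a rebuild
```

Every clean throws away the outputs that make the second call cheap, which is exactly the work an incremental build tries to avoid.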
Alright, time for some numbers. I decided to run this experiment on a set of popular projects that run dozens (if not hundreds) of builds per day and make use of all kinds of plugins and setups, such as code quality checks, annotation processors, source code generation, etc. I settled on the following 6 (showing only Java code counts):
Project | Modules | Files | Blank lines | Comment lines | Code lines |
---|---|---|---|---|---|
Guava | 6 | 1976 | 65074 | 139019 | 375825 |
Byte Buddy | 9 | 1128 | 33408 | 76365 | 159056 |
jOOQ | 9 | 1772 | 73330 | 206477 | 194156 |
Sentinel | 80 | 1064 | 12772 | 28026 | 58270 |
Helidon | 202 | 3040 | 52731 | 120873 | 223469 |
Quarkus | 737 | 6849 | 80264 | 46954 | 361403 |
Lines of code were counted using https://github.com/AlDanial/cloc upon checkout. All builds were run with Java 15, except for Guava and jOOQ, for which Java 8 was used. The raw data can be found in this gist. The following commands were issued for all builds:
```
mvn verify -DskipTests
mvn clean
mvn verify -DskipTests
mvn verify -DskipTests
mvn verify -DskipTests
mvn clean
mvn verify -DskipTests
mvn clean verify -DskipTests
mvn clean
mvn install -DskipTests
mvn install -DskipTests
mvn install -DskipTests
mvn clean
mvn install -DskipTests
mvn clean install -DskipTests
```
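To reproduce this sequence on your own projects, a small timing harness helps. The Python sketch below is not the script used for the numbers in this post; it assumes mvn is on the PATH of a Unix-like system, and the names `COMMANDS` and `time_commands` are hypothetical. The runner is injectable so the harness can be exercised without Maven.

```python
import shlex
import subprocess
import time

# The measurement sequence listed above, in order.
COMMANDS = [
    "mvn verify -DskipTests",  # warm-up run: fills the local repository
    "mvn clean",
    "mvn verify -DskipTests",
    "mvn verify -DskipTests",
    "mvn verify -DskipTests",
    "mvn clean",
    "mvn verify -DskipTests",
    "mvn clean verify -DskipTests",
    "mvn clean",
    "mvn install -DskipTests",
    "mvn install -DskipTests",
    "mvn install -DskipTests",
    "mvn clean",
    "mvn install -DskipTests",
    "mvn clean install -DskipTests",
]


def time_commands(commands, cwd=".", run=subprocess.run):
    """Run each command in `cwd` and return a list of (command, seconds) pairs.

    `run` defaults to subprocess.run; check=True makes a failed build raise
    CalledProcessError instead of silently recording a bogus time.
    """
    results = []
    for command in commands:
        start = time.perf_counter()
        run(shlex.split(command), cwd=cwd, check=True)
        results.append((command, time.perf_counter() - start))
    return results
```

Calling time_commands(COMMANDS, cwd="path/to/project"), with a placeholder path pointing at the root of a checkout, returns one (command, seconds) pair per invocation, in the same order as the list above.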
The first invocation of mvn verify is there to check that the project builds and to download all required dependencies and plugins to the local Maven repository, so that network access does not affect the build times. Then the build output is cleaned and the actual measurements take place. I decided to skip tests as I'm not interested in them for this experiment. This decision will be revisited later in this post. The measurements I got can be seen in the following charts, where the blue lines are the invocations of verify and the red lines are the invocations of install. The theory says that the red lines should be slightly longer than the blue ones. All numbers are reported in seconds.
Guava conforms to expectations: invocations #2 and #3 are close to 50% faster than #1, as the computed results (class files, processed resources, etc.) are likely left untouched and can be reused as is. There is still some work being done, perhaps code quality checks. The cost of always using clean between builds is not that big; as we can see, running clean takes about 4 seconds, time that is included in the last 2 measurements.
Byte Buddy appears to perform the same tasks regardless of previous results. This might signal that the build does not use the incremental build support provided by Maven, or that by its very nature Byte Buddy must perform the same tasks no matter what.
Now jOOQ is an interesting beast. Notice that invocations #2 and #3 are close to 85% faster. This build does not waste time invoking tasks that are not required. I was quite surprised by these numbers; Lukas might be onto something.
Sentinel falls back into the expected range of 40% to 50% time savings for invocations #2 and #3.
Helidon appears to be another build that performs the same tasks regardless of previously computed results.
Finally we've got Quarkus (whose current module count is above 745!), for which an additional profile (quickly) had to be used to disable code quality checks, formatters and other bits; otherwise the build would have taken north of 20 minutes. Notice that Quarkus has close to 60% time savings for invocations #2 and #3.
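The percentages quoted above compare a later invocation against invocation #1. Assuming that's the intended definition (the post doesn't spell out the formula), the arithmetic is straightforward; `percent_faster` is a hypothetical helper name and the timings in the example are made up, not taken from the charts.

```python
def percent_faster(baseline_seconds, run_seconds):
    """Relative time saved by a run compared to a baseline, as a percentage.

    50.0 means the run took half as long as the baseline; negative values mean
    the run was slower.
    """
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return (baseline_seconds - run_seconds) / baseline_seconds * 100


if __name__ == "__main__":
    # Hypothetical timings, not measurements from this post.
    print(percent_faster(60.0, 30.0))  # 50.0
```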
Feel free to make your own measurements. These numbers are but a sample and should not be taken as hard facts; however, they do show trends. They don't lend much credibility to the advice that install is slower than verify, as the differences are negligible in most cases. I would still recommend taking measurements in your own projects; perhaps it does make a difference for you. What is clearer is the impact of clean on incremental builds. In an ideal setting you wouldn't have to invoke clean at all, saving it for occasions where removing intermediate results is a must; there are times when testcases require a clean slate for a particular set of directories or resources. And yes, the EAR plugin is currently broken and you must use clean and install all the time.
There are other valid uses of install in a multi-project build. Perhaps there's a testcase that requires artifact resolution in order to work; publishing artifacts to Maven Local is the quick fix. Another option would be to set up the mrm-maven-plugin (Mock Repository Manager), which exposes the Reactor artifacts as if they were available from a Maven-compatible repository; the testcases would have to be updated to consume said repository. This plugin is not so commonly used because it's more convenient to invoke clean install -DskipTests than to add one more plugin to the build.
The other quite common use case is invoking goals on a single module outside of the Reactor because, as we know, all goals will be invoked in all projects participating in the Reactor. Say we want to run tests on project3 as shown before; we have two options:
```
$ mvn -am -pl :project3 test
```
This runs the test phase on all projects that participate in the Reactor, that is, root, project1, project2, and project3. But we really just wanted to check project3 alone; however, we need the POMs and JARs of all its dependencies. This poses a problem given that project3 depends on project2, which depends on project1, and all of them depend on root because it's set as their parent. The alternative is thus invoking install at the root (with the full Reactor or a subset):
```
$ mvn -am -pl :project3 install -DskipTests
$ cd project3
$ mvn test
```
Then continue to invoke test, or any additional goals, on project3 until we're satisfied. It appears Maven was designed to be invoked from the root and not from a submodule.
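The module selection performed by -am -pl :project3 boils down to walking the dependency graph upstream. The Python sketch below assumes the relationships described above (project3 depends on project2, which depends on project1, and all of them have root as parent) and additionally assumes project4 has no sibling dependencies. `UPSTREAM` and `also_make` are hypothetical names; it's an illustration of the effect of the flags, not how Maven implements them.

```python
# Hypothetical model of the build above: each module maps to the modules it
# needs first (its parent POM and its sibling dependencies).
UPSTREAM = {
    "root": [],
    "project1": ["root"],
    "project2": ["root", "project1"],
    "project3": ["root", "project2"],
    "project4": ["root"],  # assumption: no sibling dependencies
}


def also_make(selected, upstream):
    """Return `selected` plus everything it transitively needs, in build order."""
    order, done, visiting = [], set(), set()

    def visit(module):
        if module in done:
            return
        if module in visiting:
            raise ValueError(f"dependency cycle involving {module}")
        visiting.add(module)
        for needed in upstream.get(module, []):
            visit(needed)  # upstream modules must be built first
        visiting.remove(module)
        done.add(module)
        order.append(module)

    visit(selected)
    return order


if __name__ == "__main__":
    print(also_make("project3", UPSTREAM))  # ['root', 'project1', 'project2', 'project3']
```

project4 never shows up because nothing in project3's chain needs it. The real Reactor takes more into account when sorting projects; this only captures the gist.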
So. Is the advice of preferring verify over clean install busted by the measured times? It certainly looks like it, but (there's always a but) it depends. Remember that install is responsible for copying artifacts from the Reactor to the local repository. If you don't need those files in the repository because running verify provides the same observable results (code works, tests are green, etc.), then don't invoke install at all. You'll save space. Even if space is not an issue on your local development machine, you might need to check whether it's an issue in your CI environments. A common occurrence when invoking install and then running tests on a single project is that the consumed artifacts are not the latest ones; we tend to forget a needed install and end up hunting for a bug that isn't even real, just a binary incompatibility due to stale dependencies.
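One way to catch that stale-dependency trap is to compare the installed JAR against the one just built. The Python sketch below assumes the default local repository location (~/.m2/repository, which settings.xml can override) and the standard repository layout; `local_repo_jar`, `is_stale`, the timestamp heuristic, and the example coordinates are all illustrative, not part of Maven.

```python
from pathlib import Path


def local_repo_jar(group_id, artifact_id, version, repo=None):
    """Where a JAR lives in the local repository, following the standard layout
    <repo>/<groupId as directories>/<artifactId>/<version>/<artifactId>-<version>.jar.
    Defaults to ~/.m2/repository, which settings.xml may override."""
    base = Path(repo) if repo is not None else Path.home() / ".m2" / "repository"
    return base.joinpath(*group_id.split("."), artifact_id, version,
                         f"{artifact_id}-{version}.jar")


def is_stale(installed_jar, built_jar):
    """True if the installed JAR is missing or older than the freshly built one."""
    installed, built = Path(installed_jar), Path(built_jar)
    if not installed.exists():
        return True
    return installed.stat().st_mtime < built.stat().st_mtime


if __name__ == "__main__":
    # Hypothetical coordinates and module path; adjust to your own build.
    installed = local_repo_jar("com.example", "project2", "1.0")
    built = Path("project2/target/project2-1.0.jar")
    if built.exists() and is_stale(installed, built):
        print(f"{installed} is older than {built}: consumers may see stale code")
```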
In summary, invoking verify over install gives you the same benefits (most of the time) with fewer drawbacks. There are use cases where you must install, as nothing else will work. The constant use of clean is also not a good idea; however, there are scenarios where you must apply it. You might be thinking: how can I tell when it's safe and when it isn't? Given the trends shown here (and potentially the measurements made in your project), the net performance gains are small, so it's best to keep using clean install and fuggedaboudit.
Perhaps. And perhaps not. Personally, I prefer going down the rabbit hole, finding out more about Maven's inner workings and discovering ways to make my builds more efficient and performant. I like learning how to use the build tool beyond a small set of commands, and I don't mind experimenting, but I do understand others may not be so eager to follow suit, or their circumstances may prevent them from doing so.
In closing, I'd recommend having a look at verify; perhaps it fits your use cases better than install.
All memes can be found at aalmiray/mvn-clean-install.
Keep on coding!
Hi,
Interesting article, good job :). One small detail I would improve is to put a legend on the charts – there are four sets of commands and it’s difficult to follow which is which.
Same. I can’t understand the charts at all. Text says: “look, 60% savings between #2 and #3”, I look at the chart and what I interpret as #2 and #3 look identical! ¯\_(ツ)_/¯
Interesting article all the same. Thanks.
Thanks for this article
If your goal is to build/verify faster, the option -T 1C would help a lot in big multi-module projects!
Indeed, running the build in parallel mode can result in a speed boost. It could also uncover problems in testcases where shared values are not supposed to be shared (magic singletons for example, external resources, etc.)
mvn -pl :project3 test will run tests for the project3 module without having to cd.
No, it won’t, not in this case. project3 depends on project2, which depends on project1. Running the command you suggested (without a prior install) will fail. The blog post mentions this; you must use mvn -am -pl :project3 test to effectively run all tests in project3 while making sure all of its dependencies are ready.

I’ve found an issue when running `mvn install -rf SubModule` vs `mvn verify -rf SubModule`. Our builds include a few submodules, some of which generate classes (similar to Lombok). So when the build fails and you resume it with `-rf`, `install` is able to find the previously generated classes and happily continues, while `verify` isn’t able to find those classes.