A couple of weeks ago I posted a link to a pop quiz regarding dependency resolution in Apache Maven. 509 brave souls took the challenge; now is the time to share the results. The quiz comprises 14 questions split across 3 sections based on context:
- A single project
- A project with a parent POM
- A project with a BOM
Each question revolves around figuring out which version of a particular dependency gets selected, in this case Guava, because as many developers know, whenever there's a classpath issue Guava is always to blame (What?!). To be fair, I could have picked any GAV (GroupId, ArtifactId, Version) coordinates for this quiz; Guava by itself has no bearing on the outcome. Each question has a single valid answer, and one of the options is "Build error" because, as Adam Savage so eloquently put it, failure is always an option.
It's worth noting that a few changes were introduced during the course of the challenge:
- Some questions declare dependencies on Guice and Truth, which in turn have Guava as a direct dependency. The questions mark it as transitive, which obviously is incorrect from the point of view of Guice and Truth, but correct from the point of view of the consumer project. Thank you Florent for the clarification!
- Both Guice and Truth have Guava as an immediate direct dependency, that is, there's only 1 hop to reach it.
- All snippets were verified with Maven 3.6.3.
First let's have a look at some statistics and insights. There were actually more than 509 entries in the quiz, but unfortunately some of them were less than helpful (read: trolls) as the quiz could be answered anonymously. How can I tell this was the case? Easy, remember what I said about failure? Well that was misdirection, none of the questions produce a build error! These smarty pants thought that answering all questions with "Build Error" was funny but it got them singled out. Haw haw! Joke's on you! There were also 2 duplicate entries from people that appear to have supplied answers from two different devices.
Removing the odd answers results in the following score distribution
As we can appreciate there are two groups, those that did somewhat OK in the middle (6 to 8 answers) and those that were closer to all correct answers. The average and median are exactly 8 while the standard deviation is 3.08. Let's get to the specific questions, shall we?
Part I - Simple Project
The first question sets the mood. Here we have two explicit definitions of the same dependency but with different versions. As I understood the rules, Maven selects the nearest version available, thus my choice would have been 27.0-jre. 35.5% of respondents thought that duplicate dependencies would cause a build error. As a matter of fact neither of these two answers is correct: the right one is 28.2-jre! However tools such as IDEs will correctly flag the duplicate with a warning, which is why I skipped taking screenshots of IDE windows 😉 You do get a warning on the command line if you attempt to run a build, as shown in the next screenshot.
This goes to show that one should pay attention to the warning messages found in the output; they are there for a reason. Also, both declarations have the same "near" factor, however the one declared last (not necessarily the one with the highest version) wins in this case. Remember this fact for the next questions.
Finally, why would Maven allow duplicate dependencies (same groupId and artifactId but different version) to be defined at all? I suspect there's a historical reason for this behavior to still be around.
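To make the rule concrete, here's a minimal sketch of a POM with the kind of duplicate declaration Question #1 is about. The project coordinates (`com.acme:quiz-q1`) and the declaration order are my assumptions for illustration; this is not the exact quiz snippet.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <!-- hypothetical project coordinates -->
  <groupId>com.acme</groupId>
  <artifactId>quiz-q1</artifactId>
  <version>1.0.0</version>

  <dependencies>
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>27.0-jre</version>
    </dependency>
    <!-- same groupId:artifactId declared a second time: Maven warns, -->
    <!-- and this later declaration is the one that takes effect -->
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>28.2-jre</version>
    </dependency>
  </dependencies>
</project>
```

Running `mvn dependency:tree` against a POM shaped like this should list Guava once, at the version declared last, next to the duplicate-declaration warning.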
This snippet shows an explicit dependency and a transitive dependency on Guava (from the POV of the base project); the key to answering this question lies in that distinction. As we saw in Question #1, Maven will select the nearest matching version, which would make it 28.2-jre (88.2% of respondents guessed correctly), as we can see in the following output.
This question was to make sure the respondents were paying attention to explicit vs. transitive dependency resolution, glad to report that most did.
Now we introduce the <dependencyManagement> block into the mix. This block works as a lookup table whenever a dependency needs to be resolved. In this case Maven will go through the lookup table checking if a matching dependency (groupId and artifactId) is declared and if so return the matching version. Thus the correct answer is 27.0-jre as witnessed by the following output
The fact that only 59.7% of respondents answered correctly makes me think that usage of the <dependencyManagement> block is not quite clear. Notice that you can use this block in any POM; it does not have to be found in a parent POM or a BOM to be effective, which is a common misconception.
We mix it up again by adding an explicit dependency. We know from Question #2 that nearest (in this case the explicit dependency) wins over transitive, so the question now is: does an explicit dependency win over a version found in the <dependencyManagement> block? Well yes, it does, as explicit dependencies always win, and thus the answer is 28.2-jre. 67.3% of respondents thought the same.
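As a rough sketch of the lookup-table idea from Question #3, the fragment below sits inside a plain, single `<project>` with no parent and no BOM. The choice of Guice as the library that brings in Guava, and its version, are assumptions on my part rather than the quiz's exact snippet.

```xml
<!-- lookup table: consulted whenever guava needs a version -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>27.0-jre</version>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <!-- brings in Guava transitively; the Guice version is an assumption -->
  <dependency>
    <groupId>com.google.inject</groupId>
    <artifactId>guice</artifactId>
    <version>4.2.2</version>
  </dependency>
</dependencies>
```

With this arrangement the transitive Guava coming through Guice is matched against the table by groupId and artifactId, so 27.0-jre is what ends up on the classpath regardless of the version Guice itself asks for.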
We now introduce 2 transitive dependencies on Guava with different versions, both of them at the same depth in the dependency tree. We now know from Question #1 that the last dependency won; well actually that's true for explicit dependencies. In the case of transitive dependencies though the first one (and closest one) wins, which means the correct answer is 25.1-android! Don't believe me? Check this output
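For reference, the POM side of Question #5 can be sketched as follows. Only the declaration order matters here; the Guice and Truth versions are assumptions for illustration, not necessarily the ones used in the quiz.

```xml
<dependencies>
  <!-- declared first: its Guava (25.1-android, per the answer above) is picked -->
  <dependency>
    <groupId>com.google.inject</groupId>
    <artifactId>guice</artifactId>
    <version>4.2.2</version> <!-- assumed version -->
  </dependency>
  <!-- declared second: its Guava sits at the same depth and is omitted -->
  <dependency>
    <groupId>com.google.truth</groupId>
    <artifactId>truth</artifactId>
    <version>1.0.1</version> <!-- assumed version -->
  </dependency>
</dependencies>
```

Swapping the two `<dependency>` entries would be expected to flip the outcome, which is a good hint that version numbers play no part in the decision.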
This turned out to be a difficult question for 55% of respondents, as it seems the common understanding is that Maven will pick the latest version, as if it were following Semantic Versioning. Well I hate to burst your bubble but it does not, as Robert aptly put it:
Maven never looks to the version, but always to the location in the tree. With the huge dependency trees nowadays I think we should reconsider this in a future major release of Maven. There are enforcer rules to protect you.
— Robert Scholte (@rfscholte) March 26, 2020
Can something be done about this before a major release of Maven is posted? Yes, you can enforce versioning rules using the maven-enforcer-plugin.
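Here's a minimal sketch of what that could look like, using the enforcer's built-in `dependencyConvergence` rule, which fails the build when the dependency tree contains different versions of the same artifact. The plugin version and the execution id are assumptions; pick whatever release fits your build.

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-enforcer-plugin</artifactId>
      <version>3.0.0-M3</version> <!-- assumed version -->
      <executions>
        <execution>
          <id>enforce-dependency-convergence</id> <!-- hypothetical id -->
          <goals>
            <goal>enforce</goal>
          </goals>
          <configuration>
            <rules>
              <!-- fail when two paths lead to different versions of one artifact -->
              <dependencyConvergence/>
            </rules>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

The `requireUpperBoundDeps` rule is a gentler alternative: instead of demanding a single version everywhere, it only complains when the resolved version is lower than one requested somewhere in the tree.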
Alright, this snippet applies the <dependencyManagement> block. As we know from Question #3 this means transitive dependencies will be checked against the lookup table. We have two transitive dependencies on Guava, both of them will find a matching version, making 28.2-jre the correct answer.
OK, that's enough for a warm up. Up to now the base rules for dependency resolution should be clear, right? We've been looking at a single project, let's amp it up by adding a parent POM.
Part II - Project with Parent POM
We're back to having explicit and transitive dependencies, the change being that the explicit is defined in the parent POM. Would this cause a different result from Question #2? No, it does not, again explicit dependency wins over transitive because it's nearest, making 28.2-jre the correct answer once again.
This was another check to figure out the parent/child relationship, yet it's surprising that 28% thought the answer was not 28.2-jre.
We bring back the <dependencyManagement> block for another round. Will it make a difference compared with the previous question? No, it does not, the reason being that we still have an explicit dependency, thus the lookup table has no impact at all, making 28.2-jre the correct answer once more. Here's the output that verifies this fact.
Questions #7 and #8 have the same answer yet the difference is the use of <dependencyManagement> block which raised the number of incorrect answers from 28% to 45.5%. Once more I think the rules governing this block are not well understood.
This snippet is similar to Question #3 where we have a transitive dependency and the <dependencyManagement> block, however the change is that said block is found in the parent. As it turns out a child POM inherits declarations from its parents (a single parent at each level, up the chain all the way to the top Maven Super POM). Given that there's no lookup table in the child, the one found in the parent is used instead, resulting in 26.0-jre as the chosen version.
What happens if the child project defines a <dependencyManagement> block? Will it affect settings inherited from the parent? Well yes, remember that settings are resolved nearest first; in this case it means that values in the child lookup table overwrite values that may exist in the parent table. In other words, when Maven creates a combined lookup table, values from the parent are added first, then the child's, resulting in overridden values. The correct answer is 28.2-jre.
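A child POM along these lines illustrates the override described in Questions #9 and #10. The parent coordinates (`com.acme:parent`) are hypothetical, and so is the use of Guice as the transitive source of Guava.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <!-- hypothetical parent whose dependencyManagement pins guava to 26.0-jre -->
    <groupId>com.acme</groupId>
    <artifactId>parent</artifactId>
    <version>1.0.0</version>
  </parent>
  <artifactId>child</artifactId>

  <!-- child entry for the same groupId:artifactId replaces the parent's entry -->
  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
        <version>28.2-jre</version>
      </dependency>
    </dependencies>
  </dependencyManagement>

  <dependencies>
    <!-- brings in Guava transitively; version assumed -->
    <dependency>
      <groupId>com.google.inject</groupId>
      <artifactId>guice</artifactId>
      <version>4.2.2</version>
    </dependency>
  </dependencies>
</project>
```

Delete the child's `<dependencyManagement>` block and you are back in the Question #9 situation, where the parent's 26.0-jre applies.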
We're almost done, with one more section to go. The context now is the inclusion of a BOM file (explained here) that declares a dependency on Guava. Let's have a look.
Part III - Project with a BOM
This is the same scenario as Question #6 except that the <dependencyManagement> block is provided by a BOM file. There are 2 transitive dependencies on Guava and a lookup table provided by the BOM, the answer should be ... 28.2-jre. Close to 45% of respondents thought that this snippet produced a different result, that signals that the rules of the <dependencyManagement> block and/or BOM files are not quite clear.
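For readers who haven't consumed a BOM before, here's a sketch of the usual mechanism: the BOM is imported into the consumer's own `<dependencyManagement>` block with `pom` type and `import` scope. The BOM coordinates (`com.acme:acme-bom`) are hypothetical, the Guice and Truth versions are assumptions, and the quiz snippet may have been wired differently.

```xml
<dependencyManagement>
  <dependencies>
    <!-- hypothetical BOM whose dependencyManagement pins guava to 28.2-jre -->
    <dependency>
      <groupId>com.acme</groupId>
      <artifactId>acme-bom</artifactId>
      <version>1.0.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <!-- two libraries that bring in Guava transitively; versions assumed -->
  <dependency>
    <groupId>com.google.inject</groupId>
    <artifactId>guice</artifactId>
    <version>4.2.2</version>
  </dependency>
  <dependency>
    <groupId>com.google.truth</groupId>
    <artifactId>truth</artifactId>
    <version>1.0.1</version>
  </dependency>
</dependencies>
```

Importing the BOM merges its managed versions into the consumer's lookup table, so from the point of view of both transitive Guava dependencies it behaves just like the local block in Question #6.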
Building up from the previous example, now the BOM not only declares a lookup table but also defines an explicit dependency on Guava. We know the rules for explicit dependencies by now, so it should come as no surprise that 28.2-jre is the chosen version, except that 60% of respondents chose otherwise. What? This particular snippet shows what I consider to be a problem with the implementation of the BOM concept, which reuses the same rules as regular POMs. In theory a BOM should only be concerned with defining potential dependencies using a <dependencyManagement> block and nothing more, however as we can see in this example BOM files can define explicit dependencies as well. One can argue that there might be cases where BOM authors want to fix a specific dependency no matter who consumes it (i.e., the producer sets the version), while on the other side consumers would like to have the last say. What can you do to fix this problem as a consumer? Easy, define an explicit dependency in the consumer project; it's more verbose but solves the problem.
Alright, back to the base example shown in Question #11 but this time the consumer project also adds an entry to its lookup table. We have then 2 possibilities in the lookup table, or do we? Knowing that values in the lookup table will be overwritten in the order they are found results in 26.0-jre being the chosen version this time. 62% of respondents thought otherwise.
Last one. This snippet adds an explicit version on the consumer project, making it the nearest one, which should make it the selected one, right? Correct! This means the chosen version is 28.2-jre once again which 68% of respondents got right.
And that's all folks. As it turns out every single snippet produces a valid result; as mentioned before, there are no build errors at all. Granted, some of these snippets are a bit contrived but trust me, you'll find every single one of them in the wild, and as you now know some of them can be optimized/refactored to produce the desired results.
As it turns out questions #1, #5, #12, and #13 proved to be the trickiest ones
Let's look at the performance of each section. Before the quiz I was sure the results for Section I would be high as that's just dealing with a single project (narrator: he was delusional); Section II might be a bit lower because of the introduction of a parent POM; and Section III would be definitely lower as I expected fewer people to be acquainted with BOMs. So how did I do with my predictions?
Weeeell ... the trends are OK but I must confess I expected Sections I and II to perform better. Let's have a look at each individual section, shall we?
Ah yes, questions #1 and #5 have definitely pulled down the numbers for this section.
I was hoping to see numbers higher than 60% in Section II but that was not the case. Questions #8, #9, and #10 add a <dependencyManagement> block which would account for lower scores.
And Section III is, well, it is what it is. BOM files have yet to be understood. Finally let's look at the distribution of "Build Errors"
Aside from Question #1 most questions had a low percentage of "Build Error" as chosen answer, however look at Section III where the numbers are much closer to one another, somewhat indicating that BOMs are not well understood, or people simply gave up trying to get the right answer.
A big thank you to everyone that participated in this quiz, it was a fun one to run. It forced me to re-evaluate my understanding of how Maven dependency resolution works. I thought I had it all figured out by now, but I also got some answers wrong.
Here are a couple of anonymous comments left on the quiz:
"I would hardly ever mix versions in dependency management and dependencies in the same pom."
Right, as mentioned earlier it appears to make little sense to do so in a BOM, as it forces a dependency on downstream consumers. Adding both in a consumer project results in the explicit version being chosen 100% of the time, so if you see this idiom in a POM then make sure to remove the entry in the lookup table OR remove the explicit dependency.
"Pretty tricky, even for an almost 20 year maven dude. Good job!"
Correct, some snippets seemed like the answer was pretty straightforward and then boom! Wrong answer.
"Dependency management and transitive dependencies will bring us straight into dependency hell."
Not necessarily. The <dependencyManagement> block exists to provide an alternative to handling transitive dependencies but it does not cover all cases. Exclusions are another way to do it. The maven-enforcer-plugin should also be used to fix these issues. As a matter of fact there's a great presentation (video) by Robert Scholte and Ray Tsang on this topic.
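As a small illustration of the exclusions route, the fragment below removes Guava from what Guice brings in and then declares the desired version explicitly. The Guice version is an assumption, and whether 28.2-jre is the right Guava for your project is something only your build can tell you.

```xml
<dependencies>
  <dependency>
    <groupId>com.google.inject</groupId>
    <artifactId>guice</artifactId>
    <version>4.2.2</version> <!-- assumed version -->
    <exclusions>
      <!-- drop the Guava that Guice would otherwise bring along -->
      <exclusion>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
  <!-- having excluded it above, state the Guava version you actually want -->
  <dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>28.2-jre</version>
  </dependency>
</dependencies>
```

Exclusions have to be repeated on every dependency that leaks the unwanted artifact, which is why a `<dependencyManagement>` entry tends to scale better once more than a couple of libraries are involved.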
Keep on coding!