Techblog: Tools and Processes

Maven repositories in corporate environments

26 Jan 2007 (English)

Over the past few months, I provided consulting services to the application development department of a banking corporation. I helped them improve their software development productivity - by improving processes and tools, and by providing frameworks and "best practices" to developers. We quickly identified build automation and continuous integration as one of the highest-priority topics, and so we built a build automation framework based on Apache Maven 2, including the continuous integration server Continuum.

This article contains some recommendations for establishing Maven as a build system in corporate environments. It is not meant as a complete guide, but discusses how to deal with Maven repositories - an issue that you come across very soon when adopting Maven for your company.

In a previous article, I provided some basic information on the benefits of build automation and continuous integration, on Apache Maven and Continuum, as well as links where you can find additional information.

Set up your own repositories

By default, Maven tries to download all build-relevant artifacts from a public Internet repository on repo1.maven.org to each developer's local repository. While this is very convenient for many open-source projects, direct access to this repository by each developer is not acceptable for many companies.

First of all, the closed-source applications or libraries that your organization develops in-house obviously cannot be uploaded to the public repository. The same is true of any third-party libraries that you licensed for your company. To support sharing such artifacts among your developers, set up a company-internal repository.

But even for artifacts that are available in the public repository, you may not want every developer to access that repository directly, and not only because of bandwidth considerations. There are two main approaches:

Proxy approach: You can set up a Maven-capable proxy server for your organization. It dynamically mirrors public repositories such as repo1.maven.org. When a user tries to access an artifact, the proxy server first checks whether the artifact is already available locally. If it is, the proxy delivers it to the client. If it is not, the proxy downloads it from the public repository, stores it locally and delivers it to the client. On subsequent requests, there's no need to access the public repository again. Typically, such proxy servers can be configured to dynamically mirror more than one public repository. This way, several repositories can be aggregated into one internal mirror.

Non-proxy approach: Another option is to set up a company-internal repository also for publicly available artifacts, and to upload libraries manually to the repository as you need them.
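With either approach, each developer's Maven installation has to be told to use the internal server instead of repo1.maven.org. A minimal sketch of how this could look in settings.xml, using Maven's mirror mechanism; the id, name and URL are hypothetical placeholders, not values from a real setup:

```xml
<!-- settings.xml (per user in ~/.m2/, or shared in the conf/ directory of the Maven installation) -->
<settings>
  <mirrors>
    <mirror>
      <!-- Hypothetical id, name and URL: replace with your internal server -->
      <id>internal-central</id>
      <name>Company-internal replacement for the central repository</name>
      <url>http://repo.example.internal/maven2</url>
      <!-- "central" is the id of Maven's default public repository -->
      <mirrorOf>central</mirrorOf>
    </mirror>
  </mirrors>
</settings>
```

Whether the URL points to a proxy server or to a manually maintained repository makes no difference to the client; in both cases, requests that would have gone to the public repository are redirected to the internal one.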

The non-proxy approach gives your company much more control: You decide which libraries may be used in your company and which sources they come from. In contrast, by using the public repository you give up some of that control: Someone from outside your company has uploaded the library to the repository. In theory, it could be a faked version containing malicious code. Even though this is highly unlikely for the public repositories managed by the Maven team or by the open-source projects themselves, man-in-the-middle attacks are possible, at least in theory. When managing the repository on your own, you even have control over which versions of libraries may be used within your company.

But, of course, there's no free lunch: Managing an internal repository for publicly available artifacts means a significant amount of extra work. First of all, you need to identify the correct names and version numbers of the libraries you want to upload. Try to use the same names as on public repositories - that keeps your builds portable in case you change the approach later on. When you upload a JAR file "as is", your repository won't contain much metadata other than the name and version number. In contrast, the public Maven repositories contain additional metadata for many artifacts, especially dependency information. If you don't specify a pom.xml containing dependency metadata when uploading the JAR, you lose Maven's powerful transitive dependency management for such artifacts.
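To illustrate what such a pom.xml might contain, here is a minimal sketch for a manually uploaded third-party library. The coordinates com.example.vendor:vendor-lib:1.2.0 are invented for illustration, and the dependency on commons-logging is only an assumed example of what the library might need at runtime:

```xml
<!-- pom.xml to be uploaded together with vendor-lib-1.2.0.jar (hypothetical library) -->
<project>
  <modelVersion>4.0.0</modelVersion>
  <!-- Use the same coordinates as the public repository if the artifact exists there -->
  <groupId>com.example.vendor</groupId>
  <artifactId>vendor-lib</artifactId>
  <version>1.2.0</version>
  <packaging>jar</packaging>

  <!-- Declaring the library's own dependencies keeps transitive resolution working -->
  <dependencies>
    <dependency>
      <groupId>commons-logging</groupId>
      <artifactId>commons-logging</artifactId>
      <version>1.1</version>
    </dependency>
  </dependencies>
</project>
```

The deploy plugin's deploy-file goal accepts a POM like this through its pomFile parameter, so the JAR and its metadata can be uploaded in one step.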

Delivering Maven plugins

The build system Maven has a rather small kernel that takes care of fundamental things such as dependency resolution, repository access and interpretation of the pom.xml. Build steps like compiling sources or packaging JARs are executed by Maven plugins. The most important plugins are maintained by the Apache Maven project itself, but there are other sites like mojo.codehaus.org that provide useful plugins as well. You can even write your own plugins.

Maven accesses plugins just like any other artifact: If a plugin isn't in your local repository yet, Maven downloads it from a remote repository. To provide the plugins for your developers, you have the same options as with other build artifacts: You can set up a proxy or you can manage your own internal plugin repository. And then there's a third option: ZIP together a Maven distributable for in-house distribution that contains not only the Maven kernel and common settings, but also an initial local repository with all necessary plugins. In any case, it is advisable to create your own Maven package for distribution within your company, e.g. one containing common settings.
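As a rough sketch of what "common settings" in such an in-house package could look like, the following settings.xml declares an internal plugin repository in a profile that is always active. The profile id, repository id and URL are hypothetical:

```xml
<!-- conf/settings.xml shipped inside the company's Maven package (sketch) -->
<settings>
  <profiles>
    <profile>
      <!-- Hypothetical profile id -->
      <id>company-defaults</id>
      <pluginRepositories>
        <pluginRepository>
          <!-- Hypothetical id and URL of the internal plugin repository -->
          <id>internal-plugins</id>
          <url>http://repo.example.internal/plugins</url>
        </pluginRepository>
      </pluginRepositories>
    </profile>
  </profiles>

  <!-- Activate the profile for every build, without command-line switches -->
  <activeProfiles>
    <activeProfile>company-defaults</activeProfile>
  </activeProfiles>
</settings>
```

Shipping this file with the distributable means that developers get a working configuration right after unpacking, and changes to the settings can be rolled out with the next version of the package.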

Choosing the right approach

So... which approach should you choose for your organization? It depends, and there's a tradeoff. For many companies the proxy approach will be more convenient and straightforward, while other companies, e.g. in the financial services industry, will be forced to use the non-proxy approach due to security restrictions, at least for normal build artifacts. For Maven plugins, however, it would be far too much work to manually upload all necessary plugins to an internal plugin repository.

No matter which approach you use, make a distinction between in-house artifacts, "free" (open-source) external artifacts, non-free (licensed) external artifacts and Maven plugins. For in-house artifacts, a separation of released versions and "snapshot" versions (interim versions created during development) makes sense.
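One way to express this separation is to declare one repository per category, for example in a company-wide parent POM. The sketch below assumes hypothetical coordinates, repository ids and URLs; the releases and snapshots switches are standard POM elements:

```xml
<!-- Company-wide parent POM (sketch; coordinates, ids and URLs are hypothetical) -->
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>company-parent</artifactId>
  <version>1</version>
  <packaging>pom</packaging>

  <repositories>
    <!-- In-house artifacts, released versions only -->
    <repository>
      <id>inhouse-releases</id>
      <url>http://repo.example.internal/inhouse</url>
      <releases><enabled>true</enabled></releases>
      <snapshots><enabled>false</enabled></snapshots>
    </repository>
    <!-- In-house artifacts, snapshot versions only -->
    <repository>
      <id>inhouse-snapshots</id>
      <url>http://repo.example.internal/inhouse-snapshots</url>
      <releases><enabled>false</enabled></releases>
      <snapshots><enabled>true</enabled></snapshots>
    </repository>
    <!-- Open-source external artifacts -->
    <repository>
      <id>external-free</id>
      <url>http://repo.example.internal/external-free</url>
      <snapshots><enabled>false</enabled></snapshots>
    </repository>
    <!-- Licensed external artifacts -->
    <repository>
      <id>external-non-free</id>
      <url>http://repo.example.internal/external-non-free</url>
      <snapshots><enabled>false</enabled></snapshots>
    </repository>
  </repositories>
</project>
```

Keeping the categories in separate locations makes it easier to apply different access rules and backup policies to each of them, and to clean up old snapshots without touching releases.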

Setting up an internal repository and/or proxy server

To set up an internal repository, you just need to provide a URL-accessible location, e.g. an HTTP server. HTTP is the most common and platform-independent option for reading from the repository. Uploading options include file-system or network share access via file:// URLs, SCP, WebDAV and others.
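The upload location is declared in the distributionManagement section of a project's pom.xml. A sketch for uploading via SCP, assuming a hypothetical host name and directory layout on the repository server:

```xml
<!-- Fragment of a pom.xml: where "mvn deploy" uploads the build result -->
<distributionManagement>
  <!-- Released versions (hypothetical id, host and path) -->
  <repository>
    <id>inhouse-releases</id>
    <url>scp://repo.example.internal/var/www/maven/inhouse</url>
  </repository>
  <!-- Snapshot versions go to a separate location -->
  <snapshotRepository>
    <id>inhouse-snapshots</id>
    <url>scp://repo.example.internal/var/www/maven/inhouse-snapshots</url>
  </snapshotRepository>
</distributionManagement>
```

The ids refer to server entries in settings.xml, which is where user names and passwords for uploading belong, so that credentials stay out of the project's POM. For a network share, a file:// URL can be used in place of the scp:// URL.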

If you choose to set up a Maven-aware proxy server, there are at least three options: Maven-Proxy was one of the first available products. It is a self-contained server and is relatively simple: It acts as a proxy but doesn't do much more. If Maven-Proxy runs behind a company firewall, it typically needs to go through an HTTP proxy. Note that Maven-Proxy has very limited support for proxies using NTLM authentication (used mainly by Microsoft proxy servers).

A more powerful option is Proximity. In addition to the normal proxy function, it offers features like repository browsing and searching, and comes with predefined profiles for personal and corporate use. Proximity can host your internal repositories as well, so it is a kind of "2-in-1" solution.

The Maven project itself has started a subproject called Archiva. It is promising, but currently in an alpha stage. If you want to check it out, you need to get the source code and build the project yourself. Many companies prefer to wait for a stable release before using Archiva.