Monorepo vs Multirepo Architecture: How to Decide?

Introduction

Monorepo and multirepo architectures are two popular approaches for organizing codebases in software development. Each has its own advantages and challenges, and the choice between them can significantly impact your project's workflow, scalability, and maintainability. In this article, we’ll explore the key differences between these architectures, their pros and cons, and provide guidance on how to choose the right approach for your team and projects.

What Are Packages? 📦

To make software development more manageable, developers break code into smaller, reusable, and modular chunks. In simple projects, this involves splitting code into multiple files. However, as complexity grows, it becomes necessary to organize code into packages.

A package groups related code files under a single logical entity, creating clear boundaries in your software. For instance, code for an iOS app should live in a separate package from code for an Android app: the two target different platforms and share no code, so keeping them in distinct packages makes sense.

Packages are also useful for sharing functionality across projects. For example, a logging package could provide a standardized way to format logs, which can then be reused across multiple packages that need logging functionality.
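
As an illustration, a shared logging package might be little more than a formatting function and a small factory. The sketch below is hypothetical: the names formatLog and createLogger and the line format are assumptions, not an existing Prosopo package.

```typescript
// Hypothetical shared logging package (illustrative only). In a real package
// these functions would be exported from its entry point; every consumer gets
// the same log format because the formatting lives in one place.

type LogLevel = "debug" | "info" | "warn" | "error";

interface LogEntry {
  level: LogLevel;
  message: string;
  timestamp: Date;
  context?: Record<string, unknown>;
}

// Format an entry as one line: ISO timestamp, upper-cased level, message,
// and optional JSON-encoded context.
function formatLog(entry: LogEntry): string {
  const base = `${entry.timestamp.toISOString()} [${entry.level.toUpperCase()}] ${entry.message}`;
  return entry.context ? `${base} ${JSON.stringify(entry.context)}` : base;
}

// Create a logger that prefixes messages with the consuming package's name.
// `sink` defaults to console.log but can be swapped out (e.g. in tests).
function createLogger(pkg: string, sink: (line: string) => void = console.log) {
  const log = (level: LogLevel) => (message: string, context?: Record<string, unknown>) =>
    sink(formatLog({ level, message: `${pkg}: ${message}`, timestamp: new Date(), context }));
  return { debug: log("debug"), info: log("info"), warn: log("warn"), error: log("error") };
}

// Usage from any package that depends on the logger.
createLogger("@prosopo/a").info("server started", { port: 3000 });
```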

How to Organize Packages

Once a project is divided into packages, the next challenge is deciding where these packages should live. There are two main approaches:

Multirepo: Each package resides in its own repository.
Monorepo: All packages are stored in a single repository.

Choosing the right approach is critical, as the wrong decision can significantly complicate development. Unfortunately, there’s no universal answer—it depends on the specific needs of your projects. To make matters more complex, project requirements often evolve, meaning the choice you make today might not suit you tomorrow.

Multirepo 🗂️

In a multirepo setup, each package is stored in its own repository. For example, a project with three packages might look like this:

@prosopo/a
@prosopo/b
@prosopo/c

To work on these packages together, a fourth repository is typically created to combine them, often using git submodules to check out each package into its own directory. Each package still keeps its own git history, branches, and commits.
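
For illustration, here is a minimal sketch that generates the .gitmodules file such an umbrella repository would contain for the three packages above. The GitHub organization and URLs are hypothetical placeholders.

```typescript
// Sketch: render the .gitmodules file of a hypothetical umbrella repository.
// Each [submodule] entry records where a package is checked out (path) and
// where it is cloned from (url). The URLs below are placeholders.

interface Submodule {
  name: string; // submodule name, also used as the checkout directory
  url: string;  // remote repository the submodule is cloned from
}

function renderGitmodules(modules: Submodule[]): string {
  return modules
    .map((m) => `[submodule "${m.name}"]\n\tpath = ${m.name}\n\turl = ${m.url}\n`)
    .join("");
}

const packages: Submodule[] = ["a", "b", "c"].map((name) => ({
  name,
  url: `https://github.com/example-org/${name}.git`, // hypothetical URL
}));

console.log(renderGitmodules(packages));
```

In practice this file is rarely written by hand: running git submodule add <url> <path> once per package creates the entries, and git clone --recurse-submodules checks out the umbrella repository together with every package.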

Pros of Multirepo ✅

Independence: Reduces the risk of cross-package breakages and dependency conflicts.
Isolation: Each package can be developed in isolation with its own issue tracker.
Parallel Development: Teams can work on different packages simultaneously without interference.
Custom Pipelines: CI/CD pipelines can be tailored to the specific needs of each package.
Security: Access can be restricted to individual repositories, limiting exposure.
Performance: Smaller repositories are faster to clone and manage with git.

Cons of Multirepo ❌

Duplication: Configuration files, CI/CD pipelines, and documentation may need to be duplicated across repositories, leading to maintenance challenges.
Versioning Complexity: Each repository releases on its own schedule, so keeping interdependent packages on compatible versions becomes cumbersome.
Deployment Challenges: Coordinating deployments across multiple repositories can be difficult.
Limited Visibility: Developers lack a unified view of all packages.
Git Submodules: Managing submodules can be error-prone and adds complexity.
Refactoring Difficulties: Refactoring across multiple repositories is time-consuming and error-prone.
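
The versioning problem is easy to underestimate. Nothing forces separate repositories to agree on which version of a shared package they use, so teams often end up writing checks like the sketch below. It assumes the manifests have already been collected from each repository, and it compares version range strings naively rather than performing real semver resolution; the names and versions are hypothetical.

```typescript
// Sketch: detect drift in the version of a shared package requested by several
// repositories. Each Manifest stands in for one repository's package.json
// (hypothetical data). Ranges are compared as plain strings, not via semver.

type Manifest = { name: string; dependencies: Record<string, string> };

// Group consuming repositories by the version range they request for `dep`.
function findDrift(manifests: Manifest[], dep: string): Map<string, string[]> {
  const byRange = new Map<string, string[]>();
  for (const m of manifests) {
    if (!(dep in m.dependencies)) continue; // this repository doesn't use `dep`
    const range = m.dependencies[dep];
    byRange.set(range, [...(byRange.get(range) ?? []), m.name]);
  }
  return byRange;
}

const repos: Manifest[] = [
  { name: "@prosopo/b", dependencies: { "@prosopo/a": "^1.2.0" } },
  { name: "@prosopo/c", dependencies: { "@prosopo/a": "^2.0.0" } },
];

const drift = findDrift(repos, "@prosopo/a");
if (drift.size > 1) {
  // More than one range in use: the repositories may not be compatible.
  console.warn("@prosopo/a is requested with different ranges:", drift);
}
```

In a monorepo using workspaces, this check is largely unnecessary because the internal packages are resolved from the same checkout.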

Monorepo 🏗️

In a monorepo setup, all packages are stored in a single repository. For example:

/packages/
- @prosopo/a
- @prosopo/b
- @prosopo/c

Package managers such as npm support workspaces: the repository root is itself a package whose package.json lists the packages it contains. In this example, the workspace root is / and it contains the three packages under the /packages/ directory.
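
A sketch of the manifests npm would read in this layout, written as TypeScript objects that serialize to the contents of each package.json. It assumes each package lives at /packages/<name> (for example /packages/b), and the root package name is hypothetical.

```typescript
// Sketch: the package.json contents for the example workspace, expressed as
// typed objects. Only the fields relevant to workspaces are shown.

interface PackageJson {
  name: string;
  version?: string;
  private?: boolean;
  workspaces?: string[];
  dependencies?: Record<string, string>;
}

// /package.json: the workspace root. `private` prevents the root itself from
// being published; `workspaces` tells npm where the member packages live.
const root: PackageJson = {
  name: "prosopo-monorepo", // hypothetical root name
  private: true,
  workspaces: ["packages/*"],
};

// /packages/b/package.json: depends on a sibling workspace package. Because
// @prosopo/a (version 1.0.0 here) satisfies the range, npm uses the local copy.
const b: PackageJson = {
  name: "@prosopo/b",
  version: "1.0.0",
  dependencies: { "@prosopo/a": "^1.0.0" },
};

console.log(JSON.stringify(root, null, 2));
console.log(JSON.stringify(b, null, 2));
```

After running npm install at the root, npm links the workspace packages into the root node_modules, and commands such as npm run build --workspaces run a script in every package.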

Pros of Monorepo ✅

Atomic Changes: A change to a shared package and the updates to every package that uses it can land in a single commit.
Unified Tooling: Tooling, CI/CD pipelines, and configuration can be shared between packages.
Visibility: All packages are visible and discoverable in a single place, reducing the burden on new developers.
Single Source of Truth: A single commit represents the state of all packages.
Simplified Pipelines: Testing, versioning, releasing, and deploying become easier when all packages are in the same repository.
Ease of Development: All code is in one place, making changes simpler and allowing issues or incompatibilities to be discovered immediately.
Fosters Collaboration: Cross-package changes are made in one shared repository, so developers working on different packages naturally coordinate with each other.
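
Because every manifest lives in the same repository, a single script can see the whole dependency graph. The sketch below uses Kahn's topological sort to compute an order in which each package is built after the internal packages it depends on. The package graph is hypothetical, and tools such as Nx handle this kind of task ordering for you; this only illustrates the idea.

```typescript
// Sketch: compute a build order for workspace packages with Kahn's algorithm.
type Graph = Record<string, string[]>; // package -> internal packages it depends on

// Return packages so that each one appears after all of its dependencies.
// Assumes every dependency listed is itself a key of `graph`.
function buildOrder(graph: Graph): string[] {
  const unmet = new Map<string, number>();        // dependencies not yet built
  const dependents = new Map<string, string[]>(); // dep -> packages that need it
  for (const [pkg, deps] of Object.entries(graph)) {
    unmet.set(pkg, deps.length);
    for (const dep of deps) {
      dependents.set(dep, [...(dependents.get(dep) ?? []), pkg]);
    }
  }
  // Start with packages that have no internal dependencies.
  const ready = Array.from(unmet.entries())
    .filter(([, n]) => n === 0)
    .map(([pkg]) => pkg);
  const order: string[] = [];
  while (ready.length > 0) {
    const pkg = ready.shift()!;
    order.push(pkg);
    for (const next of dependents.get(pkg) ?? []) {
      const n = unmet.get(next)! - 1;
      unmet.set(next, n);
      if (n === 0) ready.push(next);
    }
  }
  // Anything never built is part of, or depends on, a dependency cycle.
  if (order.length !== unmet.size) {
    throw new Error("dependency cycle between packages");
  }
  return order;
}

console.log(buildOrder({
  "@prosopo/a": [],
  "@prosopo/b": ["@prosopo/a"],
  "@prosopo/c": ["@prosopo/a", "@prosopo/b"],
}));
// → [ '@prosopo/a', '@prosopo/b', '@prosopo/c' ]
```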

Cons of Monorepo ❌

Scalability: git can become slow with a large repository.
Complex Pipelines: CI/CD can become complex if not managed properly.
Risk of Tight Coupling: Service boundaries can easily be broken, introducing unintended dependencies.
Permission Limitations: Anyone with access can see and modify the entire project's code.
Overwhelming: A large, unfamiliar codebase can present a steep learning curve for newcomers if it is not documented well.

Which to Choose?

A common source of confusion is whether to use a monorepo or a multirepo, and which packages should belong in each. Another is the assumption that you must choose one approach exclusively.

To decide, think in terms of projects. A project is a group of packages that deliver a software platform. For example, a social media platform might require packages for an Android app, iOS app, server, database, etc. Some packages are logically independent (e.g. Android/iOS apps), while others are interdependent (e.g. server and database). However, all packages collectively deliver the social media product.

For such a project, it makes sense to use a monorepo. This setup simplifies refactoring, versioning, deployment, and maintenance, enabling faster development and easier future updates.

Yet Another Project

This approach works well until you start another project and need code from the first project. You then face two options: copy and paste the code or introduce a dependency. For small snippets, copying is fine. For larger chunks, it’s better to create a shared package. This is where a multirepo architecture becomes useful.

In this scenario, each project has its own monorepo, but the two share some code, such as a logging package. Moving the logging package to its own repository and depending on it from both monorepos is often the right choice. Before doing so, however, consider:

Independent Versioning: Can the package be versioned independently? If not, it should stay in the monorepo.
Deployment Independence: Can the package be deployed independently? Shared packages like logging often don’t require deployment at all. If the package must be deployed as part of the project, it belongs in the monorepo.
Maturity: Is the code stable and infrequently updated? If not, it’s better left in the monorepo for easier refactoring.
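
The three questions above can be written as a simple checklist. This is a hypothetical sketch: the PackageProfile fields are assumptions, and it treats every question as a hard requirement for extraction.

```typescript
// Sketch: decide whether a shared package is a candidate for its own repository.
// Field names are hypothetical; each maps to one of the questions above.

interface PackageProfile {
  name: string;
  versionedIndependently: boolean; // can it be released on its own version?
  needsDeployment: boolean;        // must it be deployed as part of the project?
  stable: boolean;                 // mature and infrequently updated?
}

// Extract only if every answer favours extraction; otherwise report why not.
function shouldExtract(pkg: PackageProfile): { extract: boolean; reasons: string[] } {
  const reasons: string[] = [];
  if (!pkg.versionedIndependently) reasons.push("cannot be versioned independently");
  if (pkg.needsDeployment) reasons.push("must be deployed with the project");
  if (!pkg.stable) reasons.push("still changing frequently");
  return { extract: reasons.length === 0, reasons };
}

console.log(shouldExtract({ name: "logger", versionedIndependently: true, needsDeployment: false, stable: true }));
```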

If a small package fails these tests, duplicating it across monorepos is not a major issue; it is better than forcing it into a repository of its own. The key is to minimize duplication where possible to maintain consistency and ease of maintenance.

Another option is a shared monorepo, which groups shared packages such as logging or utilities in a single repository. This provides shared versioning and easier refactoring across those packages, but it adds complexity and dilutes some benefits of a monorepo. Use this approach cautiously, and only when you have a set of stable, shared libraries that naturally belong together.

What Did We Choose?

At Prosopo, we initially adopted a multirepo architecture. This worked well for isolated code like smart contracts but became problematic for numerous npm packages with internal dependencies.

After struggling with version management, we switched to a monorepo approach and haven’t looked back. The main drawback is the large repository size, which takes time to clone. However, the benefits of centralized version management, shared tooling, and unified code have made development faster and easier.

We are considering moving some stable, independent packages out of the monorepo into their own repositories. These packages are:

Logically independent
Libraries without deployment requirements
Mature and infrequently updated

The result would be a hybrid: a monorepo for the main project, with a few libraries living in their own repositories and linked in via git submodules.

Real-World Examples

To better understand the practical applications of monorepo and multirepo architectures, let’s look at how some well-known companies manage their codebases:

Google: Google famously uses a monorepo to manage the majority of its codebase. This approach allows them to maintain a single source of truth, enabling seamless collaboration across teams and simplifying dependency management. However, they’ve invested heavily in custom tools like Bazel to handle the scale of their repository.
Facebook: Facebook also uses a monorepo for its core projects. This setup allows them to make atomic changes across multiple packages and ensures that all teams work with the latest code. They’ve developed tools like Buck to optimize build times and manage the complexity of their monorepo.
Netflix: Netflix, on the other hand, prefers a multirepo approach. Their microservices architecture benefits from the independence and isolation provided by multirepos, allowing teams to work autonomously and deploy services independently.
Microsoft: Microsoft employs a hybrid approach. For example, their Azure DevOps platform uses a monorepo for tightly coupled components but relies on multirepos for independent libraries and tools.

TLDR ✨

Start with a monorepo for your project. Over time, it will become clear which packages are independent and should be moved to their own repositories, linked via git submodule.

Avoid starting with a multirepo unless you’re prepared for added complexity.

Decision-Making Framework

Choosing between a monorepo and a multirepo can be challenging. Use the following framework to guide your decision:

Project Scope:
- Are the packages tightly coupled and frequently updated together?
  → Use a monorepo.
- Are the packages independent and rarely interact?
  → Use a multirepo.
Team Size:
- Is your team small and collaborative?
  → A monorepo simplifies coordination.
- Do you have multiple teams working independently?
  → A multirepo provides better isolation.
Tooling and Infrastructure:
- Do you have tools to manage a large repository (e.g., Bazel, Nx)?
  → A monorepo is feasible.
- Are you relying on standard Git workflows?
  → A multirepo might be easier to manage.
Deployment Requirements:
- Do the packages need to be deployed together?
  → Use a monorepo.
- Do the packages have independent deployment pipelines?
  → Use a multirepo.
Future Growth:
- Will the project grow significantly in size and complexity?
  → Consider starting with a monorepo and transitioning to a hybrid approach if needed.
Accelerating Growth with Rapid Delivery:
- Is your project focused on rapid development and iteration?
  → A monorepo streamlines refactoring, testing, and deployment, enabling faster delivery. This approach has been pivotal for Prosopo in driving high growth and ensuring quick software delivery.

By answering these questions, you can determine which architecture aligns best with your project’s requirements and long-term goals.
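
For teams that like to make the trade-offs explicit, the framework can be sketched as a simple tally in which each answer votes for one layout. The question keys and the equal weighting are assumptions made for illustration. Ties default to a monorepo, in line with the advice above, and the Future Growth question is left out because it points toward a hybrid rather than a single layout.

```typescript
// Sketch: the decision framework as an equal-weight vote (hypothetical keys).
type Layout = "monorepo" | "multirepo";

interface Answers {
  tightlyCoupled: boolean;   // Project Scope: packages change together?
  independentTeams: boolean; // Team Size: several teams working separately?
  largeRepoTooling: boolean; // Tooling: e.g. Bazel or Nx available?
  deployedTogether: boolean; // Deployment: packages ship together?
  rapidIteration: boolean;   // Rapid Delivery: focus on fast iteration?
}

function recommend(a: Answers): Layout {
  // One vote per question; Rapid Delivery only ever votes for a monorepo.
  const votes: Layout[] = [
    a.tightlyCoupled ? "monorepo" : "multirepo",
    a.independentTeams ? "multirepo" : "monorepo",
    a.largeRepoTooling ? "monorepo" : "multirepo",
    a.deployedTogether ? "monorepo" : "multirepo",
  ];
  if (a.rapidIteration) votes.push("monorepo");
  const mono = votes.filter((v) => v === "monorepo").length;
  // Ties go to the monorepo, the recommended starting point.
  return mono >= votes.length - mono ? "monorepo" : "multirepo";
}

console.log(recommend({ tightlyCoupled: true, independentTeams: false, largeRepoTooling: false, deployedTogether: true, rapidIteration: true }));
// → monorepo
```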

Conclusion

Choosing between monorepo and multirepo is challenging, and there’s no one-size-fits-all solution. Often, the best approach is a mix of both, but making this decision early in a project is difficult due to unclear requirements. Therefore, the safest strategy is to start with a monorepo. Over time, you’ll identify packages that warrant their own repositories and can be linked via git submodule.

To maintain development velocity, a monorepo is the safest choice.