Introduction: Smart Deployment and Scaling
One of the main advantages of the modular monolith is deployment simplicity: a single artifact, a single CI/CD pipeline, a single process to monitor. But simplicity does not mean limitation. With the right strategies, a modular monolith can be deployed with zero downtime, scaled intelligently, and managed with the same sophistication as a microservices architecture, at a fraction of the operational cost.
In this article we will explore advanced deployment strategies for the modular monolith: blue-green deployments, feature flags to decouple deployment from release, metric-based auto-scaling, and the criteria for deciding when to extract a module as an autonomous microservice.
What You Will Learn in This Article
- Blue-green deployment for zero-downtime updates
- Feature flags: decoupling deployment from feature release
- Auto-scaling with Kubernetes HPA and custom metrics
- Contextual scaling: optimizing resources per module
- Database scaling: read replicas, connection pooling, sharding
- When to extract a module as a microservice
- Example Kubernetes configuration for the modular monolith
Monolithic Deployment: Build Once, Deploy Everywhere
Deployment of a modular monolith follows the "build once, deploy everywhere" principle: a single artifact (JAR, Docker image) is built once and deployed to all environments (staging, production, DR). This eliminates the risk of inconsistencies between environments and drastically simplifies the CI/CD pipeline.
# CI/CD pipeline for the modular monolith
# A single artifact for all environments
# .github/workflows/deploy.yml
name: Build and Deploy

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build application
        run: ./gradlew build
      - name: Build Docker image
        run: |
          docker build -t registry.example.com/ecommerce-app:$GITHUB_SHA .
      - name: Push to registry
        run: |
          docker push registry.example.com/ecommerce-app:$GITHUB_SHA

  deploy-staging:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to staging
        run: |
          kubectl set image deployment/ecommerce \
            app=registry.example.com/ecommerce-app:$GITHUB_SHA \
            --namespace staging

  deploy-production:
    needs: deploy-staging
    runs-on: ubuntu-latest
    environment: production  # requires manual approval
    steps:
      - name: Deploy to production
        run: |
          kubectl set image deployment/ecommerce \
            app=registry.example.com/ecommerce-app:$GITHUB_SHA \
            --namespace production
Blue-Green Deployment: Zero Downtime
Blue-green deployment maintains two identical environments: blue (active, serving traffic) and green (inactive, ready for the next release). Deployment occurs on the inactive environment, and the traffic switch is instantaneous through a load balancer or Kubernetes Service change.
Advantages
- Zero downtime: the switch is instantaneous, no interruption for users
- Immediate rollback: if something goes wrong, just switch back to the previous environment
- Pre-switch validation: you can test the new deployment on the green environment before switching
# Kubernetes: blue-green deployment with Service switch
# Blue deployment (currently active)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ecommerce-blue
  labels:
    app: ecommerce
    version: blue
spec:
  replicas: 4
  selector:
    matchLabels:
      app: ecommerce
      version: blue
  template:
    metadata:
      labels:
        app: ecommerce
        version: blue
    spec:
      containers:
        - name: app
          image: registry.example.com/ecommerce-app:v1.2.0
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
---
# Service pointing to the active deployment
apiVersion: v1
kind: Service
metadata:
  name: ecommerce-service
spec:
  selector:
    app: ecommerce
    version: blue  # Change to "green" for the switch
  ports:
    - port: 80
      targetPort: 8080
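The switch itself can be performed with a single command that repoints the Service selector. This is a sketch assuming kubectl access to the cluster; service and label names match the manifests above.

```shell
# Switch traffic from blue to green by repointing the Service selector.
# Rollback is the same command with "blue".
kubectl patch service ecommerce-service \
  -p '{"spec":{"selector":{"app":"ecommerce","version":"green"}}}'
```

Because the Service selector is evaluated on every request, the change takes effect as soon as the patch is applied; no pods are restarted.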
Feature Flags: Decoupling Deployment from Release
Feature flags allow deploying code to production without immediately activating it. A feature can be activated gradually (canary release), for specific users (beta testing), or for specific regions. This completely decouples the moment of deployment (code in production) from the moment of release (feature visible to users).
// Feature flags with Spring Boot and Unleash
@Service
class OrderServiceImpl implements OrderModuleApi {

    private final FeatureFlagService featureFlags;
    private final PricingEngine pricingEngine;
    private final NotificationModuleApi notificationModule;
    private final OrderRepository orderRepository;

    OrderServiceImpl(FeatureFlagService featureFlags,
                     PricingEngine pricingEngine,
                     NotificationModuleApi notificationModule,
                     OrderRepository orderRepository) {
        this.featureFlags = featureFlags;
        this.pricingEngine = pricingEngine;
        this.notificationModule = notificationModule;
        this.orderRepository = orderRepository;
    }

    @Override
    public OrderDto createOrder(CreateOrderCommand cmd) {
        Order order = Order.create(cmd);

        // Feature flag: new pricing system
        if (featureFlags.isEnabled("new-pricing-engine")) {
            order.applyNewPricing(pricingEngine.calculate(cmd));
        } else {
            order.applyLegacyPricing(cmd.getItems());
        }

        // Feature flag: push notifications (gradual, per-user rollout)
        if (featureFlags.isEnabled("push-notifications", cmd.getUserId())) {
            notificationModule.sendPush(order);
        }

        orderRepository.save(order);
        return order.toDto();
    }
}

// Feature flag configuration with Unleash
@Configuration
class FeatureFlagConfig {

    @Bean
    public Unleash unleash() {
        return new DefaultUnleash(
            UnleashConfig.builder()
                .appName("ecommerce-app")
                .instanceId("instance-1")
                .unleashAPI("http://unleash:4242/api")
                .build()
        );
    }
}
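The per-user gradual rollout used above depends on deterministic bucketing: the same user must always get the same answer for a given flag, so the feature does not flicker on and off between requests. The sketch below shows the idea with a simple hash-based bucket; the `RolloutFlag` class name is illustrative, not part of the Unleash API, which provides this behavior through its rollout strategies.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch of a percentage-based rollout flag.
class RolloutFlag {
    private final String name;
    private final int rolloutPercentage; // 0..100

    RolloutFlag(String name, int rolloutPercentage) {
        this.name = name;
        this.rolloutPercentage = rolloutPercentage;
    }

    /** Deterministic: the same user always lands in the same bucket. */
    boolean isEnabled(String userId) {
        CRC32 crc = new CRC32();
        crc.update((name + ":" + userId).getBytes(StandardCharsets.UTF_8));
        int bucket = (int) (crc.getValue() % 100); // 0..99
        return bucket < rolloutPercentage;
    }
}
```

Raising the percentage from 5 to 50 to 100 over a few days turns a deployment into a controlled, reversible release.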
Auto-Scaling with Kubernetes HPA
The Kubernetes Horizontal Pod Autoscaler (HPA) automatically scales the number of pods based on CPU, memory, or custom metrics. For a modular monolith, the HPA scales the entire application uniformly: every replica runs all modules. Custom metrics based on business indicators, such as orders per minute or HTTP requests per second, can also drive scaling, provided a metrics adapter (for example prometheus-adapter) exposes them to the HPA.
# HPA for the modular monolith
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ecommerce-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ecommerce-app
  minReplicas: 3
  maxReplicas: 12
  metrics:
    # Scale based on CPU
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    # Scale based on custom metrics
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Pods
          value: 2
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 1
          periodSeconds: 120
Database Scaling
In a modular monolith, the database often becomes the bottleneck before compute does. The main strategies for scaling it are:
Read Replicas
Configure database read replicas and route read queries to the replicas. Write operations go to the primary. This is particularly effective with the CQRS pattern, where the read model can be on a dedicated replica.
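The routing logic can be sketched as a small router that sends writes to the primary and rotates reads across replicas. This is a simplified illustration with URLs as plain strings; in a Spring application you would typically implement it with `AbstractRoutingDataSource`, and the `ReadWriteRouter` name here is hypothetical.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: route writes to the primary, reads round-robin to replicas.
class ReadWriteRouter {
    private final String primaryUrl;
    private final List<String> replicaUrls;
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteRouter(String primaryUrl, List<String> replicaUrls) {
        this.primaryUrl = primaryUrl;
        this.replicaUrls = replicaUrls;
    }

    /** Writes always go to the primary. */
    String routeWrite() {
        return primaryUrl;
    }

    /** Reads rotate across replicas; fall back to the primary if there are none. */
    String routeRead() {
        if (replicaUrls.isEmpty()) {
            return primaryUrl;
        }
        int i = Math.floorMod(next.getAndIncrement(), replicaUrls.size());
        return replicaUrls.get(i);
    }
}
```

Note that replicas lag behind the primary, so a flow that writes and immediately reads its own data should read from the primary.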
Connection Pooling
Use a connection pooler such as PgBouncer to manage database connections efficiently. With many monolith instances, each holding its own application-side pool, the total number of connections grows quickly. PgBouncer multiplexes many client connections onto a small number of actual PostgreSQL connections.
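A minimal PgBouncer configuration might look like the fragment below. Host names and limits are illustrative assumptions, not values from this article's setup.

```ini
; Hypothetical pgbouncer.ini fragment
[databases]
ecommerce = host=postgres-primary port=5432 dbname=ecommerce

[pgbouncer]
listen_port = 6432
pool_mode = transaction      ; server connections are shared between transactions
max_client_conn = 1000       ; e.g. 12 pods x ~80 app-side connections
default_pool_size = 20       ; actual connections opened to PostgreSQL
```

With `pool_mode = transaction`, a thousand client connections can be served by a few dozen real PostgreSQL connections, at the cost of losing session-level features such as prepared statements held across transactions.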
Per-Module Sharding
If a specific module has much higher data volumes than others, you can move its tables to a dedicated database. This is an intermediate step toward microservice extraction.
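In a Spring application this can start as nothing more than a second datasource scoped to the module. The fragment below is a sketch; the `app.datasource` property names and hostnames are illustrative, not Spring defaults.

```yaml
# Hypothetical application.yml fragment: one module on its own database
app:
  datasource:
    default:
      url: jdbc:postgresql://postgres-main:5432/ecommerce
    catalog:   # high-volume module moved to a dedicated database
      url: jdbc:postgresql://postgres-catalog:5432/catalog
```

Because modules already avoid cross-module joins, moving one module's tables to a separate database changes configuration, not code, and the dedicated database can later travel with the module if it is extracted.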
When to Extract a Module as a Microservice
The modular monolith does not necessarily have to remain a monolith forever. When data justifies it, a module can be extracted as an independent microservice. Here are the decisive criteria:
Quantitative Criteria
- 10x Scaling: the module requires 10 times more resources than others
- Deploy frequency: the module needs 5x more frequent releases
- Latency SLA: the module has different latency SLAs (e.g., < 10ms for a critical API)
- Team size: more than 5 developers work exclusively on the module
Qualitative Criteria
- Technology stack: the module would benefit from a different language or runtime
- Failure isolation: a bug in the module can crash the entire system
- Compliance: the module handles data with specific security or compliance requirements
- Caching: the module has very different caching patterns from the rest
Practical Extraction Rule
Do not extract a module as a microservice until at least three of the quantitative criteria are satisfied. Premature extraction introduces distributed complexity without proportional benefits. Remember: the modular monolith is designed to make extraction easy when needed, not to force it.
Complete Kubernetes Configuration
Here is a reference Kubernetes configuration for deploying a modular monolith in production, with health checks, resource limits, and affinity rules:
# Complete deployment for the modular monolith
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ecommerce-app
  labels:
    app: ecommerce
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: ecommerce
  template:
    metadata:
      labels:
        app: ecommerce
    spec:
      containers:
        - name: app
          # Pin a specific tag in production instead of "latest"
          image: registry.example.com/ecommerce-app:latest
          ports:
            - containerPort: 8080
          env:
            - name: SPRING_PROFILES_ACTIVE
              value: "production"
            - name: JAVA_OPTS
              value: "-Xms512m -Xmx2g -XX:+UseG1GC"
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 60
            periodSeconds: 30
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: ecommerce
Monitoring and Observability
A significant advantage of the modular monolith is monitoring simplicity. With a single process, metrics are centralized, logs are unified, and tracing is local.
- Metrics: Spring Boot Actuator + Micrometer for exporting metrics to Prometheus
- Logs: structured logging with correlation ID for tracing flows between modules
- Health checks: endpoint per module to verify individual health status
- Dashboard: Grafana for visualizing aggregate and per-module metrics
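Correlation IDs deserve a brief sketch: inside a single process, a flow that crosses module boundaries can carry its ID on the current thread. Real projects typically use SLF4J's MDC for this; the `CorrelationContext` class below is a hypothetical, dependency-free illustration of the same idea.

```java
import java.util.UUID;

// Hypothetical sketch: propagate a correlation ID across modules on one thread.
final class CorrelationContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private CorrelationContext() {}

    /** Reuse the current ID if present, otherwise start a new trace. */
    static String getOrCreate() {
        String id = CURRENT.get();
        if (id == null) {
            id = UUID.randomUUID().toString();
            CURRENT.set(id);
        }
        return id;
    }

    /** Call at the end of each request to avoid leaking IDs between requests. */
    static void clear() {
        CURRENT.remove();
    }
}
```

Each module logs `getOrCreate()` alongside its messages, so a single grep over the unified log reconstructs the whole cross-module flow.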
Next Article
In the next article we will tackle the step-by-step migration from a legacy monolith to a modular monolith: the Strangler Fig pattern, boundary identification, physical module extraction, event migration, and a realistic timeline with case study and metrics.