There's a special kind of pain that comes from checking your cloud bill at month end. It's like that moment when you check your M-Pesa balance after a weekend in Nairobi and wonder if you were sleepwalking through restaurants.
"Wait, we spent HOW MUCH on EC2 instances nobody is using?"
By 2025, cloud spending is often the second biggest expense after salaries for tech companies. And unlike salaries, where at least you can see people working, cloud costs are invisible until the bill arrives like a matatu tout demanding fare you didn't know you owed.
Let me show you how to optimize without sacrificing performance. Your CFO will think you're a magician.
Understanding your infrastructure options is crucial - read EC2 vs ECS vs Fargate to choose the most cost-effective compute option for your workload.
The Cost Optimization Philosophy
Here's the truth: Cloud providers WANT you to overspend. Not maliciously, but their default settings are like a buffet where everything looks delicious and you pile your plate with food you'll never finish.
The key is changing from "what's possible" to "what's necessary."
Strategy 1: Rightsizing (The Low-Hanging Fruit)
Rightsizing is fancy cloud-speak for "stop using a matatu to transport one person."
The Problem: Developers spin up an m5.2xlarge (8 CPUs, 32GB RAM) for a dev environment that gets used 3 hours a day and has 5% CPU utilization.
The Math:
- m5.2xlarge: ~$0.384/hour = $276/month running 24/7
- t3.medium (2 CPUs, 4GB RAM): ~$0.0416/hour = $30/month
Savings: $246/month per instance. Now multiply by the 15 instances your team "forgot about."
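To run the same math on your own instance types, here is a minimal Python sketch. It assumes a 720-hour month (30 days x 24 hours), which is what the figures above imply, and the function names are made up for illustration.

```python
HOURS_PER_MONTH = 720  # assumption: 30 days x 24 hours


def monthly_cost(hourly_rate: float, hours: float = HOURS_PER_MONTH) -> float:
    """On-demand cost of one instance left running for `hours`."""
    return hourly_rate * hours


def rightsizing_savings(current_rate: float, target_rate: float,
                        instance_count: int = 1) -> float:
    """Monthly savings from moving `instance_count` instances to a cheaper type."""
    per_instance = monthly_cost(current_rate) - monthly_cost(target_rate)
    return per_instance * instance_count


# m5.2xlarge -> t3.medium, using the hourly rates quoted above
one_instance = rightsizing_savings(0.384, 0.0416)
forgotten_fleet = rightsizing_savings(0.384, 0.0416, instance_count=15)
print(f"per instance: ${one_instance:,.2f}/month")
print(f"15 instances: ${forgotten_fleet:,.2f}/month")
```

Swap in the hourly rates for your own region; on-demand prices differ between regions.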
How to Actually Do This
AWS Compute Optimizer (built-in, free):
# Set up credentials (install the AWS CLI first if you haven't)
aws configure
# Get rightsizing recommendations
aws compute-optimizer get-ec2-instance-recommendations
It'll tell you things like: "This instance has averaged 2% CPU for 14 days. Consider t3.small instead."
For GCP:
gcloud recommender recommendations list \
  --project=YOUR_PROJECT \
  --location=YOUR_ZONE \
  --recommender=google.compute.instance.MachineTypeRecommender
For Azure: Azure Advisor does this automatically in the portal under "Cost Recommendations."
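Under the hood, these recommenders mostly look at utilization history. If you already export CPU metrics somewhere, the core check can be sketched in a few lines of Python. The 5% threshold and the input shape (instance ID mapped to a list of CPU percentages) are assumptions for illustration, not what any provider actually uses.

```python
from statistics import mean
from typing import Dict, List


def underutilized(cpu_samples: Dict[str, List[float]],
                  threshold_pct: float = 5.0) -> List[str]:
    """Return the IDs of instances whose average CPU is below the threshold."""
    flagged = []
    for instance_id, samples in cpu_samples.items():
        # Skip instances with no data instead of guessing.
        if samples and mean(samples) < threshold_pct:
            flagged.append(instance_id)
    return sorted(flagged)


# Hypothetical two-week averages, one sample per day
history = {
    "i-dev-api": [2.1, 1.8, 2.4, 3.0, 1.5],
    "i-prod-api": [41.0, 55.2, 38.7, 60.1, 47.3],
    "i-new-box": [],
}
print(underutilized(history))
```

Treat the result as a list of candidates to investigate. Memory, disk and network can still be the real bottleneck on a box with idle CPUs.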
The Downgrade That Saves Millions
Real example from the trenches: A startup was running their staging environment (used only during work hours, 9am-6pm, Monday-Friday) on the same instance sizes as production.
Before: 20 m5.xlarge instances x 24/7 (720 hours/month at ~$0.192/hour) = ~$2,765/month
After: 20 t3.medium instances x 45 hours/week (~195 hours/month at ~$0.0416/hour) = ~$162/month
Savings: ~$2,603/month = ~$31,200/year
For what? Staging. The environment where you test if your "fix" actually works before shipping.
Strategy 2: Reserved Instances & Savings Plans
This is like joining Bonga Points or that Carrefour points thing or any loyalty program. Commit to using the service for 1-3 years, get a massive discount.
The Options
AWS Reserved Instances:
- 1 year: ~40% discount
- 3 years: ~60% discount
- Pay all upfront: Extra 5% off
- Pay monthly: Less discount but better cash flow
GCP Committed Use Discounts (CUDs):
- 1 year: up to ~37% discount on most machine types
- 3 years: up to ~55% discount on most machine types
Azure Reserved Instances:
- 1 year: ~40% discount
- 3 years: ~72% discount (most aggressive pricing)
The Strategy (Don't Commit to Everything)
Look at your instance usage over the last 90 days. What's your "base load" - the minimum number of instances you ALWAYS have running?
Example:
- Peak: 50 instances
- Average: 30 instances
- Minimum (3am on Sunday): 10 instances
Smart play: Buy Reserved Instances for 10-15 instances. Run the rest on-demand or spot instances.
Why? Because committing to 50 Reserved Instances when you only need 10 during off-peak is like paying for a full buffet when you only want the chicken.
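One way to sanity-check the base-load rule is to replay your usage history against different commitment levels. The sketch below assumes you can export an instance count per hour. The reserved rate of $0.115/hour is an assumed ~40% discount on a $0.192 on-demand rate, and the function names are hypothetical.

```python
from typing import Sequence


def fleet_cost(hourly_counts: Sequence[int], reserved: int,
               on_demand_rate: float, reserved_rate: float) -> float:
    """Cost of a usage history when `reserved` instances are committed."""
    total = 0.0
    for running in hourly_counts:
        # Reserved capacity is paid for every hour, used or not.
        total += reserved * reserved_rate
        # Anything above the commitment runs at the on-demand rate.
        total += max(0, running - reserved) * on_demand_rate
    return total


def best_commitment(hourly_counts: Sequence[int],
                    on_demand_rate: float, reserved_rate: float) -> int:
    """Try every commitment level from 0 to peak and return the cheapest."""
    candidates = range(0, max(hourly_counts) + 1)
    return min(candidates, key=lambda r: fleet_cost(
        hourly_counts, r, on_demand_rate, reserved_rate))


# One simplified day: 10 hours at the minimum, 10 at the average, 4 at peak
sample_day = [10] * 10 + [30] * 10 + [50] * 4
print(best_commitment(sample_day, on_demand_rate=0.192, reserved_rate=0.115))
```

For this sample day the cheapest commitment is 10, the base load. The general rule: an instance slot is only worth reserving if it is busy for more than reserved_rate / on_demand_rate of the hours (60% here), so a workload that spends more time near its average can justify a bigger commitment.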
Strategy 3: Spot Instances (The Clearance Sale)
Spot instances are spare compute capacity that cloud providers sell at a 60-90% discount. The catch? They can take it back on short notice: two minutes on AWS, about 30 seconds on GCP and Azure.
Think of it like: Those last-minute flight deals. Cheap, but the airline can cancel on you. So don't use it for mission-critical stuff.
Perfect Use Cases
1. CI/CD Pipelines: Your GitHub Actions or Jenkins jobs don't care if they get interrupted. They'll just retry.
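# Example: GitHub Actions steps that create a Spot launch template
# (the region and template name are placeholders)
- uses: aws-actions/configure-aws-credentials@v2
  with:
    aws-region: us-east-1
- run: |
    aws ec2 create-launch-template \
      --launch-template-name ci-spot-runners \
      --launch-template-data '{"InstanceMarketOptions":{"MarketType":"spot"}}'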
2. Batch Processing: Processing videos, generating reports, ETL jobs - if they can resume, use Spot.
3. Kubernetes Worker Nodes: Run your stateless pods on Spot nodes. If a node disappears, Kubernetes reschedules the pods elsewhere.
AWS EKS with Spot:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
nodeGroups:
  - name: spot-workers
    instancesDistribution:
      instanceTypes: ["m5.large", "m5.xlarge", "m5.2xlarge"]
      onDemandBaseCapacity: 2 # Keep 2 on-demand for stability
      onDemandPercentageAboveBaseCapacity: 0
      spotInstancePools: 3
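"If they can resume" is the part that does the heavy lifting for batch jobs. Here is a minimal Python sketch of a resumable job, assuming the work is an ordered list of items and that a small checkpoint file survives the interruption (on a real Spot instance you would keep it on S3 or a shared volume, not the local disk). The function and file names are made up for illustration.

```python
import json
import os
from typing import Callable, Sequence


def process_items(items: Sequence[str], checkpoint_path: str,
                  handle: Callable[[str], None]) -> int:
    """Process items in order, skipping work a previous run already finished.

    Returns the number of items handled in this run.
    """
    done = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)["done"]

    for index in range(done, len(items)):
        handle(items[index])
        # Write to a temp file, then rename, so an interruption
        # never leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"done": index + 1}, f)
        os.replace(tmp_path, checkpoint_path)

    return len(items) - done
```

If the instance is reclaimed mid-run, the next run picks up at the first unfinished item. An item that was in flight when the interruption hit gets processed again, so `handle` should be safe to repeat.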
Real Savings:
- On-demand m5.xlarge: $0.192/hour
- Spot m5.xlarge: $0.057/hour (70% discount)
- 10 instances running 730 hours/month: Save $985/month
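Note that the $985 figure assumes all 10 instances run on Spot. If you keep an on-demand base for stability, the savings shrink a bit. A quick Python sketch of the blended cost, using the rates above and a hypothetical function name:

```python
def blended_monthly_cost(total_instances: int, on_demand_base: int,
                         on_demand_rate: float, spot_rate: float,
                         hours: float = 730) -> float:
    """Monthly cost of a fleet with a fixed on-demand base and Spot for the rest."""
    on_demand = min(on_demand_base, total_instances)
    spot = total_instances - on_demand
    return (on_demand * on_demand_rate + spot * spot_rate) * hours


all_on_demand = blended_monthly_cost(10, 10, 0.192, 0.057)
all_spot = blended_monthly_cost(10, 0, 0.192, 0.057)
base_of_two = blended_monthly_cost(10, 2, 0.192, 0.057)

print(f"all Spot saves:        ${all_on_demand - all_spot:,.2f}/month")
print(f"2 on-demand + 8 Spot:  ${all_on_demand - base_of_two:,.2f}/month")
```

Spot prices move around, so treat $0.057/hour as a snapshot, not a guarantee.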
Strategy 4: Storage Optimization (The Silent Killer)
Storage costs are like cockroaches. Small individually, but they multiply and before you know it, you're spending $5k/month on "miscellaneous storage."
The Storage Lifecycle
AWS S3 Tiers:
- S3 Standard - $0.023/GB/month - For data you access frequently
- S3 Intelligent-Tiering - Auto-moves data based on access patterns
- S3 Glacier - ~$0.004/GB/month - For archives (retrieval takes hours)
- S3 Glacier Deep Archive - $0.00099/GB/month - For "we legally have to keep this for 7 years" kinda stuff
Set up automatic transitions:
# S3 Lifecycle Policy (the "logs/" prefix is an assumed example; adjust to your bucket layout)
- Id: MoveOldLogs
  Status: Enabled
  Filter:
    Prefix: "logs/"
  Transitions:
    - Days: 30
      StorageClass: INTELLIGENT_TIERING
    - Days: 90
      StorageClass: GLACIER
  Expiration:
    Days: 365
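To make the rule above concrete, here is a small Python sketch of where an object ends up at a given age. It only mirrors the thresholds in this policy (30, 90 and 365 days). S3 applies transitions in batches, so the real timing can lag by a day or so. The function name is hypothetical.

```python
from typing import Optional


def storage_class_for_age(age_days: int) -> Optional[str]:
    """Storage class an object should be in under the policy above.

    Returns None once the object is past its expiration and gets deleted.
    """
    if age_days >= 365:
        return None
    if age_days >= 90:
        return "GLACIER"
    if age_days >= 30:
        return "INTELLIGENT_TIERING"
    return "STANDARD"


for age in (7, 45, 200, 400):
    print(age, storage_class_for_age(age))
```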
The Orphaned Volume Problem
When you terminate an EC2 instance, the root volume usually goes with it, but any extra EBS volumes (disks) you attached stay behind unless DeleteOnTermination is set. It's like canceling your gym membership but still getting charged for the locker.
Find and delete orphaned volumes:
# AWS
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[*].[VolumeId,Size,CreateTime]' \
  --output table
# Delete them
aws ec2 delete-volume --volume-id vol-xxxxx
Real example: A Series A startup found 47 unattached EBS volumes totaling 2.3TB. At $0.10/GB/month, that's $230/month on storage for deleted servers.
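If you'd rather put a number on it before deleting anything, here is a minimal Python sketch that totals the damage. It assumes a list of volumes shaped like the `Volumes` array in the `describe-volumes` response (with `VolumeId`, `Size` in GB, and `State`) and uses the $0.10/GB/month rate from the example. The function name is hypothetical.

```python
from typing import Any, Dict, List


def orphaned_volume_report(volumes: List[Dict[str, Any]],
                           price_per_gb_month: float = 0.10) -> Dict[str, Any]:
    """Summarize unattached volumes and what they cost per month."""
    # A volume in the "available" state is not attached to any instance.
    orphaned = [v for v in volumes if v.get("State") == "available"]
    total_gb = sum(v["Size"] for v in orphaned)
    return {
        "volume_ids": [v["VolumeId"] for v in orphaned],
        "total_gb": total_gb,
        "monthly_cost": total_gb * price_per_gb_month,
    }


# Hypothetical sample data
sample = [
    {"VolumeId": "vol-0aaa", "Size": 500, "State": "available"},
    {"VolumeId": "vol-0bbb", "Size": 100, "State": "in-use"},
    {"VolumeId": "vol-0ccc", "Size": 1800, "State": "available"},
]
report = orphaned_volume_report(sample)
print(report["volume_ids"], report["total_gb"], round(report["monthly_cost"], 2))
```

Before deleting, check whether anyone still needs the data. A snapshot is a cheap insurance policy.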
Snapshot Cleanup
You don't need daily snapshots from 2022. Set up auto-deletion:
# AWS - Delete snapshots older than 90 days
# (GNU date syntax; on macOS use: date -u -v-90d +%Y-%m-%d)
CUTOFF=$(date -u -d '90 days ago' +%Y-%m-%d)
aws ec2 describe-snapshots --owner-ids self \
  --query "Snapshots[?StartTime<='${CUTOFF}'].[SnapshotId]" \
  --output text | xargs -n 1 aws ec2 delete-snapshot --snapshot-id
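A rolling cutoff matters here, because a hard-coded date goes stale the moment you commit it. If you script the cleanup in Python, the selection step might look like this sketch. It assumes snapshot records with `SnapshotId` and a timezone-aware `StartTime`, which is how the EC2 API describes them. The function name is made up.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional


def snapshots_older_than(snapshots: List[Dict[str, Any]], days: int,
                         now: Optional[datetime] = None) -> List[str]:
    """Return the IDs of snapshots started more than `days` days ago."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] <= cutoff]


# Hypothetical sample data, evaluated at a fixed "now" so the result is stable
fixed_now = datetime(2025, 1, 1, tzinfo=timezone.utc)
sample = [
    {"SnapshotId": "snap-old", "StartTime": datetime(2022, 6, 1, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-new", "StartTime": datetime(2024, 12, 20, tzinfo=timezone.utc)},
]
print(snapshots_older_than(sample, days=90, now=fixed_now))
```

Print the list and eyeball it before you wire it up to anything that deletes.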
Strategy 5: Tagging (You Can't Optimize What You Can't Measure)
This is the unglamorous work that makes everything else possible.
The Rule: Every resource must have these tags:
- Environment (Production, Staging, Dev, Test)
- Team (Backend, Frontend, Data, DevOps)
- Project (MobileApp, WebApp, API)
- Owner (email of the person responsible)
- CostCenter (which department's budget)
Why? Because when the bill says "EC2: $15,000" and you don't know who spent it on what, you can't fix anything.
Enforce it with AWS Config:
{
  "ConfigRuleName": "required-tags",
  "Source": {
    "Owner": "AWS",
    "SourceIdentifier": "REQUIRED_TAGS"
  },
  "InputParameters": "{\"tag1Key\":\"Environment\",\"tag2Key\":\"Owner\",\"tag3Key\":\"Project\"}"
}
Now you can filter your Cost Explorer: "Show me all spend by the Backend team on the Mobile App project in Production."
Suddenly, "Why is staging costing more than production?" becomes an answerable question.
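The same idea works in your own scripts. Below is a rough Python sketch that finds resources missing required tags and groups spend by a tag. It assumes resources carry tags in the AWS `[{"Key": ..., "Value": ...}]` shape. The `MonthlyCost` field and both function names are hypothetical, since real cost data comes from your billing export.

```python
from typing import Any, Dict, List, Sequence

REQUIRED_TAGS = ("Environment", "Team", "Project", "Owner", "CostCenter")


def missing_tags(resource: Dict[str, Any],
                 required: Sequence[str] = REQUIRED_TAGS) -> List[str]:
    """Return the required tag keys this resource does not have."""
    present = {tag["Key"] for tag in resource.get("Tags", [])}
    return [key for key in required if key not in present]


def spend_by_tag(resources: List[Dict[str, Any]], tag_key: str) -> Dict[str, float]:
    """Sum the (hypothetical) MonthlyCost field per value of `tag_key`."""
    totals: Dict[str, float] = {}
    for resource in resources:
        tags = {tag["Key"]: tag["Value"] for tag in resource.get("Tags", [])}
        bucket = tags.get(tag_key, "untagged")
        totals[bucket] = totals.get(bucket, 0.0) + resource["MonthlyCost"]
    return totals


# Hypothetical sample data
fleet = [
    {"Id": "i-1", "MonthlyCost": 300.0,
     "Tags": [{"Key": "Environment", "Value": "Staging"},
              {"Key": "Team", "Value": "Backend"}]},
    {"Id": "i-2", "MonthlyCost": 120.0,
     "Tags": [{"Key": "Environment", "Value": "Production"}]},
    {"Id": "i-3", "MonthlyCost": 80.0, "Tags": []},
]
print(spend_by_tag(fleet, "Environment"))
print(missing_tags(fleet[2]))
```

The "untagged" bucket is the number to watch. The bigger it is, the less you know about your bill.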
Strategy 6: The Nuclear Option (Turn Things Off)
The best optimization is not running things at all.
Dev/Staging Environments: Working Hours Only
AWS Instance Scheduler:
# Schedule definition (illustrative): start instances at 9am, stop them at 6pm
Periods:
  - Name: WorkHours
    BeginTime: "09:00"
    EndTime: "18:00"
    WeekDays: "Mon-Fri"
Schedules:
  - Name: DevSchedule
    Periods: [WorkHours]
    Timezone: "Africa/Nairobi"
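Savings: If your dev environment costs $2,000/month running 24/7, running it only 45 hours/week (about 27% of the 168 hours in a week) saves roughly $1,460/month in compute. Storage attached to stopped instances is still billed.
That's about $17,500/year. For doing literally nothing except turning off lights when you leave the office.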
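If you roll your own scheduler instead, the decision logic is small. Here is a minimal Python sketch of the work-hours check, assuming a 9am-6pm, Monday-Friday window in Nairobi time. It uses a fixed UTC+3 offset, which is safe because East Africa Time has no daylight saving. The function names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# East Africa Time is UTC+3 all year round (no daylight saving).
NAIROBI = timezone(timedelta(hours=3))


def should_be_running(moment: datetime) -> bool:
    """True if a timezone-aware `moment` falls inside Mon-Fri, 09:00-18:00 Nairobi time."""
    local = moment.astimezone(NAIROBI)
    is_weekday = local.weekday() < 5  # Monday is 0, Friday is 4
    return is_weekday and 9 <= local.hour < 18


def running_fraction(hours_per_week: float = 45) -> float:
    """Share of the week's 168 hours that the environment is up."""
    return hours_per_week / 168


monday_10am = datetime(2025, 1, 6, 10, 0, tzinfo=NAIROBI)
saturday_noon = datetime(2025, 1, 4, 12, 0, tzinfo=NAIROBI)
print(should_be_running(monday_10am), should_be_running(saturday_noon))
print(f"up {running_fraction():.0%} of the week")
```

A cron job or scheduled function can call the check every few minutes and start or stop instances to match. The compute bill drops roughly in line with the running fraction.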
For batch jobs and non-critical workloads, consider serverless computing where you pay only for execution time, not idle capacity.
Kubernetes autoscaling helps control costs too - learn Kubernetes basics to implement horizontal pod autoscaling that matches resource usage to actual demand.
Strategy 7: Database Optimization
Databases are expensive, especially managed ones.
AWS RDS Checklist:
- ✅ Use Read Replicas for reporting queries (don't hammer production)
- ✅ Switch to Aurora Serverless v2 for variable workloads
- ✅ Use Reserved Instances for prod databases (up to ~60% discount on a 3-year term)
- ✅ Export old data to S3/Glacier (database storage is $0.115/GB vs S3 $0.023/GB)
Real Example:
- Old way: 500GB RDS MySQL = $57.50/month in storage
- New way: 100GB hot data in RDS ($11.50/month) + 400GB in S3 ($9.20/month)
- Savings: $36.80/month per database
For static assets and media files, use S3 + CloudFront CDN - cheaper than serving from EC2 and faster for users.
The Cost Optimization Checklist (Do These Today)
| Action | Difficulty | Savings Potential |
|---|---|---|
| Delete unattached EBS volumes | Easy | Medium |
| Set up S3 lifecycle policies | Easy | Medium |
| Rightsize obvious oversized instances | Easy | High |
| Delete old snapshots | Easy | Low |
| Tag everything | Medium | High (enables everything else) |
| Implement dev/staging shutdown schedules | Medium | High |
| Buy Reserved Instances for base load | Easy | Very High |
| Migrate batch jobs to Spot instances | Hard | Very High |
The FinOps Culture Shift
This isn't just a DevOps problem. It's a cultural one.
Make cost visible: Put a Grafana dashboard in the office showing daily spend. When engineers see "yesterday cost $450 more than normal," they'll investigate.
Give teams budgets: "Frontend team, you have $2k/month for your services. Anything over, we need to discuss."
Celebrate savings: When someone optimizes something and saves $10k/year, announce it. Make cost optimization as celebrated as shipping features.
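The "yesterday cost $450 more than normal" alert doesn't need fancy tooling to get started. Here is a minimal Python sketch, assuming you have a list of daily spend totals with the most recent day last. The 7-day window and $450 threshold are illustrative defaults, and the function name is hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def spend_anomaly(daily_spend: Sequence[float], window: int = 7,
                  threshold: float = 450.0) -> Optional[float]:
    """Compare the latest day to the average of the `window` days before it.

    Returns the overspend if it exceeds `threshold`, otherwise None.
    """
    if len(daily_spend) < window + 1:
        return None  # not enough history to define "normal"
    baseline = mean(daily_spend[-window - 1:-1])
    delta = daily_spend[-1] - baseline
    return delta if delta > threshold else None


quiet_week = [1000, 1010, 990, 1000, 1005, 995, 1000]
print(spend_anomaly(quiet_week + [1500]))  # someone left something big running
print(spend_anomaly(quiet_week + [1100]))  # within normal range
```

A trailing average is crude, since it ignores weekly patterns and month-end jobs, but it's enough to get people asking the right question on the day it matters.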
The Bottom Line
Cloud optimization isn't a one-time thing. It's a habit.
- Monthly: Review cost reports, look for anomalies
- Quarterly: Check rightsizing recommendations
- Yearly: Re-evaluate Reserved Instance commitments
Start with the easy wins (delete orphaned volumes, set up S3 lifecycle policies). Then tackle the bigger stuff (Reserved Instances, Spot instances).
And remember: every dollar saved on cloud spend is a dollar you can spend on actually building features or, you know, paying people.
Don't put all your eggs in one basket - explore multi-cloud strategies to leverage competitive pricing across AWS, GCP, and Azure.
Check your cloud bill. I'll wait here while you panic, then help you optimize it.