Cloud Cost Optimization 101
Decision fatigue is real when it comes to FinOps and cloud cost optimization tooling. Before you start comparing platforms, let me help you grab the low-hanging fruit.
As a hands-on cloud consultant who became Principal Software Architect for a SaaS-based optimization platform, I speak from personal experience and deep industry awareness. SaaS tools can put most of this stuff on Easy Mode and help with automation and alerting, but is that worth thousands of dollars per month?
Likely not. In my experience, a few simple steps capture roughly 90% of the savings that FinOps tools promise, without the vendor lock-in or savings-share sales models. Start with the guidance below:
- Right-Size Your Resources. Are you running databases at 10% CPU? Drop them an instance size or two. Within an instance family, each size step down roughly halves vCPUs and memory, so 10% average CPU becomes roughly 20% after one step and 40% after two. Running at around 40% is safe if you have a cluster and are using read replicas appropriately. Do the same for EC2 instances, ECS tasks and container instances, and Redis/Valkey cache nodes. A lot of the time, we're afraid of spikes when we shouldn't be: set up auto-scaling (scale-out and scale-in) and alarms on utilization anomalies to protect yourself.
- Auto-Park Non-Prod Environments. Do you leave your cloud development, staging, test, or QA environments up 24/7? Why? If your team is sleeping, so should your resources. A 12-hours-a-day, weekdays-only schedule keeps resources running for 60 of the week's 168 hours, which cuts the hourly charges for those resources by more than 60% (stopped instances still pay for attached storage such as EBS volumes). AWS publishes a reference solution for this, Instance Scheduler on AWS, but this is one area where a SaaS tool can make things much easier.
- Prefer Cheaper Processor Architectures. If your application's compute architecture is flexible, prefer ARM (Graviton), then AMD, then Intel, in that order. Graviton instances typically cost 20-40% less than the equivalent Intel instances, and AMD instances run about 10% below Intel (as a rule; exceptions apply). Switching an app from x86 (Intel/AMD) to ARM is not as simple as changing the instance type in AWS: your team will need to update build pipelines to produce arm64 artifacts AND should perform stress/load testing to make sure it's a good fit.
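- Use Storage Tiers Appropriately. There's no need for everything to sit in S3 Standard. Intelligent-Tiering is a great fit for a lot of buckets, but be judicious: only objects of at least 128 KB are moved between access tiers, and each of those objects carries a monthly monitoring fee (charged per 1,000 objects). Objects under 128 KB stay at Frequent Access rates without the fee. Archive things that can be archived. Delete things that can be deleted. Set and obey your retention policies; S3 Lifecycle rules can automate both the archiving and the deleting.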
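- Use Best-Fit Disk Types for Your App Instances. In AWS EBS today, gp3 matches or outperforms gp2 in most categories and costs 20% less per GiB. gp3 includes a baseline of 3,000 IOPS and 125 MiB/s at any volume size, while gp2 performance scales with size (3 IOPS per GiB). Very large gp2 volumes may therefore need extra provisioned IOPS or throughput on gp3 to match. There are exceptions, so run the numbers for your own volumes.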
- Commit After You Right-Size. A few weeks after right-sizing, once usage has settled at its new, lower baseline, cover it with Compute Savings Plans (EC2, ECS on EC2 or Fargate, and Lambda) or Reserved Instances (RDS). Committing before right-sizing locks in the old, oversized spend. Even with the shortest-term 1-year no-upfront plans, you can typically cut your compute and database bills by about 20%. With 3-year terms and some mix of partial or all upfront payment, that bill could drop by 30-40%.
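The right-sizing rule of thumb above ("drop an instance size or two" until you reach about 40% CPU) can be sketched as a small calculation. This is a minimal sketch, assuming each size step down within a family halves vCPUs, so the same workload roughly doubles its CPU percentage. The function name `steps_down` and the `max_steps` safety cap are hypothetical, not part of any AWS tool.

```python
def steps_down(avg_cpu_pct: float, target_pct: float = 40.0, max_steps: int = 3) -> tuple[int, float]:
    """How many instance sizes to drop while keeping average CPU <= target_pct.

    Assumption: each step down in the same family halves vCPUs, so the same
    workload roughly doubles its CPU percentage. Returns the number of steps
    and the projected average CPU after them.
    """
    if avg_cpu_pct <= 0:
        raise ValueError("avg_cpu_pct must be positive")
    steps, projected = 0, avg_cpu_pct
    # Stop before the next halving would push utilization past the target.
    while steps < max_steps and projected * 2 <= target_pct:
        projected *= 2
        steps += 1
    return steps, projected


print(steps_down(10))  # database idling at 10% CPU -> (2, 40)
```

Average CPU is only one input. Check peak CPU and memory pressure before resizing, and keep the auto-scaling and alarms described above in place.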
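The auto-parking arithmetic in the non-prod bullet above can be checked by counting hours in a week. This is a minimal sketch, assuming a hypothetical 08:00-20:00, Monday-Friday schedule in the team's local time. A real scheduler (or AWS's Instance Scheduler) would act on `should_run` by calling the EC2/RDS stop and start APIs; that part is not shown here.

```python
from datetime import datetime, timedelta

# Hypothetical schedule: run 08:00-20:00 local time, Monday-Friday.
ON_HOUR, OFF_HOUR = 8, 20
WORKDAYS = {0, 1, 2, 3, 4}  # datetime.weekday(): Monday == 0


def should_run(now: datetime) -> bool:
    """True if a non-prod resource should be up at this local time."""
    return now.weekday() in WORKDAYS and ON_HOUR <= now.hour < OFF_HOUR


def parked_fraction(week_start: datetime) -> float:
    """Share of the week's 168 hours during which the schedule keeps resources off."""
    hours = [week_start + timedelta(hours=h) for h in range(7 * 24)]
    running = sum(should_run(t) for t in hours)
    return 1 - running / len(hours)


print(round(parked_fraction(datetime(2024, 1, 1)), 3))  # -> 0.643
```

About 64% of the week is parked, which is where the "more than 60%" figure comes from. Storage attached to stopped resources keeps billing, so the real saving on a full environment is somewhat lower.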
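To turn the processor-architecture rules of thumb above into dollar figures, you can apply them to an existing Intel bill. This is a rough sketch that uses only the discount ranges stated in that bullet (Graviton 20-40%, AMD about 10%), not real AWS price data. Look up actual instance prices for your region before deciding.

```python
# Rule-of-thumb discounts vs. Intel, taken from the bullet above (not AWS price data).
DISCOUNT_VS_INTEL = {
    "graviton": (0.20, 0.40),
    "amd": (0.10, 0.10),
    "intel": (0.0, 0.0),
}


def estimated_monthly(intel_monthly: float) -> dict[str, tuple[float, float]]:
    """(worst case, best case) monthly cost per architecture, cheapest first."""
    return {
        arch: (intel_monthly * (1 - low), intel_monthly * (1 - high))
        for arch, (low, high) in DISCOUNT_VS_INTEL.items()
    }


for arch, (worst, best) in estimated_monthly(5000).items():
    print(f"{arch:>8}: ${best:,.0f} - ${worst:,.0f} per month")
```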
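The storage-tier bullet's "archive, delete, set retention" advice maps naturally onto S3 Lifecycle rules. This is a minimal sketch that builds a lifecycle configuration in the dictionary shape accepted by boto3's `put_bucket_lifecycle_configuration`. The prefixes (`data/`, `archive/`, `logs/`), the choice of Deep Archive, and the day counts are hypothetical placeholders for your own retention policy.

```python
def lifecycle_configuration(tier_prefix: str, archive_prefix: str, archive_days: int,
                            expire_prefix: str, expire_days: int) -> dict:
    """Build an S3 lifecycle configuration dict (boto3 LifecycleConfiguration shape)."""
    return {
        "Rules": [
            {   # Hand objects to Intelligent-Tiering right away.
                "ID": "intelligent-tiering",
                "Filter": {"Prefix": tier_prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}],
            },
            {   # Archive data that must be kept but is rarely read.
                "ID": "archive",
                "Filter": {"Prefix": archive_prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": archive_days, "StorageClass": "DEEP_ARCHIVE"}],
            },
            {   # Delete data once the retention policy allows it.
                "ID": "retention-expiry",
                "Filter": {"Prefix": expire_prefix},
                "Status": "Enabled",
                "Expiration": {"Days": expire_days},
            },
        ]
    }


config = lifecycle_configuration("data/", "archive/", 90, "logs/", 365)
```

You would apply it with something like `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket="my-bucket", LifecycleConfiguration=config)`, where `my-bucket` is a placeholder. That call replaces the bucket's existing lifecycle configuration, so merge in any rules you already have. Also note that Deep Archive bills a 180-day minimum storage duration.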
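The disk-type bullet says to "run the numbers". Here is one way to do that for the IOPS dimension. This is a sketch, assuming us-east-1 list prices at the time of writing (gp2 $0.10/GiB-month; gp3 $0.08/GiB-month plus $0.005 per IOPS-month above the included 3,000). It provisions gp3 to match gp2's baseline IOPS and does not model throughput, so check current EBS pricing and your actual I/O profile.

```python
# Assumed us-east-1 list prices at the time of writing; check current EBS pricing.
GP2_PER_GIB = 0.10           # $/GiB-month
GP3_PER_GIB = 0.08           # $/GiB-month
GP3_PER_EXTRA_IOPS = 0.005   # $/IOPS-month above the included baseline
GP3_INCLUDED_IOPS = 3000


def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline: 3 IOPS per GiB, floor of 100, cap of 16,000."""
    return min(max(3 * size_gib, 100), 16000)


def monthly_costs(size_gib: int) -> tuple[float, float]:
    """(gp2, gp3) monthly cost, provisioning gp3 to match gp2's baseline IOPS.

    Throughput is not modeled; gp3 includes 125 MiB/s and charges for more.
    """
    gp2 = size_gib * GP2_PER_GIB
    extra_iops = max(0, gp2_baseline_iops(size_gib) - GP3_INCLUDED_IOPS)
    gp3 = size_gib * GP3_PER_GIB + extra_iops * GP3_PER_EXTRA_IOPS
    return gp2, gp3


for size in (100, 2000):
    gp2, gp3 = monthly_costs(size)
    print(f"{size} GiB: gp2 ${gp2:.2f} vs gp3 ${gp3:.2f}")
```

Under these assumed prices, a 2,000 GiB volume comes to about $200 vs $175 a month. The conversion itself can be done in place, for example with `aws ec2 modify-volume --volume-id <id> --volume-type gp3`.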
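For the commitment step above, one conservative approach is to commit only to the spend floor you see nearly every hour after right-sizing. This is a minimal sketch of that idea under stated assumptions: a flat 20% discount (the 1-year no-upfront figure above; real discounts vary by family and region) and a hypothetical 10th-percentile floor. Savings Plan commitments are expressed in discounted $/hour, hence the `(1 - discount)` factor. AWS Cost Explorer also produces its own Savings Plans purchase recommendations, which are worth comparing against.

```python
def size_commitment(hourly_on_demand: list[float], discount: float = 0.20,
                    floor_percentile: int = 10) -> tuple[float, float]:
    """Suggest a Savings Plan commitment from hourly on-demand-equivalent spend.

    Commits to a low percentile of recent hourly spend so the commitment is
    used nearly every hour. Returns the commitment ($/hour, discounted) and
    the resulting coverage (share of on-demand spend covered).
    """
    if not hourly_on_demand:
        raise ValueError("need at least one hour of spend data")
    ordered = sorted(hourly_on_demand)
    floor = ordered[(len(ordered) - 1) * floor_percentile // 100]
    commitment = floor * (1 - discount)
    # Each hour, the plan covers on-demand spend up to the floor.
    covered = sum(min(hour, floor) for hour in hourly_on_demand)
    return commitment, covered / sum(hourly_on_demand)


spend = [10.0] * 20 + [12.0] * 4   # one day of hypothetical hourly spend
commitment, coverage = size_commitment(spend)
print(f"commit ${commitment:.2f}/hour for about {coverage:.0%} coverage")
```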
Maintenance Mode
- Set AWS Budgets - with email alerts at 50%, 75%, 90%, 100%, and 110% of your expected monthly spend. Other cloud providers may also support flexible budgeting, but AWS in particular lets you plan a different amount for each month of the year in a single budget.
- Enable AWS Cost Anomaly Detection - this FREE service from AWS will monitor your spend on a daily basis and send you alerts when it sees changes in your spending patterns. This can help you catch a rogue workload before it costs you thousands of dollars.
- Schedule Budget Reviews At Least Quarterly - at each review, loop through all of the steps above and confirm that your resources are being used wisely, that your Savings Plan and reservation coverage is sufficient (most clients aim for 80-90% coverage, which leaves some room to downsize or optimize), and that your resources comply with any cost-control policies (like tag requirements).
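The budget alert thresholds above can be generated rather than clicked in one by one. This is a minimal sketch that builds the `NotificationsWithSubscribers` list in the shape used by boto3's `budgets` client `create_budget` call. The email address is a placeholder, and `ACTUAL` alerts on spend to date (`FORECASTED` alerts on projected spend).

```python
ALERT_THRESHOLDS = (50, 75, 90, 100, 110)  # percent of the monthly budget


def budget_notifications(emails: list[str]) -> list[dict]:
    """Build NotificationsWithSubscribers entries for budgets.create_budget."""
    subscribers = [{"SubscriptionType": "EMAIL", "Address": e} for e in emails]
    return [
        {
            "Notification": {
                "NotificationType": "ACTUAL",  # "FORECASTED" alerts on projected spend
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": float(pct),
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": list(subscribers),
        }
        for pct in ALERT_THRESHOLDS
    ]


notifications = budget_notifications(["finops@example.com"])
```

You would pass `notifications` to `create_budget` together with your account ID and a `Budget` dict such as `{"BudgetName": "monthly-total", "BudgetLimit": {"Amount": "10000", "Unit": "USD"}, "TimeUnit": "MONTHLY", "BudgetType": "COST"}`. The name and amount there are placeholders.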
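AWS Cost Anomaly Detection does this work for you with machine learning models, so there's no need to build your own. To illustrate the idea only, the sketch below substitutes a much simpler technique that is not how AWS's service works. It flags any day whose spend exceeds the trailing 14-day mean by more than three times the standard deviation, with the spread floored at 5% of the mean so a perfectly flat history doesn't flag pennies. The window, multiplier, and floor are arbitrary assumptions.

```python
import statistics


def flag_anomalies(daily_spend: list[float], window: int = 14, k: float = 3.0) -> list[int]:
    """Return indexes of days whose spend jumps well above the trailing window."""
    flagged = []
    for i in range(window, len(daily_spend)):
        history = daily_spend[i - window:i]
        mean = statistics.fmean(history)
        # Floor the spread so tiny wobbles on flat spend are not flagged.
        spread = max(statistics.pstdev(history), 0.05 * mean)
        if daily_spend[i] > mean + k * spread:
            flagged.append(i)
    return flagged


spend = [100.0] * 14 + [101.0, 250.0]   # hypothetical daily spend in dollars
print(flag_anomalies(spend))  # -> [15]
```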
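For the tag-requirement part of the quarterly review, a simple compliance report goes a long way. This is a minimal sketch, assuming a hypothetical policy that requires `owner`, `cost-center`, and `environment` tags, and resources supplied as a mapping from resource ID to a dict of tags.

```python
REQUIRED_TAGS = {"owner", "cost-center", "environment"}  # hypothetical policy


def missing_tags(resources: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Map each non-compliant resource ID to the required tag keys it lacks.

    A tag that is present but blank counts as missing.
    """
    report = {}
    for resource_id, tags in resources.items():
        present = {key for key, value in tags.items() if value.strip()}
        missing = REQUIRED_TAGS - present
        if missing:
            report[resource_id] = missing
    return report
```

One way to feed this is the Resource Groups Tagging API (`boto3.client("resourcegroupstaggingapi")` and its `get_resources` paginator). Convert each entry's `Tags` list of `Key`/`Value` pairs into a dict keyed by `ResourceARN`. That API may not list resources that have never been tagged, so cross-check against your inventory.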
Above and Beyond
- Try some free-tier services like https://cloudforecast.io to get a grip on what you're spending. Upgrade to paid tiers on similar services for more detail and user-friendly dashboards.
- Consider self-hosting Hystax OptScale, an open-source option, for a professional-quality FinOps platform on your own infrastructure. It has many of the most important bells and whistles you'd want. Setting it up and maintaining it takes fairly extensive technical knowledge, but if you're being overcharged by SaaS apps to do the same thing, it could pay for itself in a few months.
- Join community Slacks like the FinOps Foundation or Vantage.sh Slack. These places are loaded with talented people who usually just like to help others figure things out.