Why you need a CloudWatch alarm, for CloudWatch

As your journey to the AWS cloud progresses it becomes more and more important to automate the monitoring of your cloud-usage and cloud-spend. What works for a single account becomes unfeasible when you are creating new accounts every week. To that end we create account-vending machines with landing-zones, automating the creation of accounts that are nicely outfitted to mitigate risk while still allowing developers their freedom. Such a landing-zone can contain various alarms, including billing alarms based on predicted cloud-spend. Developers in turn add their own metrics and alarms to gain visibility over their application and to contain their expenses.

During this process account administrators and developers alike often end up relying on CloudWatch. CloudWatch is the service on AWS that acts as a central hub for all your logs and metrics. It is also the place to create alarms and automate some of the remediation actions. But what if the service you use to control cloud spend ends up being the most expensive one you use? That might be somewhat unexpected, but sadly bill-shock from using CloudWatch is an unwelcome surprise that happens all too easily.

This is not necessarily because CloudWatch is too expensive. The service is priced in a similar way to other serverless services, and should be treated as such. Just like other serverless services, CloudWatch gives you flexibility and a lower operations burden. And just like other serverless services, CloudWatch must be monitored and controlled. But CloudWatch is different from other services. Using it is often not a conscious action, no limits can be imposed on its use and any increased cost is only visible with a delay. Your CloudWatch bill can go from cheap to expensive in minutes.

Luckily we can identify when CloudWatch usage goes out of control and limit the damage. In this post we will discuss the billing behind CloudWatch, why CloudWatch is different from other services and how to avoid incurring massive cost while using it.

Figure 1. What CloudWatch looks like in practice: an overview of Log streams for a Log group.

The billing behind CloudWatch

With CloudWatch you pay for what you use, based on dimensions such as the number of alarms you configure or the amount of data you ingest. A free tier is available, but it is too small to matter for most workloads. A full overview of the free tier and all pricing dimensions can be found on the Amazon CloudWatch pricing page. A simplified view is presented in the following table:

Dimension     Pricing
Metrics       0.30 $/metric per month
Dashboards    3.00 $/dashboard per month
Alarms        0.10 $/alarm per month
Events        1.00 $/million events
Logs          0.57 $/GB ingested
              0.03 $/GB stored per month

Table 1. Simplified overview of CloudWatch pricing.

At first glance CloudWatch does not seem to be a service that you would need to watch out for. Yes, like any serverless service it is difficult to predict how much it will cost you. And yes, 57 cents per GB of logs ingested into CloudWatch is on the expensive side. But as with all other serverless services, it is nothing to be alarmed about with normal usage.
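To make "normal usage" concrete, here is a minimal sketch that multiplies the simplified prices from Table 1 by a workload. The class name, the method name, and the workload numbers are assumptions made up for illustration, and the free tier and regional price differences are ignored.

```java
import java.util.Locale;

/** Back-of-the-envelope CloudWatch bill based on the simplified prices in Table 1. */
final class CloudWatchCostEstimate {
    // Prices in USD, copied from Table 1. Real prices vary by region and tier.
    static final double PER_METRIC = 0.30;
    static final double PER_DASHBOARD = 3.00;
    static final double PER_ALARM = 0.10;
    static final double PER_MILLION_EVENTS = 1.00;
    static final double PER_GB_INGESTED = 0.57;
    static final double PER_GB_STORED = 0.03;

    /** Sums the monthly cost over all pricing dimensions (free tier ignored). */
    static double monthlyCost(int metrics, int dashboards, int alarms,
                              double millionEvents, double gbIngested, double gbStored) {
        return metrics * PER_METRIC
                + dashboards * PER_DASHBOARD
                + alarms * PER_ALARM
                + millionEvents * PER_MILLION_EVENTS
                + gbIngested * PER_GB_INGESTED
                + gbStored * PER_GB_STORED;
    }

    public static void main(String[] args) {
        // Hypothetical small workload: 20 metrics, 1 dashboard, 10 alarms,
        // 2 million events, 50 GB of logs ingested and 50 GB stored.
        double cost = monthlyCost(20, 1, 10, 2, 50, 50);
        System.out.println(String.format(Locale.ROOT, "Estimated bill: %.2f USD/month", cost));
    }
}
```

For this made-up workload the estimate comes to roughly 42 USD per month, most of it from log ingestion. That is the dimension to keep an eye on.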

Why CloudWatch is different

There are many seemingly more expensive services to be found on AWS, so why are we singling out CloudWatch? It is a combination of three reasons that makes CloudWatch different: it is the unavoidable default for logging from AWS services, it cannot be (rate-)limited in any way, and its bill often arrives too late. None of these reasons is a problem by itself, but combine them and you will quickly see why receiving a CloudWatch bill can be a frustrating experience.

First of all, using CloudWatch is not a choice. It is the default and practically unavoidable destination for logs in your account, and the output of most services ends up there. Furthermore, no API call is necessary on the part of the developer: in AWS Lambda, for example, everything written to standard output is sent to CloudWatch. This makes it easy to mess up if you misjudge the amount of logging your application will output.

Second, there is no way to limit CloudWatch usage. No matter how much you use CloudWatch it will work tirelessly and without fault. This is a great benefit, but also a great risk. Unlike with AWS Lambda or with DynamoDB, there is no way to impose an upper limit and mitigate worst-case scenarios.

Last, CloudWatch costs can remain under the radar for too long. Even if a user has configured billing alarms, it can take up to a day for the increased cost to show up in the billing overview. Depending on the cause of the increase, this delay can cost thousands of dollars.
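A rough sketch of why the delay matters: the cost of a runaway logger grows linearly with the time until someone notices. The ingestion rate of 50 MB/s and the two detection delays below are hypothetical, and the price per GB is the simplified figure from Table 1.

```java
import java.util.Locale;

/** Estimates what a runaway logger costs before it is detected. */
final class DetectionDelayCost {
    // USD per GB ingested, taken from Table 1 (varies by region).
    static final double PER_GB_INGESTED = 0.57;

    /** Cost in USD of ingesting logs at a constant rate for the given number of hours. */
    static double costUntilDetected(double megabytesPerSecond, double hoursUntilDetected) {
        double gigabytes = megabytesPerSecond * 3600 * hoursUntilDetected / 1000.0;
        return gigabytes * PER_GB_INGESTED;
    }

    public static void main(String[] args) {
        double rate = 50; // hypothetical runaway rate in MB/s
        // Assumed delays: about a day for a billing alarm, about an hour for a usage alarm.
        double afterOneDay = costUntilDetected(rate, 24);
        double afterOneHour = costUntilDetected(rate, 1);
        System.out.println(String.format(Locale.ROOT,
                "Detected after 24 h: %.2f USD, after 1 h: %.2f USD", afterOneDay, afterOneHour));
    }
}
```

Under these assumptions a full day of unnoticed ingestion costs about 2,462 USD, while catching it within an hour keeps the damage to about 103 USD.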

It is too easy to mess up and incur costs with CloudWatch, and when you do there is no limit to how bad it can get. This problem is most obvious when other serverless services are used, but it is not limited to them.

How to avoid unexpected CloudWatch expenses at scale

So how do we avoid the nightmare scenario of an insane CloudWatch bill? The solution is two-fold: awareness and guardrails.

First of all, developers must be aware of the pricing of CloudWatch, especially those working with serverless services like AWS Lambda. A function that calls itself recursively is an absolute no-go, and logging minute details inside for loops is not a good idea either.
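As an illustration of the for-loop point, the sketch below contrasts one log line per item with one summary line per batch. It assumes the application logs through java.util.logging with the default INFO level; the class and method names are made up for this example.

```java
import java.util.List;
import java.util.logging.Logger;

/** Contrasts per-item logging with a single summary line per batch. */
final class LoopLogging {
    private static final Logger LOG = Logger.getLogger(LoopLogging.class.getName());

    /** Anti-pattern: one INFO line per item, so log volume grows with the input size. */
    static int processNoisy(List<String> items) {
        int processed = 0;
        for (String item : items) {
            LOG.info("Processing item " + item);
            processed++;
        }
        return processed;
    }

    /** Per-item detail only at FINE (dropped at the default INFO level), plus one summary line. */
    static int processQuiet(List<String> items) {
        int processed = 0;
        for (String item : items) {
            // The supplier is only evaluated when FINE logging is enabled.
            LOG.fine(() -> "Processing item " + item);
            processed++;
        }
        LOG.info("Processed " + processed + " items");
        return processed;
    }

    public static void main(String[] args) {
        List<String> items = List.of("a", "b", "c");
        processNoisy(items); // emits three INFO lines
        processQuiet(items); // emits one INFO line
    }
}
```

The per-item detail is still available when needed by lowering the log level, but it no longer ends up in CloudWatch by default.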

Second, it is important to configure timely alarms so that increased cost does not fly under the radar for too long. While billing alarms are Cloud Development 101, just a billing alarm will not be sufficient here. We need a faster way of detecting that our usage of CloudWatch is above expectations. The solution is very obvious and takes the shape of a custom alarm based on how much data CloudWatch ingests.

Yes, it is necessary to watch the watcher, monitor the monitor so to speak. We can do this by monitoring the IncomingBytes metric on the level of our AWS Account. If this metric exceeds a certain amount of bytes we can ring an alarm. This number will obviously depend on how much logging actually takes place in your account.

To select a baseline you can go to the Metrics page in CloudWatch and view your usage by selecting Logs -> Account Metrics -> IncomingBytes. Tweak the period to display in the upper-right corner and identify your normal usage range. Set your alarm to a higher value, for example twice your normal usage. If a baseline is not available, then setting the alarm to 1 GB per hour is a reasonable starting point. These values may seem high, but remember: there is no upper limit on CloudWatch, so any heads-up we can get is welcome.
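The rule of thumb above can be written down as a small helper. This is only a sketch: the class and method names are hypothetical, and interpreting "normal usage" as the busiest hour in the observed baseline is an assumption, not something the CloudWatch console prescribes.

```java
import java.util.Arrays;

/** Derives an hourly IncomingBytes alarm threshold from baseline samples. */
final class AlarmThreshold {
    // 1 GB per hour, used when no baseline is available.
    static final double FALLBACK_BYTES_PER_HOUR = 1_000_000_000d;

    /**
     * Returns twice the busiest observed hour (assumed to represent normal usage),
     * or the 1 GB fallback when there are no usable samples.
     */
    static double fromHourlySamples(double[] hourlyIncomingBytes) {
        if (hourlyIncomingBytes == null || hourlyIncomingBytes.length == 0) {
            return FALLBACK_BYTES_PER_HOUR;
        }
        double peak = Arrays.stream(hourlyIncomingBytes).max().getAsDouble();
        // A baseline of zero would make the alarm fire on any log line, so fall back.
        return peak > 0 ? 2 * peak : FALLBACK_BYTES_PER_HOUR;
    }

    public static void main(String[] args) {
        // Hypothetical baseline: three hourly sums read from the IncomingBytes graph.
        double[] baseline = {100_000_000d, 250_000_000d, 180_000_000d};
        System.out.println("Threshold in bytes per hour: " + fromHourlySamples(baseline));
    }
}
```

The resulting value is what would be passed as the alarm threshold in place of the fixed 1 GB.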

Figure 2. This is what a baseline usage graph could look like. Note the configuration in the bottom-right.

The following example alarm is given in Java on AWS CDK. It can easily be converted to CloudFormation or Terraform, or simply replicated by hand in the CloudWatch console. The threshold has been set to 1 GB of ingested logs in a period of an hour. Missing data is treated as not breaching; in other words, no logging is seen as good. An SNS topic is used to send an email notifying any subscribers should the threshold ever be exceeded.

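// Imports assume AWS CDK v2 (aws-cdk-lib); in CDK v1, Stack, StackProps, Duration
// and Construct come from software.amazon.awscdk.core instead.
import software.amazon.awscdk.Duration;
import software.amazon.awscdk.Stack;
import software.amazon.awscdk.StackProps;
import software.amazon.awscdk.services.cloudwatch.Alarm;
import software.amazon.awscdk.services.cloudwatch.ComparisonOperator;
import software.amazon.awscdk.services.cloudwatch.Metric;
import software.amazon.awscdk.services.cloudwatch.TreatMissingData;
import software.amazon.awscdk.services.cloudwatch.actions.SnsAction;
import software.amazon.awscdk.services.sns.Topic;
import software.amazon.awscdk.services.sns.subscriptions.EmailSubscription;
import software.constructs.Construct;

public class AlarmStack extends Stack {

    // Address that receives the alarm notifications.
    private static final String ADMIN_EMAIL = "admin@example.com";

    public AlarmStack(final Construct scope, final String id, final StackProps props) {
        super(scope, id, props);

        // Alarm on the account-wide sum of bytes ingested by CloudWatch Logs per hour.
        Alarm incomingBytesAlarm = Alarm.Builder.create(this, "IncomingBytesAlarm")
                .alarmName("IncomingBytesAlarm")
                .metric(Metric.Builder.create()
                        .namespace("AWS/Logs")
                        .metricName("IncomingBytes")
                        .statistic("Sum")
                        .account(this.getAccount())
                        .period(Duration.hours(1))
                        .build())
                .comparisonOperator(ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD)
                .threshold(1_000_000_000) // 1 GB in bytes
                .treatMissingData(TreatMissingData.NOT_BREACHING) // no logs means no problem
                .evaluationPeriods(1)
                .actionsEnabled(true)
                .build();

        // SNS topic that emails the administrator when the alarm fires.
        Topic adminTopic = Topic.Builder.create(this, "AdminNotificationTopic")
                .topicName("AdminNotificationTopic")
                .build();

        adminTopic.addSubscription(new EmailSubscription(ADMIN_EMAIL));

        incomingBytesAlarm.addAlarmAction(new SnsAction(adminTopic));
    }
}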

Conclusion

CloudWatch is a service that occupies a special place on AWS. It is an unavoidable choice that cannot be limited and can do a lot of damage to your wallet before you notice anything. As such, it should be kept in mind and monitored. Luckily setting up an alarm is not too difficult. It is possible to monitor the IncomingBytes metric and notify users in a timely manner.

Encountering a sizable CloudWatch bill can be frustrating. It is important to see CloudWatch in the same light as all the other serverless offerings. They offer great advantages with regard to flexible pricing and scalability, but pose a risk when left uncontrolled. CloudWatch is the same, and should be treated with the same care. It would be better if the service had support for rate-limiting, but until then we will continue to use alarms on this cloud journey.

Ilia Awakimjan

I am a cloud engineer who is passionate about the cloud with a healthy dose of scrutiny. I love to share knowledge and volunteer to write and review exam questions for the AWS Solutions Architect Professional exam.