Lead Image © joingate, 123RF.com

Serverless computing with AWS Lambda

Light Work

Monitoring with AWS Lambda serverless technology reduces costs and scales to your infrastructure automatically. By Chris Binnie

For a number of reasons, it makes sense to use today's cloud-native infrastructure to run software without employing servers; instead, you can use an arm's-length, abstracted serverless platform such as AWS Lambda.

For example, when you create a Lambda function (source code and a run-time configuration) and execute it, the AWS platform only bills you for the execution time, also called the "compute time." Simple tasks are usually billed for only a few hundred milliseconds per run, as opposed to the cost of running an Elastic Compute Cloud (EC2) server instance all month long.
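To put rough numbers on that comparison, the following minimal sketch adds up billed compute time for a hypothetical workload. The 70 ms run time, the once-a-minute schedule, and the 100 ms billing increment are all assumptions for illustration (the increment matches the Billed Duration shown in Listing 1 and might not match current AWS billing); no prices are involved.

```python
import math


def billed_ms(duration_ms, increment_ms=100):
    """Round a run time up to the next billing increment (assumed 100 ms)."""
    return math.ceil(duration_ms / increment_ms) * increment_ms


def monthly_compute_hours(duration_ms, runs_per_day, days=30):
    """Total billed compute time per month, in hours."""
    total_ms = billed_ms(duration_ms) * runs_per_day * days
    return total_ms / 1000 / 3600


# Hypothetical workload: a 70 ms check, once a minute, for 30 days.
lambda_hours = monthly_compute_hours(70, runs_per_day=24 * 60)
server_hours = 24 * 30  # an instance that never stops
print(f"Lambda: {lambda_hours:.1f} h, always-on server: {server_hours} h")
```

Under these assumptions, the function accumulates a little over an hour of billed time in a month, against 720 hours for a server that never stops.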

In addition to reducing the cost and removing the often overlooked administrative burden of maintaining a fleet of servers to run your tasks, AWS Lambda also takes care of the sometimes difficult-to-get-right automatic scaling of your infrastructure. With Lambda, AWS promises that you can sit back with your feet up and rest assured that "your code runs in parallel" and that the platform will be "scaling precisely with the size of the workload" in an efficient and cost-effective manner [1].

In this article, I show you how to get started with AWS Lambda. Once you've seen that external connectivity is working, I'll use a Python script to demonstrate how you might use a Lambda function to monitor a website all year round, without the need of ever running a server.

For more advanced requirements, I'll also touch on how to get the internal networking set up correctly for a Lambda function to communicate with nonpublic resources (e.g., EC2 instances) hosted internally in AWS. Those Lambda functions will also be able to connect to the Internet, which can be challenging to get right.

On established AWS infrastructures, most resources are usually segregated into their own virtual private clouds (VPCs) for security and organizational requirements, so I'll look at the workflow required to solve both internal and external connectivity headaches. I assume that you're familiar with the basics of the AWS Management Console and have access to an account in which you can test.

Less Is More

As already mentioned, be warned that Lambda function networking in AWS has a few quirks. For example, Internet Control Message Protocol (ICMP) traffic isn't permitted for running pings and other such network discovery services:

Lambda attempts to impose as few restrictions as possible on normal language and operating system activities, but there are a few activities that are disabled: Inbound network connections are blocked by AWS Lambda, and for outbound connections only TCP/IP sockets are supported, and ptrace (debugging) system calls are blocked. TCP port 25 traffic is also blocked as an anti-spam measure.

Digging a little deeper …, the Lambda OS kernel lacks the CAP_NET_RAW kernel capability to manipulate raw sockets.

So, you can't do ICMP or UDP from a Lambda function [2].

(Be warned that this page is a little dated and things may have changed.)

In other words, you're not dealing with the same networking stack that you might find on a friendly Debian box running in EC2. However, as I'll demonstrate in a moment, public Domain Name System (DNS) lookups do work as you'd hope, even though DNS usually relies on the UDP protocol.
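One way to see the raw socket restriction for yourself is to try opening a raw ICMP socket, which is what a ping needs. The following is a minimal sketch and not taken from the AWS documentation; the function name can_open_raw_socket and the shape of the return value are assumptions. On a platform without CAP_NET_RAW, the attempt should fail with a permission error, whereas an ordinary TCP socket can still be created.

```python
import socket


def can_open_raw_socket():
    """Return True if this process may create a raw ICMP socket."""
    try:
        # Raw sockets need CAP_NET_RAW (or root) on Linux.
        sock = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
    except OSError:
        # PermissionError is a subclass of OSError.
        return False
    sock.close()
    return True


def lambda_handler(event, context):
    # Hypothetical handler: report which socket types are usable.
    tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tcp.close()
    return {"raw_icmp": can_open_raw_socket(), "tcp": True}
```

Running the same handler on an unprivileged local account is a quick way to check the logic before pasting it into Lambda.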

Less Said, The Better

The way to prove that DNS lookups work is, as you might have guessed, to use a short script that simply performs a DNS lookup. First, however, you should create your function. Figure 1 shows the AWS Management Console [3] Lambda service page with an orange Create function button.

Figure 1: The page where you will create a Lambda function.

If you're wearing your reading glasses, you might see that the name of the function I've typed is internet-access-function. I've also chosen Python 3.7 as the preferred run time. I leave the default Author from scratch option alone at the top.

For now, I ignore the execution role at the bottom of the page and visit that again later, because the clever gubbins behind the scenes will automatically assign an IAM profile, trimmed right down, by default: AWS wants you to log in to CloudWatch to check the execution of your Lambda function.

The next screen in Figure 2 shows the new function; you can see its name in the Designer section and that it has Amazon CloudWatch Logs permissions by default. Figure 2 is only the top of a relatively long page that includes the Designer options. Sometimes these options are hidden and you need to expand them with the arrow next to the word Designer.

Next, hide the Designer options by clicking on the aforementioned arrow. After a little scrolling down, you should see where you will paste your function code (Figure 3). A "Hello World" script, which I will run as an example, is already in the code area.

Figure 3: Your Lambda function code will go here in place of the Hello World example.

When I run the Hello World Lambda function by clicking Test, I get a big, green welcome box at the top of the screen (I had to scroll up a bit), and I can expand the details to show the output,

{
  "statusCode": 200,
  "body": "\"Hello from Lambda!\""
}

which means the test worked. If you haven't created a test event yet, you'll see a pop-up dialog box the first time you run a test, and you'll be asked to add an Event Name.

If you do that now, you can just leave the default key1 and other information in place. You don't need to change these values just yet, because, to execute both the Hello World and the DNS lookup script, you don't need to pass any variables to your Lambda function from here. I called my Event Name EmptyTest and then clicked the orange Create button at the bottom.

Next, I'll paste the DNS Python lookup script

import socket

def lambda_handler(event, context):
    # Resolve the hostname; the result is (name, aliases, IP addresses)
    data = socket.gethostbyname_ex('www.devsecops.cc')
    print(data)
    return

over the top of the Hello World example and click the orange Save button at the top.

To run the function as it stands (using only the default configuration options and making sure the indentation in your script is correct), simply click the Test button again; you should get another green success bar at the top of the screen.

The green bar will show null, because the handler doesn't return a value. However, if you look in the Log Output section, you can see the output of the print statement (Listing 1), with the IP address next to the DNS name you looked up.

Listing 1: DNS Lookup Output

START RequestId: 4e90b424-95d9-4453-a2f4-8f5259f5f263 Version: $LATEST
('www.devsecops.cc', [], ['138.68.149.181'])
END RequestId: 4e90b424-95d9-4453-a2f4-8f5259f5f263
REPORT RequestId: 4e90b424-95d9-4453-a2f4-8f5259f5f263     Duration: 70.72 ms     Billed Duration: 100 ms     Memory Size: 128 MB     Max Memory Used: 55 MB     Init Duration: 129.20 ms
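Because the DNS lookup handler returns nothing, the result box shows null and the answer only appears in the log. If you would rather see the lookup in the result box as well, a variant along the following lines should do it. This is a sketch: the hostname event key and the keys of the returned dictionary are hypothetical, not something Lambda prescribes.

```python
import socket


def lambda_handler(event, context):
    # "hostname" is a hypothetical event key; the default matches the script above.
    hostname = event.get("hostname", "www.devsecops.cc")
    name, aliases, addresses = socket.gethostbyname_ex(hostname)
    print((name, aliases, addresses))  # still written to the log output
    # Returning a value makes it appear in the result box instead of null.
    return {"name": name, "aliases": aliases, "addresses": addresses}
```

With the default test event, which has no hostname key, the function falls back to the hard-coded name.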

More or Less

For the second Lambda task, you'll use a more sophisticated script that will allow you to monitor a website. The script for the Lambda function, with the kind permission of the people behind the base2Services GitHub page [4], will attempt to perform a two-way remote TCP port connection. Copy the handler.py script (Listing 2) and paste it into the function tab, as before. If you can't copy the Python script easily, then click the Raw option on the right side of the page and copy all of the raw text.

Listing 2: handler.py

import json
import os
import boto3
from time import perf_counter as pc
import socket


class Config:
    """Lambda function runtime configuration"""

    HOSTNAME = 'HOSTNAME'
    PORT = 'PORT'
    TIMEOUT = 'TIMEOUT'
    # Not used by the port check; defined so the reportbody property resolves
    REPORT_RESPONSE_BODY = 'REPORT_RESPONSE_BODY'
    REPORT_AS_CW_METRICS = 'REPORT_AS_CW_METRICS'
    CW_METRICS_NAMESPACE = 'CW_METRICS_NAMESPACE'

    def __init__(self, event):
        self.event = event
        self.defaults = {
            self.HOSTNAME: 'google.com.au',
            self.PORT: 443,
            self.TIMEOUT: 120,
            self.REPORT_AS_CW_METRICS: '1',
            self.CW_METRICS_NAMESPACE: 'TcpPortCheck',
        }

    def __get_property(self, property_name):
        # Precedence: test event, then environment variable, then default
        if property_name in self.event:
            return self.event[property_name]
        if property_name in os.environ:
            return os.environ[property_name]
        if property_name in self.defaults:
            return self.defaults[property_name]
        return None

    @property
    def hostname(self):
        return self.__get_property(self.HOSTNAME)

    @property
    def port(self):
        return self.__get_property(self.PORT)

    @property
    def timeout(self):
        return self.__get_property(self.TIMEOUT)

    @property
    def reportbody(self):
        return self.__get_property(self.REPORT_RESPONSE_BODY)

    @property
    def cwoptions(self):
        return {
            'enabled': self.__get_property(self.REPORT_AS_CW_METRICS),
            'namespace': self.__get_property(self.CW_METRICS_NAMESPACE),
        }


class PortCheck:
    """Execution of the TCP port check"""

    def __init__(self, config):
        self.config = config

    def execute(self):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(int(self.config.timeout))
        try:
            # start the stopwatch
            t0 = pc()

            # connect_ex returns 0 on success and an errno value otherwise
            connect_result = sock.connect_ex((self.config.hostname, int(self.config.port)))
            available = '1' if connect_result == 0 else '0'

            # stop the stopwatch
            t1 = pc()

            result = {
                'TimeTaken': int((t1 - t0) * 1000),
                'Available': available
            }
            print(f"Socket connect result: {connect_result}")
            # return structure with data
            return result
        except Exception as e:
            print(f"Failed to connect to {self.config.hostname}:{self.config.port}\n{e}")
            # keep 'Available' a string, as in the success path
            return {'Available': '0', 'Reason': str(e)}
        finally:
            # always release the socket
            sock.close()


class ResultReporter:
    """Reporting results to CloudWatch"""

    def __init__(self, config):
        self.config = config
        self.options = config.cwoptions

    def report(self, result):
        if self.options['enabled'] == '1':
            try:
                endpoint = f"{self.config.hostname}:{self.config.port}"
                cloudwatch = boto3.client('cloudwatch')
                metric_data = [{
                    'MetricName': 'Available',
                    'Dimensions': [
                        {'Name': 'Endpoint', 'Value': endpoint}
                    ],
                    'Unit': 'None',
                    'Value': int(result['Available'])
                }]
                # only report the connect time when the port was reachable
                if result['Available'] == '1':
                    metric_data.append({
                        'MetricName': 'TimeTaken',
                        'Dimensions': [
                            {'Name': 'Endpoint', 'Value': endpoint}
                        ],
                        'Unit': 'Milliseconds',
                        'Value': int(result['TimeTaken'])
                    })

                response = cloudwatch.put_metric_data(
                    MetricData=metric_data,
                    Namespace=self.options['namespace']
                )

                print(f"Sent data to CloudWatch requestId=:{response['ResponseMetadata']['RequestId']}")
            except Exception as e:
                print(f"Failed to publish metrics to CloudWatch:{e}")


def port_check(event, context):
    """Lambda function handler"""

    config = Config(event)
    checker = PortCheck(config)

    result = checker.execute()

    # report results
    ResultReporter(config).report(result)

    result_json = json.dumps(result, indent=4)
    # log results
    print(f"Result of checking  {config.hostname}:{config.port}\n{result_json}")

    # return to caller
    return result
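The Config class in Listing 2 looks up each setting in a fixed order: the test event first, then the function's environment variables, then the built-in defaults. The following stand-alone sketch mirrors that order so you can try it outside Lambda; resolve_setting and the sample values are illustrative names of my own, not part of handler.py.

```python
import os


def resolve_setting(name, event, defaults, environ=None):
    """Look up a setting: event first, then environment, then defaults."""
    environ = os.environ if environ is None else environ
    for source in (event, environ, defaults):
        if name in source:
            return source[name]
    return None


defaults = {"HOSTNAME": "google.com.au", "PORT": 443}
env = {"PORT": "8443"}            # stands in for os.environ here
event = {"HOSTNAME": "www.devsecops.cc"}

print(resolve_setting("HOSTNAME", event, defaults, env))  # from the event
print(resolve_setting("PORT", event, defaults, env))      # from the environment
print(resolve_setting("TIMEOUT", event, defaults, env))   # not set anywhere
```

The practical consequence is that an invocation without an event falls straight through to the defaults unless you set environment variables on the function.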

Now, click Save at the top right and look for the Handler input box on the right-hand side of where you pasted the code. You'll need to change the starting point for the Lambda function from lambda_function.lambda_handler to lambda_function.port_check, which is how the script is written. Be sure to click Save again.

Next, configure a new test event,

{
  "HOSTNAME":"www.devsecops.cc",
  "PORT":"443",
  "TIMEOUT":5
}

adjusted a bit from the base2Services GitHub example [5]. Once you've adapted the parameters for your own system settings, go back to the EmptyTest box, pull down the menu, and click Configure test events to create a new parameter to pass to a test. I pasted the test event code over the top of the example JSON code, named it PortTest, and clicked Create.

More Haste, Less Speed

Now you can click the Test button to see if you can connect to the Internet over TCP port 443. Success is denoted this time if the output in the green bar at the top of the page shows:

{
  "TimeTaken": 187,
  "Available": "1"
}

To make sure it's working, alter your test event to a funny port number on which your destination definitely isn't listening (e.g., TCP port 4444) and see what happens. If you get a 0 for Available, you know the test is working as hoped.
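If you want to know why a port counts as unavailable, the numeric result of connect_ex can be translated into an errno name. The sketch below is a stripped-down take on the check in Listing 2, assuming the same '1'/'0' convention; check_port and its return shape are hypothetical.

```python
import errno
import socket


def check_port(host, port, timeout=5):
    """Return (available, detail) for a TCP connect attempt."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # 0 means the connection succeeded; anything else is an errno value.
        code = sock.connect_ex((host, port))
    except OSError as exc:
        # Name resolution failures, for example, still raise.
        return "0", str(exc)
    finally:
        sock.close()
    if code == 0:
        return "1", "connected"
    return "0", errno.errorcode.get(code, str(code))
```

A host that actively refuses the connection typically answers right away, whereas a firewall that silently drops packets makes the check wait for the full timeout, which is one reason to keep TIMEOUT short in the test event.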

Incidentally, you can ignore the CloudWatch errors if you notice them. In Listing 3 you can see the CloudWatch IAM policy auto-generated when you create the Lambda function. By default, it's trimmed down and will cause a relatively trivial CloudWatch metrics error, because it doesn't have a cloudwatch:PutMetricData permission, which the script would need.

Listing 3: CloudWatch IAM Policy

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "logs:CreateLogGroup",
      "Resource": "arn:aws:logs:eu-west-1:XXXXXXX:*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": [
        "arn:aws:logs:eu-west-1:XXXXXX:log-group:/aws/lambda/internet-access-function:*"
      ]
    }
  ]
}
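If you would rather fix the metrics error than ignore it, the role needs one more statement. The following sketch builds that statement in Python and prints the resulting JSON so you can paste it into the IAM console. The helper name and the namespace condition are assumptions on my part; the condition simply narrows the permission to the script's default TcpPortCheck namespace and can be dropped.

```python
import copy
import json


def allow_put_metric_data(policy, namespace="TcpPortCheck"):
    """Return a copy of an IAM policy with a PutMetricData statement added."""
    updated = copy.deepcopy(policy)
    updated["Statement"].append({
        "Effect": "Allow",
        "Action": "cloudwatch:PutMetricData",
        # PutMetricData is not scoped to a resource ARN, hence the wildcard.
        "Resource": "*",
        # Assumption: limit the permission to the script's default namespace.
        "Condition": {"StringEquals": {"cloudwatch:namespace": namespace}},
    })
    return updated


# Trimmed-down stand-in for the auto-generated policy in Listing 3.
policy = {"Version": "2012-10-17", "Statement": []}
print(json.dumps(allow_put_metric_data(policy), indent=2))
```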

Completely Hopeless

Now that your monitoring Lambda function is working, you can schedule it to run periodically to monitor a website by using CloudWatch in AWS.

In the CloudWatch section in the AWS Management Console, start with Events | Rules and choose the Schedule radio button (Figure 4). In the Targets section you want to select Lambda function in the drop-down and then select the name of your function (i.e., internet-access-function).

Figure 4: Setting up a schedule for a Lambda function in CloudWatch.

Next, click the blue Configure details button, add a name for the rule, and then click the blue Create rule button. Make sure the name doesn't contain spaces. To continue, click on Logs on the left-hand side; then, choose your Lambda function name, which in turn will reveal the log for each execution.

The top log entry offers some bad news (Figure 5). As you can see, the Lambda function's script defaults to Google in Australia (where the authors of the script reside), so you need to add your test event parameters into the CloudWatch rule. If the PutMetricData error is jumping out at you, then you can either adjust your IAM permissions, remove the metrics reporting from the Lambda function's script, or, of course, just ignore it.

Figure 5: To monitor the desired website, you need to adjust the input parameters of your CloudWatch rule.

Fear not, however. If you go back into the configuration, you can adjust the run-time parameters of the CloudWatch rule with relative ease. To do so, select your Lambda function, copy the JSON of the PortTest test event you created earlier, and simply add it to your rule.

Where do you paste it, you may well ask? Look inside your CloudWatch rule config and tick Constant (JSON text) under the Configure input drop-down options and then paste in the content used previously:

{
  "HOSTNAME":"www.devsecops.cc",
  "PORT":"443",
  "TIMEOUT":5
}

Having saved that change, you can now see in your CloudWatch log (Figure 6) that the Lambda function is indeed checking the correct website and logging its output for future reference.

Figure 6: Happiness is probing the correct website address.

Now that you can see the intended website, you can alter your rule's schedule to monitor its uptime every minute or every day – or, in fact, whatever time period you desire. You can even use a cron format, if you prefer.
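Schedule expressions come in two flavors, rate and cron, and the rate form is picky about singular and plural units. The helper below is a small sketch of how you might generate valid expressions; the function name is hypothetical, and the cron line is only an example schedule.

```python
def rate_expression(value, unit):
    """Build a rate() schedule expression, e.g. rate(5 minutes)."""
    if unit not in ("minute", "hour", "day"):
        raise ValueError("unit must be minute, hour, or day")
    if value < 1:
        raise ValueError("value must be a positive integer")
    # The unit is singular for 1 and plural otherwise.
    suffix = "" if value == 1 else "s"
    return f"rate({value} {unit}{suffix})"


every_minute = rate_expression(1, "minute")
daily = rate_expression(1, "day")
# Six-field cron form: minutes hours day-of-month month day-of-week year.
weekdays_8am_utc = "cron(0 8 ? * MON-FRI *)"
```

Cron schedules are evaluated in UTC, so adjust the hour field if you are thinking in local time.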

If you want to go a step further, you can also create metrics for your CloudWatch rule and create a Simple Notification Service (SNS) topic so that email alarms are triggered when the website is unavailable. That part of the jigsaw puzzle is relatively easy to pick up if you haven't done it before. Remember to disable the CloudWatch rule once you've finished testing to avoid the potential of an email storm.
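As a starting point for the alarm, the following sketch assembles the arguments for the CloudWatch put_metric_alarm call against the Available metric that Listing 2 publishes. It assumes the function's role is allowed to publish metrics, and the alarm name, the three-period threshold, and the SNS topic ARN are all placeholders to replace with your own values.

```python
def availability_alarm(hostname, port, topic_arn, namespace="TcpPortCheck"):
    """Keyword arguments for CloudWatch put_metric_alarm (a sketch)."""
    return {
        "AlarmName": f"port-check-{hostname}-{port}",  # hypothetical name
        "Namespace": namespace,
        "MetricName": "Available",
        "Dimensions": [{"Name": "Endpoint", "Value": f"{hostname}:{port}"}],
        "Statistic": "Minimum",
        "Period": 60,                 # seconds; match the rule's schedule
        "EvaluationPeriods": 3,       # assumed: three failed checks in a row
        "Threshold": 1,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "breaching",
        "AlarmActions": [topic_arn],  # SNS topic that sends the email
    }


kwargs = availability_alarm(
    "www.devsecops.cc", 443,
    "arn:aws:sns:eu-west-1:123456789012:website-down",  # placeholder ARN
)
# With boto3 installed and credentials configured:
# boto3.client("cloudwatch").put_metric_alarm(**kwargs)
```

Treating missing data as breaching means the alarm also fires if the function stops reporting altogether, which may or may not be what you want during testing.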

Now that you have a shiny new working Lambda function that can be scheduled to run whenever you like, I'll spend a moment looking at what a more complex workflow might look like if you were running your Lambda function inside a VPC.

Don't Be Careless

At the beginning of this article, I mentioned that Internet access is trickier if you have a more mature infrastructure and host your Lambda functions inside a VPC so that they can access nonpublic resources securely, as well as the Internet. Table 1 shows the workflow involved.

Table 1: Workflow for VPCs

Step	Action Required
1	Check your VPC configuration and create a new one if needed.
2	Create a private subnet specifically for your Lambda function, so you can isolate your other services from potential security risks.
3	Create a public subnet in your VPC if one doesn't exist.
4	Ensure an Internet gateway is attached to the VPC, and add a route for outbound traffic (0.0.0.0/0) to the public subnet's routing table that points at the Internet gateway.
5	Create a NAT gateway in the public subnet, and point all traffic (0.0.0.0/0) in the private subnet's routing table at the NAT gateway.
6	Create or adjust a security group for your network rules, "self referencing" the security group to itself in a rule, if needed by your Lambda function.
7	Configure your Lambda function to use the correct VPC, subnet(s), and security group.
8	Add suitable IAM permissions to your Lambda function so that it can access the resources of your VPC. Make sure these permissions are available to your IAM role: ec2:CreateNetworkInterface, ec2:DescribeNetworkInterfaces, ec2:DeleteNetworkInterface, ec2:DescribeSecurityGroups, ec2:DescribeSubnets, ec2:DescribeVpcs.
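Step 7 in Table 1 can also be scripted rather than clicked. The sketch below only assembles the arguments for the Lambda update_function_configuration call; the helper name and the subnet and security group IDs are placeholders, and the networking from steps 1 through 6 is assumed to exist already.

```python
def vpc_update_kwargs(function_name, subnet_ids, security_group_ids):
    """Keyword arguments for Lambda update_function_configuration."""
    if not subnet_ids or not security_group_ids:
        raise ValueError("need at least one subnet and one security group")
    return {
        "FunctionName": function_name,
        "VpcConfig": {
            "SubnetIds": list(subnet_ids),                 # private subnet(s), step 2
            "SecurityGroupIds": list(security_group_ids),  # step 6
        },
    }


kwargs = vpc_update_kwargs(
    "internet-access-function",
    ["subnet-0123456789abcdef0"],   # placeholder IDs
    ["sg-0123456789abcdef0"],
)
# With boto3 installed and credentials configured:
# boto3.client("lambda").update_function_configuration(**kwargs)
```

Listing only private subnets here is deliberate: a function placed in a public subnet does not get a public IP address, so it still could not reach the Internet directly.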

A minor caveat is that if you're testing against existing networking that is already running important services, it's possible to tie yourself in knots and break things horribly.

To get started, try to create, where possible, these new resources inside a new VPC for testing purposes. Some of the resources should definitely be deleted afterward – especially the Elastic Network Interface (ENI) – to save ongoing costs for Elastic IP addresses. Consider yourself suitably warned!

If you are familiar with the innards of AWS and have looked through Table 1, I could be forgiven for summarizing it in one sentence: "To access resources inside a VPC, use a private subnet and a NAT gateway and then connect that to a public subnet, which by inference has an Internet gateway attached for external Internet access."

I've had success with the above approach, so bear this workflow in mind for future reference if you foresee a need.

Endless

No doubt you'll be using serverless technologies more and more in the future. However, a few gotchas that introduce security risks still need some attention. Sadly, they don't magically disappear when using an abstracted platform, as some would hope. That said, I hope you can see the benefits of such abstraction, in terms of operational overhead and running costs. It's safe to say that with some basic scripting skills, serverless technology makes light work of numerous tasks.