I’ve put together my notes from the AWS webinar, which can be seen here.
Lambda is the key enabler and a core AWS component for serverless computing. It lets you run the code you want without worrying about the underlying infrastructure or provisioning. It is also cost efficient, since there are no instances sitting idle in a running state. Lambda handles scaling up and scaling down as needed, transparently to the customer.
Event Source - A change in the state of a resource or data, or any event triggered explicitly or implicitly. A large number of AWS services can act as event sources, like S3 (object PUT, DELETE, ...), DynamoDB, IoT, CloudFormation, CloudWatch, SNS, API Gateway, and cron schedules, to name a few.
Function - The piece of code that runs when that event occurs. The function can access any downstream services if needed. Currently supported languages are Node.js, Python, Java 8, and C#.
- Stateless, event-based file/data processing.
- On demand execution of some logic where an event is generated via an explicit action.
- Custom Workflows for state changes within AWS.
- Code can be developed in the 4 supported languages (currently).
- The code should have no affinity or access to the underlying infrastructure, and should use external infrastructure for state (like a cache or a database). The code should be entirely stateless.
- The memory allocation ranges from 128MB to 1.5GB in increments of 64MB. CPU and network are allocated proportionately.
- Lambda can be invoked synchronously (the client waits for a response from the function) or asynchronously (fire and forget; the caller gets a 202 Accepted).
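With boto3, the difference between the two invocation modes is just the `InvocationType` parameter. A minimal sketch of building the call arguments (the function name and payload here are illustrative):

```python
import json

def invoke_args(function_name, payload, asynchronous=False):
    # "Event" fires and forgets (Lambda answers with 202 Accepted);
    # "RequestResponse" blocks until the function returns.
    return {
        "FunctionName": function_name,
        "InvocationType": "Event" if asynchronous else "RequestResponse",
        "Payload": json.dumps(payload),
    }

# These kwargs would then be passed to boto3:
#   boto3.client("lambda").invoke(**invoke_args("my-fn", {"k": "v"}))
```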
What is in it
Handler() function - Entry point for Lambda.
Event Object - The request (a representation of the event). Can be a stream or a simple data type.
Context Object - Provides a handle to the environment and basic utilities like time, logging, and client/event info.
VpcConfig - Useful if you want to access private resources within your VPC. Enables the Lambda environment to talk to your VPC.
DeadLetterConfig - Failed events are sent to SQS or SNS. A dead letter queue or an SNS topic can be configured, where the event is sent in case the Lambda function fails to process it. Only available for async invocations.
Environment - Key-value pairs set as part of the configuration, which are then available as environment variables to the function. They're encrypted at rest via KMS and decrypted during Lambda initialization.
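Putting the pieces above together, a minimal Python handler might look like this (the event shape and the GREETING_PREFIX variable are illustrative, not Lambda built-ins):

```python
import json
import os

def handler(event, context):
    """Entry point Lambda invokes. `event` carries the request payload;
    `context` exposes runtime info such as the request id and the
    remaining execution time."""
    # Configuration arrives as plain environment variables.
    prefix = os.environ.get("GREETING_PREFIX", "hello")
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"{prefix} {name}"}),
    }
```

Locally you can exercise it without any AWS machinery by calling `handler({"name": "x"}, None)` directly.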
- 512 MB of temporary storage (ephemeral).
- 1024 File Descriptors
- 1024 processes + threads
- 300 seconds maximum execution duration
- 6MB Request payload
- 6MB Response
- 50MB Deployment package (compressed)
- 250MB Uncompressed
- 4KB Environment variables
- 100 Concurrent Executions (can be increased by requesting Amazon)
- 75GB total size of all lambdas across one region.
- Each execution happens in a container that is created and managed by AWS. Containers are re-used. This re-use can let us share cached static data or DB connections between invocations. The folder /tmp is available across invocations. However, do not rely on this; use it opportunistically.
- Bootstrapping happens when a function is invoked for the first time after creation or update. The start time can be reduced by allocating more memory and keeping the deployment package small. Python and Node.js will clearly bootstrap much quicker than Java and C#. Strip the deployment package of all libraries that are not needed, as Lambda will load everything present in the package. Move to an interpreted language if cold starts are not within acceptable limits.
- Containers run the Amazon Linux AMI. If there are any native binaries, compile them against this environment.
- OpenJDK 1.8 is already provided by the container, so it does not need to be packaged. Similarly for the AWS SDK in other languages, etc.
- A zip file or a jar file containing the code and dependencies. Use Maven or Eclipse to make a package.
- Use AWS CodeCommit, CodeBuild, CodePipeline
- Use Github, Jenkins, CodeShip
- Put the package on an S3 location
SAM - Serverless Application Model
- CloudFormation supports SAM templates
- A version is an immutable copy of code+config
- Each version gets its own ARN
- Versions can be aliased (like 13=Dev, 11=Prod, 12=Beta)
- The aliasing enables rollbacks or staged promotions.
Security and Scaling
- The Push model, where the upstream service invokes the function. Resource-level policies applied to the Lambda function control which resources can invoke it (like "allow S3 to invoke this function"). The policies are created automatically when the event source is attached via the console.
- The Pull model is used for stream event sources (Kinesis and DynamoDB Streams). Here Lambda polls instead of the resource pushing. An IAM policy is created for Lambda to be able to poll the source (instead of the other way around, as with Push).
Concurrency and Scaling
- Concurrent executions needed = requests per second * average function execution duration
- When throttled, the async events are retried automatically for 6 hours, with delays between retries.
- For sync requests, the client gets a 429.
- Use async model if bursts are expected as retries are handled automatically when throttled.
- Request AWS to increase the concurrency limit if consistently getting throttled.
- Ensure the downstream services also keep up with Lambda's scaling; for example, if each invocation opens 10 MySQL connections, 100 concurrent invocations is not something MySQL will be able to handle.
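The capacity math in the bullets above is simple multiplication; a quick sketch using the numbers from these notes:

```python
def required_concurrency(requests_per_second, avg_duration_seconds):
    # Steady-state concurrent executions = arrival rate x duration.
    return requests_per_second * avg_duration_seconds

def downstream_connections(concurrent_executions, connections_per_invocation):
    # Each concurrently running container holds its own connections.
    return concurrent_executions * connections_per_invocation

# 50 req/s at a 2 s average duration already needs the default limit of
# 100 concurrent executions; at 10 MySQL connections per invocation,
# that is 1000 simultaneous connections hitting the database.
```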
Debugging and Operations
- 4xx errors, like invalid parameter value (400), resource not found (404), or request too large (413), can be fixed by the developer.
- 5xx errors need to be fixed by the admin
- Stream-based events are retried until the data expires
- Async invocations are retried 2 extra times before being sent to the dead letter queue
- For sync invocations, the caller will need to implement the retry logic when they get an error from the call
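A sketch of that client-side retry logic for sync invocations, with exponential backoff (`invoke` here is a stand-in for the actual synchronous Lambda call, which might raise on a 429 or 5xx):

```python
import time

def invoke_with_retries(invoke, max_attempts=3, base_delay=0.1):
    """Call `invoke`, retrying with exponential backoff on failure --
    the caller's job for sync invocations, since Lambda itself will
    not retry these."""
    for attempt in range(max_attempts):
        try:
            return invoke()
        except Exception:
            if attempt == max_attempts - 1:
                raise                      # out of attempts, surface the error
            time.sleep(base_delay * (2 ** attempt))
```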
Tracing, Tracking and Logging
- AWS X-Ray (Preview) for detailed breakdown of lambda performance and behavior metrics.
- All calls made to lambda are logged to CloudTrail. They can be delivered to S3 and analyzed.
- Every start, end, and report goes to CloudWatch
- Java User logs can be created via
- Default metrics are free - which include duration, throttles, errors, etc. Custom metrics can also be added by the function itself via the CloudWatch API.
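A sketch of publishing a custom metric from the function itself (the namespace and metric name are illustrative, not Lambda defaults); the helper just builds the keyword arguments for CloudWatch's PutMetricData API:

```python
def put_metric_args(namespace, name, value, unit="Count"):
    # Keyword arguments for boto3's cloudwatch.put_metric_data(**args);
    # the namespace and metric name are whatever your function chooses.
    return {
        "Namespace": namespace,
        "MetricData": [{"MetricName": name, "Value": value, "Unit": unit}],
    }

# Inside the handler you would then call:
#   boto3.client("cloudwatch").put_metric_data(**put_metric_args(
#       "MyApp", "OrdersProcessed", 3))
```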