• SVCC 2017 - AWS Lambda with Serverless Framework and Java

    Slides from my talk at Silicon Valley Code Camp 2017.

    Serverless is a node.js based framework that makes creating, deploying, and managing serverless functions a breeze. We will use AWS Lambda as our FaaS (Function-as-a-Service) provider, although Serverless supports IBM OpenWhisk and Microsoft Azure as well.

    In this session, we will talk about Serverless Applications, and Create and deploy a java-maven based AWS Lambda API. We will also explore the command line interface to manage lambda, which is provided out of the box by serverless framework.

    Read on →

  • AWS KMS and Envelope Encryption

    Every service needs encryption at one point or another - passwords to the database, credentials to an external service, or even entire filesystem or files. Sticking the secrets, or keys in configuration files seems a quick and easy option. However, it carries security risks, even if these configurations are managed outside of the source code. On top of it, the keys used to encrypt/decrypt the data bring additional security implications and requirements in terms of storage, audit, and lifecycle management.

    AWS KMS, or AWS Key Management Service is a fully managed service to store and manage keys. Any AWS service which supports encryption - S3 buckets, EBS Volumes, SQS, etc. uses KMS under the hood. KMS is more than just a key manager, it can also be used to encrypt large volumes of data, using a technique called Envelope Encryption.

    In this post I will cover KMS, and the why, what, and how of Envelope Encryption.

    Read on →

  • Using Amazon EC2 Container Service

    Amazon ECS, or EC2 Container Service is a Container Management Service for Docker containers. Similar to Kubernetes in intent, the service allows users to provision Docker containers in a fully managed cluster of EC2s. This post is a quick summary of how to get up and running with your own ECS cluster.

    The motivation behind containers is to optimize the usage of underlying resources like CPU and Memory. Containerized infrastructure provides a dense compute environment, allowing us to pack more usage without having to spend $$ for idle/underutilized resources.

    Read on →

  • Apache Nutch - Step by Step

    Search is one of the most fantastic areas of the technology industry, and has been addressed many, many times with different algorithms, producing varying degrees of success. We get so used to it, that often times I wish I had a Cmd-F while reading a real book.

    Recently we had our Quarterly Hack Week at Marqeta, and one of the ideas was to build search around our public pages. These pages would include the public website assets, as well as the the API developer guides and documentation. This post is a quick summary of the infrastructure, setup, and gotchas of using Nutch 2.3.1 to build a site search - essentially notes from this hack week project.

    Search

    Read on →

  • Caching - Gotchas & Lessons Learned

    It has been said time and again - “There are only two hard things in Computer Science: cache invalidation and naming things”. Having run into both of these problems in my professional career, I figured I could write a post, summarizing the lessons I have learned along the way by seeing and building various caching architectures across many companies, big and small.

    Just like threading, caching is easy to code, but often creates more problems than it intends to solve. These problems can arise from - you guessed it - invalidation, sub-par efficiency, inconsistency, and many more. It is also one of my favorite topics for technical interviews :)

    Read on →