  • AWS KMS and Envelope Encryption

    Every service needs encryption at one point or another - passwords to the database, credentials to an external service, or even entire filesystems or files. Sticking secrets or keys in configuration files seems like a quick and easy option. However, it carries security risks, even if these configurations are managed outside of the source code. On top of that, the keys used to encrypt and decrypt the data bring additional security implications and requirements in terms of storage, audit, and lifecycle management.

    AWS KMS, or AWS Key Management Service, is a fully managed service to store and manage keys. Any AWS service that supports encryption (S3 buckets, EBS volumes, SQS, etc.) uses KMS under the hood. KMS is more than just a key manager; it can also be used to encrypt large volumes of data, using a technique called Envelope Encryption.

    In this post I will cover KMS, and the why, what, and how of Envelope Encryption.

    Read on →

  • Using Amazon EC2 Container Service

    Amazon ECS, or EC2 Container Service, is a Container Management Service for Docker containers. Similar to Kubernetes in intent, the service allows users to provision Docker containers in a fully managed cluster of EC2 instances. This post is a quick summary of how to get up and running with your own ECS cluster.

    The motivation behind containers is to optimize the usage of underlying resources like CPU and memory. Containerized infrastructure provides a dense compute environment, allowing us to pack in more workloads without having to spend $$ on idle or underutilized resources.

    Read on →

  • Apache Nutch - Step by Step

    Search is one of the most fantastic areas of the technology industry, and has been addressed many, many times with different algorithms, producing varying degrees of success. We get so used to it that I often wish I had a Cmd-F while reading a real book.

    Recently we had our Quarterly Hack Week at Marqeta, and one of the ideas was to build search around our public pages. These pages would include the public website assets, as well as the API developer guides and documentation. This post is a quick summary of the infrastructure, setup, and gotchas of using Nutch 2.3.1 to build a site search - essentially notes from this hack week project.


    Read on →

  • Caching - Gotchas & Lessons Learned

    It has been said time and again - “There are only two hard things in Computer Science: cache invalidation and naming things”. Having run into both of these problems in my professional career, I figured I could write a post summarizing the lessons I have learned along the way by seeing and building various caching architectures across many companies, big and small.

    Just like threading, caching is easy to code, but often creates more problems than it solves. These problems can arise from - you guessed it - invalidation, sub-par efficiency, inconsistency, and more. It is also one of my favorite topics for technical interviews :)

    Read on →

  • Boto 3 and SQS

    Boto 3 is the AWS SDK for Python. In fact, this SDK is the reason I picked up Python - so I could do stuff with AWS in a few lines of Python in a script instead of a full-blown Java setup. It's fun, easy, and pretty much feels like working on a CLI with a rich programming language to back it up. In this post we will use SQS and Boto 3 to perform basic operations on the service.

    Read on →