We build Thursday with a team of 11 developers working fully remotely. At any given time, we have four to five features in development simultaneously. For UX and QA reviews, we had only one staging environment, plus a local development environment for each developer.
Within our agile workflow, this conventional staging-plus-local-development setup presented the following challenges, which slowed the team's overall pace.
Limitations of using a local development environment:
- Asynchronous UX Review: The product and design teams could not review the frontend work asynchronously. This often caused delays in the feedback loop.
- Maintaining Production-Quality Local Database: It was challenging to replicate and maintain a production-quality database locally, potentially leading to discrepancies and issues during testing.
- Testing with Multiple Pods/Servers: The ability to test changes across multiple pods or servers was limited.
- Facilitating Feature Testing by Stakeholders: It was impossible to give other stakeholders access to test a feature running on a developer's machine.
Limitations of using a single staging environment:
- Multiple Features Simultaneously: When multiple features are being tested simultaneously in a single staging environment, it can lead to conflicts and challenges in isolating issues specific to each feature.
- Unstable Code and Database: If unstable code is introduced during testing, it can potentially corrupt the database, causing data inconsistencies and impacting the overall testing process.
- Early UX Feedback on Partially Complete Code: Obtaining early UX feedback on partially complete code in the staging environment can negatively affect the stability of the environment, making it harder to ensure reliable and accurate testing outcomes.
To address these challenges, we explored letting developers deploy any individual Git branch to a completely new environment. We call these "Preview Environments".
This was a task we took on during an internal hackathon conducted during one of our in-person meetups.
Our Current Setup

Thursday currently runs on a standard setup comprising the following components:
- Kubernetes (EKS) runs all of our services. Auto-scaling is handled by the Horizontal Pod Autoscaler, which adjusts the number of running pods based on workload.
- AWS Aurora stores and manages the application's data.
- ElastiCache (Redis) provides caching, improving performance and responsiveness by reducing repeated fetches from the database.
- Amazon MQ (RabbitMQ) handles the publish/subscribe pattern, enabling efficient communication between the application's components.
In addition to these components, we use GitHub Actions and Helm to deploy our services to the Kubernetes cluster.
Our Design Strategy
To avoid high costs, we chose not to create multiple replicas of our staging environment. Instead, we started by identifying reusable components.
We decided to reuse the Kubernetes cluster and create all preview environments as pods within a separate namespace. Leveraging the existing helm chart, we could easily generate a new release for each new branch.
We decided to reuse Redis and RabbitMQ as well. For Redis, we were already prefixing every saved key with a unique environment key, which avoided cross-environment conflicts. For RabbitMQ, we created a new virtual host per environment, giving each one a dedicated space.
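The isolation scheme above can be sketched in a few lines. This is an illustrative sketch, not our actual code: the helper names (`make_key`, `vhost_for`) and the environment key are hypothetical.

```python
ENV_KEY = "preview-user-onboarding"  # hypothetical unique key for one preview environment

def make_key(env_key: str, key: str) -> str:
    """Prefix every Redis key with the environment key so that
    environments sharing one Redis instance never collide."""
    return f"{env_key}:{key}"

def vhost_for(env_key: str) -> str:
    """Each preview environment gets its own RabbitMQ virtual host,
    fully isolating its queues and exchanges from other environments."""
    return f"/{env_key}"

# A redis-py client would then read and write only prefixed keys, e.g.:
#   r.set(make_key(ENV_KEY, "session:42"), payload)
```

Because the prefix is applied at the application layer, no Redis-side configuration is needed when a new environment spins up; the vhost, by contrast, is created once per environment in RabbitMQ.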
Finally, for the database, we wrote an AWS CDK script that launches a new RDS instance for every environment from a recent snapshot.
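As a rough sketch of what that provisioning step does (our real implementation is an AWS CDK script; the helper below and the engine value are assumptions for illustration), the per-branch restore boils down to building a cluster identifier from the branch name and restoring from a snapshot:

```python
import re

def restore_params(branch: str, snapshot_id: str) -> dict:
    """Build the parameters for restoring a fresh Aurora cluster from a
    recent snapshot, one cluster per preview branch. Hypothetical helper."""
    # RDS identifiers allow only letters, digits, and hyphens.
    cluster_id = re.sub(r"[^a-zA-Z0-9-]", "-", branch).lower()
    return {
        "DBClusterIdentifier": f"{cluster_id}-db",
        "SnapshotIdentifier": snapshot_id,
        "Engine": "aurora-mysql",  # assumption; must match the source cluster's engine
    }

# With boto3, the restore itself would look like:
#   import boto3
#   rds = boto3.client("rds")
#   rds.restore_db_cluster_from_snapshot(
#       **restore_params("preview-login", "staging-nightly"))
```

Restoring from a snapshot rather than seeding an empty database is what gives each preview environment production-quality data from day one.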
Whenever code is pushed to a Git branch whose name matches the pattern "preview-*", a pipeline is automatically triggered. The pipeline carries out the following steps in sequence:
- Log in to AWS and get ECR credentials
- Build a Docker image of the current branch
- Push the image to ECR
- Create a new RDS using a snapshot
- Update environment variables
- Create a helm release with the latest image and environment variables
The Helm release contains an ingress, deployments, a service, and a pod autoscaler. The following services help us automate the end-to-end creation of these environments:
- Karpenter: Auto-scales cluster nodes when existing capacity is inadequate
- cert-manager: Generates TLS certificates for each environment's domain name
- ExternalDNS: Automatically creates the DNS records
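The piece tying ExternalDNS and cert-manager together is the per-environment hostname on the ingress: ExternalDNS creates a DNS record for it, and cert-manager issues a certificate for it. A minimal sketch of that naming, with a placeholder base domain:

```python
def ingress_host(branch: str, base_domain: str = "preview.example.com") -> str:
    """Hostname placed on the preview environment's ingress.
    base_domain is a placeholder, not our real domain."""
    return f"{branch}.{base_domain}"
```

A wildcard DNS zone and certificate issuer scoped to the base domain mean no manual DNS or TLS work is needed per environment.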
The Benefits of Preview Environments
After introducing preview environments, we saw many benefits while keeping our infrastructure cost to a minimum. Here are a few of them:
- Allowed early feedback from various stakeholders
- Allowed early testing in an environment that closely resembles the production environment
- Helped identify and resolve potential issues early in the development process
- The staging environment became more stable
Preview environments have been a game-changer for us. Feel free to connect with us if you would like to set up the same for your product.