Last week, our intern Thibault shared the stage with me at Dev.Gent for a talk on building better, more scalable applications in the cloud. There was a waiting list, and we were asked whether the presentation would be available online. We could have published the slides, but then you'd miss the context.
We decided to do better and turn the presentation into a series of two blog posts.
Part 1 of 2: migrate to the cloud
Let’s go through the journey from ‘it works on my PC’ to delivering a scalable and cost-effective cloud solution.
When you deploy a web application in the cloud, it will work, but maybe not as efficiently as you’d expect. In this blog, we’ll start with a basic deployed app and explore how to systematically improve its performance while keeping costs under control.
As we dive into improving the application, we'll explore different ways to make it more efficient and responsive without increasing costs. From optimizing the underlying infrastructure to fine-tuning configurations, there are several strategies that can help boost performance. Along the way, we’ll identify common bottlenecks and experiment with techniques that can make a real difference. The goal is to show how small, thoughtful adjustments can transform a basic setup into a more scalable and cost-effective solution.
To wrap it up, we might take a look at what a more dynamic, event-driven approach could look like, potentially exploring serverless options if it makes sense for the application. By the end of the blog, you’ll see how small, incremental improvements can transform a sluggish proof-of-concept into a highly responsive and scalable application, without breaking the bank.
All tests were done on Google Cloud, but any other cloud provider (AWS, Microsoft Azure) would give similar results.
Scenario
A high-potential junior developer worked really hard to build an app, and the client demo was a success! So much so that a production release was decided that very same day. It’s Friday, and wouldn’t it be great if the release could still make it into the Saturday morning newsletter?
Twenty minutes after the release, the first call comes in: users complain that the app is unbearably slow, hosting costs are rising, and the server seems about to break apart.
Step by step, we’ll go through a number of reasons why sites often perform worse in production than during development. Some of the things we show might seem trivial, but we’ve seen every one of these issues at real clients.
Our Test Setup
For this blog we'll focus on backend performance; frontend performance could be a topic for another time. The application is a two-tier Docker setup running on a dual-core virtual machine: a Flask backend server and a Vue frontend server, with the database running on Cloud SQL.
For the benchmarks we used a tool called Locust, which you can easily configure to simulate real-life traffic on your web app. You define the different tasks a user can perform, each with its own weight, simulating a user choosing between the different features of your app.
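As a minimal sketch of such a locustfile (the endpoint paths and weights here are illustrative, not taken from our actual app):

```python
# locustfile.py - a minimal sketch; endpoints and weights are illustrative.
from locust import HttpUser, task, between

class AppUser(HttpUser):
    # Simulated users wait 1-5 seconds between tasks, like real visitors.
    wait_time = between(1, 5)

    @task(3)  # weight 3: browsing the dashboard is the most common action
    def view_dashboard(self):
        self.client.get("/dashboard")

    @task(1)  # weight 1: the heavier reporting endpoint is hit less often
    def run_report(self):
        self.client.get("/api/query_3")
```

Running `locust -f locustfile.py` then lets you ramp up hundreds of simulated users from Locust's web UI and watch response times evolve under load.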
It works on my PC
When testing locally, it's easy to feel confident about your application's performance. Everything seems fast and responsive, and even your load tests pass without a hitch. But once the app hits production, especially with real users, things can quickly fall apart. Here’s why:
- 1 user vs hundreds of parallel users
- a 12-core MacBook Pro for development vs a few cores in the cloud for production
  - Apple M3 Pro, CPU benchmark: 24900
  - AWS t3a.medium (2 vCPUs): 2694
- 32 GB of RAM vs 4 GB of RAM
- a small test dataset vs a huge production dataset
Benchmark: epic fail for cloud hosting
Local performance is not production performance. To set realistic expectations and avoid disappointment, always test under constraints that mirror real-world conditions. Cloud resources are finite, latency is real, and your users won’t care that it ran perfectly on your laptop.
As we transition to a virtual machine in the cloud, several key changes significantly impact our performance expectations. We don’t have ‘unlimited’ resources, so we need to be mindful of how we allocate compute power and memory. In our case we’re running the application on an e2-medium instance with 2 vCPUs and 4 GB of RAM instead of our powerful development machine. Our local test dataset was probably smaller, too, and local development has barely any network latency. The result: terrible performance in our production environment.
Optimizing the server
Sure, we could throw extra resources at it and increase the memory or the number of vCPUs, but our next step is to think, detect bottlenecks, and solve problems smartly and cost-efficiently.
While building the app, the junior developer might have used ChatGPT to speed things up. Although AI is super fast at delivering a result and seems very smart and confident, it sometimes responds with outdated or incomplete information, as you can see in the screenshot above. We're running an old Python version, and the code runs on a limited development server. On top of that, debug mode was enabled during development, and nobody turned it off before going to production!
One of the easiest wins when it comes to performance is making sure your server is configured for production. By default, frameworks like Flask run on a built-in development server (usually `flask run` or something similar). It's great for local testing, but it's not designed to handle real-world traffic or concurrent requests. We replaced it with Gunicorn, a production-grade WSGI server that's battle-tested and much more efficient. Gunicorn supports spawning multiple worker processes, allowing the server to handle multiple requests in parallel.
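As a minimal sketch of what that swap looks like (assuming the Flask instance is named `app` inside `app.py`), Gunicorn can be tuned with a small Python config file:

```python
# gunicorn.conf.py - a minimal sketch; assumes the Flask app object
# is named `app` inside app.py.
import multiprocessing

bind = "0.0.0.0:8000"
# A common rule of thumb is (2 x CPU cores) + 1 workers,
# which gives 5 workers on our 2-vCPU instance.
workers = multiprocessing.cpu_count() * 2 + 1
```

The deployment command then changes from `flask run` to `gunicorn -c gunicorn.conf.py app:app`.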
With this small adjustment, which is nothing more than installing a dependency and using a different deployment command, we can instantly see improvements. Of course, similar improvements apply to other runtimes, such as Tomcat tuning for Java or PHP-FPM tuning for PHP.
Using the advanced monitoring tools in Google Cloud, we can easily see that our query_3 endpoint, which queries the MySQL database, is pretty slow.
Let’s have a look at how we can boost performance and fine-tune the app for our current setup. One of the first steps is optimizing our code, especially by avoiding inefficient patterns like N+1 queries (image on the left), which can quickly become a bottleneck. The N+1 query problem happens when your application makes one query to fetch a list of items, and then runs an additional query for each item to fetch related data, resulting in N+1 total queries.
This can often be fixed in the code, for example by using a JOIN query instead (image on the right), which significantly reduces the number of queries and the CPU load on the MySQL server for the same result; a code sketch follows below the images.
The N+1 problem
Optimized SQL syntax with a JOIN
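In code, the difference looks roughly like this. This is a sketch using mysql-connector-python; the `customers` and `orders` tables and the credentials are hypothetical stand-ins for our app's schema:

```python
import mysql.connector  # assumes the mysql-connector-python package

conn = mysql.connector.connect(user="app", password="secret", database="shop")
cur = conn.cursor(dictionary=True)

# N+1 pattern: 1 query for the list, then 1 extra query per customer.
cur.execute("SELECT id, name FROM customers")
for customer in cur.fetchall():
    cur.execute("SELECT id, total FROM orders WHERE customer_id = %s",
                (customer["id"],))
    orders = cur.fetchall()  # N extra round trips to the database

# JOIN version: the same data in a single query and a single round trip.
cur.execute("""
    SELECT c.id, c.name, o.id AS order_id, o.total
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
""")
rows = cur.fetchall()
```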
We should also be strategic with indexes: adding them where they help speed up read operations, but being cautious not to over-index, as that can degrade write performance.
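Continuing with the hypothetical schema from the sketch above, that could mean indexing the foreign key the JOIN filters on:

```python
# Speeds up the JOIN above, but every INSERT or UPDATE on `orders`
# now also has to maintain this index, so add indexes selectively.
cur.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")
```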
Beyond code, we can tweak MySQL itself through configuration variables, tuning memory and caching settings to better suit our workload. And if we’re not already on MySQL 8 or later, upgrading is a smart move: join operations and overall query performance have improved significantly in newer versions.
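As one example, reusing the cursor from the sketches above: the InnoDB buffer pool determines how much table and index data MySQL can keep in memory, and it can be resized at runtime since MySQL 5.7 (on Cloud SQL you would set this through database flags instead). The value below is purely illustrative:

```python
# Inspect the current buffer pool size (in bytes).
cur.execute("SHOW VARIABLES LIKE 'innodb_buffer_pool_size'")
print(cur.fetchone())

# Example only: let InnoDB cache up to 2 GB of table/index data.
# Requires admin privileges; the right value depends on the RAM
# your instance can actually spare.
cur.execute("SET GLOBAL innodb_buffer_pool_size = 2 * 1024 * 1024 * 1024")
```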
To minimize the load on our MySQL database and improve response times, we can avoid frequent SELECT queries by leveraging an in-memory caching layer like Redis. When a user visits a public dashboard page that shows semi-live data, our application first checks whether the required data is already available in Redis. If it is, we can instantly fetch it from the cache and display it to the user, ensuring a fast and efficient experience. If not, the data is fetched from MySQL, which might take longer (e.g., 10 seconds), and then written to Redis with a set expiration time (e.g., 600 seconds) for future requests. To keep the cache relatively fresh, and to make sure not even the first user experiences a delay, a cron job could run every 30 seconds to prefetch and store the data proactively. This architecture significantly reduces the number of expensive queries hitting the database and ensures that users consistently get fast responses.
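Here is a minimal cache-aside sketch with Flask and redis-py; the key name, TTL, and the slow-query helper are illustrative:

```python
import json

import redis
from flask import Flask, jsonify

app = Flask(__name__)
cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

def fetch_dashboard_from_mysql():
    # Placeholder for the slow (~10 s) MySQL query described above.
    return {"visitors": 1024, "signups": 42}

@app.route("/dashboard")
def dashboard():
    cached = cache.get("dashboard:data")
    if cached is not None:
        return jsonify(json.loads(cached))   # cache hit: instant response
    data = fetch_dashboard_from_mysql()      # cache miss: slow path
    cache.set("dashboard:data", json.dumps(data), ex=600)  # expire in 600 s
    return jsonify(data)
```

The prefetching cron job from the paragraph above would simply run the same fetch-and-store logic every 30 seconds, so the cache never goes cold.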
After this round of MySQL improvements, as well as implementing Redis for frequently requested queries and endpoints, we can see impressive results. We’re even performing better than our local setup.
Conclusion
With just a few targeted optimizations, like moving to a production-ready server setup, fine-tuning MySQL, avoiding N+1 queries and introducing Redis for caching, we’ve already seen significant performance gains. These improvements not only reduce server load and response times but also help cut down unnecessary costs. And the best part? Many of them are relatively simple to implement and don’t require any big investments.
But what if things really take off? In Part 2 of this blog series (coming soon!), we’ll explore how to scale the application even further, preparing for the day your app goes viral. We'll dive into autoscaling strategies to ensure your setup stays rock solid, no matter how many users come knocking.
Curious how much room for performance improvements and cost reductions there is in your setup? Let us know, and we'll find the answer together!