For a startup, before scaling infrastructure either horizontally or vertically, we first need to make sure that current resources are being used properly and that the application configuration is not the performance bottleneck. The primary aim of the engineering team is to ensure that any system it designs and deploys uses minimal resources in its day-to-day running.
We faced exactly this problem: our deployed system served over a million users a day, with traffic arriving in sporadic bursts. Simply deploying more servers or scaling them up would not have been an optimal solution.
This blog post is about tuning Nginx to improve performance, i.e. to increase the RPS (requests/sec) of an HTTP API. Here, I describe the optimizations we made to the deployed system so that it could handle tens of thousands of requests per second without incurring a huge cost overhead.
Scenario
We needed to run an HTTP API (written in Python using Flask), proxied through Nginx, with high throughput. The content of the API changes only at an interval of one day.
optimization /ɒptɪmʌɪˈzeɪʃ(ə)n/ noun: the action of making the best or most effective use of a situation or resource.
We used Supervisor to run the WSGI server with the following configuration:
Number of Workers: CPU Count * 2 + 1
Bind to a Unix socket instead of an IP; it's slightly faster.
So finally, the supervisor command looked something like this:
gunicorn api:app --workers=5 --worker-class=meinheld.gmeinheld.MeinheldWorker --bind=unix:api.sock
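For completeness, a minimal Supervisor program entry wrapping that command could look like the sketch below; the program name and project directory are illustrative, not our actual setup, and --workers=5 matches the CPU count * 2 + 1 formula for a 2-core machine:
[program:api]
; the section name and directory are illustrative
directory=/srv/api
command=gunicorn api:app --workers=5 --worker-class=meinheld.gmeinheld.MeinheldWorker --bind=unix:api.sock
autostart=true
autorestart=true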
We tried optimizing Nginx config(s) and tested what worked best for us.
And to benchmark the API, we used wrk (20 threads, 200 open connections, for 20 seconds) with the following command:
wrk -t20 -c200 -d20s http://api.endpoint/resource
Default configuration
First, we load tested the API without any changes and got the following stats:
Running 20s test @ http://api.endpoint/resource
20 threads and 200 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 192.48ms 274.78ms 1.97s 87.18%
Req/Sec 85.57 29.20 202.00 72.83%
33329 requests in 20.03s, 29.59MB read
Socket errors: connect 0, read 0, write 0, timeout 85
Requests/sec: 1663.71
Transfer/sec: 1.48MB
Default config update
Let's update Nginx's default config, i.e. nginx.conf at /etc/nginx/nginx.conf:
worker_processes auto;
# or set this to the number of CPU cores; `grep processor /proc/cpuinfo | wc -l` shows the count. `auto` detects it implicitly.
worker_connections 1024;
# default is 768; find the optimum value for your server with `ulimit -n`
access_log off;
# to boost I/O on HDD we can disable access logs
# this prevents Nginx from logging every request to a file named `access.log`
keepalive_timeout 15;
# default is 65;
# the server will close idle keep-alive connections after this time (in seconds)
gzip on;
# gzip must be enabled for the directives below to take effect (many distros ship it on by default)
gzip_vary on;
gzip_proxied any;
gzip_comp_level 2;
gzip_buffers 16 8k;
gzip_http_version 1.1;
gzip_min_length 256;
gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript;
# compression reduces the data that needs to be sent over the network
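Note that these directives do not all sit at the same level of nginx.conf: worker_processes is a top-level (main) directive, worker_connections lives inside the events block, and the rest belong in the http block. As a rough skeleton (following the stock Debian/Ubuntu layout; yours may differ):
worker_processes auto;
events {
    worker_connections 1024;
}
http {
    access_log off;
    keepalive_timeout 15;
    gzip on;
    # ...the gzip_* directives from above...
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}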
After making changes to the Nginx configuration, we should run a config test:
sudo nginx -t
If the test is successful, we are good to restart Nginx to see the changes:
sudo service nginx restart
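Alternatively, when only the config has changed, a reload applies it without dropping in-flight connections:
sudo nginx -s reload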
With this much configuration in place, we load tested the API and got the following result:
Running 20s test @ http://api.endpoint/resource
20 threads and 200 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 145.80ms 237.97ms 1.95s 89.51%
Req/Sec 107.99 41.34 202.00 66.09%
42898 requests in 20.03s, 39.03MB read
Socket errors: connect 0, read 0, write 0, timeout 46
Non-2xx or 3xx responses: 2
Requests/sec: 2141.48
Transfer/sec: 1.95MB
These configurations reduced the timeouts and increased the RPS (requests per second), but not by much: from ~1664 to ~2141.
Adding Nginx cache
Since, in our case, the content of the endpoint refreshes only at an interval of one day, this is a good situation in which to cache the API response.
But, with caching comes its invalidation… one of the two hard things.
There are only two hard things in Computer Science: cache invalidation and naming things.
— Phil Karlton
We opted for the minimal solution: purging the cache directory with a cron job after the content is updated in the downstream system, as sketched below.
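A minimal sketch, assuming the daily content update lands shortly after midnight and the cache lives in the directory we create in the next section (/data/nginx/cache); both the schedule and the path are ours to pick:
# crontab entry: purge the Nginx cache at 00:30, after the daily content refresh
30 0 * * * find /data/nginx/cache -mindepth 1 -delete
Deleting the files behind Nginx's back is blunt but workable here: the next request for each purged key is simply treated as a cache MISS and fetched from the upstream again.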
All the heavy lifting will now be done by Nginx, so we gotta make sure it's supercharged…
To add caching in Nginx, we need to add a few directives to our Nginx app's configuration file. Before that, we need to create a directory to store the cache data:
sudo mkdir -p /data/nginx/cache
Changes in Nginx app’s configuration:
proxy_cache_path /data/nginx/cache keys_zone=my_zone:10m inactive=1d;
server {
    ...
    location /api-endpoint/ {
        proxy_cache my_zone;
        # include the Authorization header in the key so clients never receive each other's cached responses
        proxy_cache_key "$host$request_uri$http_authorization";
        # cache 404s and redirects briefly, successful responses for a day
        proxy_cache_valid 404 302 1m;
        proxy_cache_valid 200 1d;
        add_header X-Cache-Status $upstream_cache_status;
    }
    ...
}
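Since we expose $upstream_cache_status via the X-Cache-Status header, it's easy to sanity-check the cache. With a plain GET against the endpoint, the first request should report a MISS and repeated ones a HIT:
curl -s -D - -o /dev/null http://api.endpoint/resource | grep -i x-cache-status
# 1st request: X-Cache-Status: MISS
# afterwards:  X-Cache-Status: HIT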
With this change in configuration, we load tested the API and got the following result:
Running 20s test @ http://api.endpoint/resource
20 threads and 200 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 6.88ms 5.44ms 88.91ms 81.36%
Req/Sec 1.59k 500.04 2.95k 62.50%
634405 requests in 20.06s, 589.86MB read
Requests/sec: 31624.93
Transfer/sec: 29.40MB
So we got nearly a 19x performance boost just by adding caching.
Nginx cache in RAM
Let's go one step further. Currently, our cache data is stored on disk; what if we store it in RAM instead? In our case, the response data is limited and the responses are not large.
So first, we need to create the directory where the RAM cache will be mounted:
sudo mkdir -p /data/nginx/ramcache
To mount the created directory in RAM with tmpfs, use the following command:
sudo mount -t tmpfs -o size=256M tmpfs /data/nginx/ramcache
This mounts /data/nginx/ramcache in RAM, allocating 256 MB to it.
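You can confirm the mount (and keep an eye on how much of the 256 MB the cache is using) with:
df -h /data/nginx/ramcache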
If you want to unmount it, simply run:
sudo umount /data/nginx/ramcache
And to re-create the cache directory in RAM automatically after a reboot, we need to update /etc/fstab. Add the following line to it:
tmpfs /data/nginx/ramcache tmpfs defaults,size=256M 0 0
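The new entry can be exercised without a reboot; mount -a mounts everything listed in fstab that isn't already mounted:
sudo mount -a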
Note: we also need to update the value of proxy_cache_path to point at the ramcache directory (/data/nginx/ramcache).
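With that, the proxy_cache_path directive from the previous section simply becomes:
proxy_cache_path /data/nginx/ramcache keys_zone=my_zone:10m inactive=1d;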
After updating the configuration, we load tested the API again and got the following result:
Running 20s test @ http://api.endpoint/resource
20 threads and 200 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 5.57ms 5.69ms 277.76ms 92.94%
Req/Sec 1.98k 403.94 4.55k 71.77%
789306 requests in 20.04s, 733.89MB read
Requests/sec: 39387.13
Transfer/sec: 36.62MB
Storing the cache in RAM gave a significant improvement: roughly 24x over the initial setup.
Buffered access logging
We store the access log of the proxy-passed application. Instead of writing every entry to disk immediately, we can keep the log in a buffer and write it to disk only:
if the next log line does not fit into the buffer
if the buffered data is older than specified by the flush parameter
This reduces the frequent writes that otherwise happen on every request. To do this, we simply need to add the buffer and flush parameters with appropriate values to the access_log directive:
location / {
    ...
    access_log /var/log/nginx/fast_api.log combined buffer=256k flush=10s;
    error_log /var/log/nginx/fast_api.err.log;
}
As per the above config, access log entries are first written to the buffer and flushed to disk only when the buffer reaches 256 KB or when the buffered data becomes older than 10 seconds. Note: here, combined is the log_format name.
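A quick way to see the buffering in action is to follow the log while sending traffic; entries should now land in batches roughly every 10 seconds (or whenever 256 KB accumulates) instead of one line per request:
tail -f /var/log/nginx/fast_api.log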
After performing load testing again, we got the following result:
Running 20s test @ http://api.endpoint/resource
20 threads and 200 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 4.21ms 3.19ms 84.83ms 83.84%
Req/Sec 2.53k 379.87 6.02k 77.05%
1009771 requests in 20.03s, 849.31MB read
Requests/sec: 50413.44
Transfer/sec: 42.40MB
This increased the RPS further, bringing the total improvement to about 30x over the initial stage.
Conclusion
In this blog post, we discussed the process of optimizing the Nginx config to improve RPS. The RPS was increased from ~1663 to ~50413 (an ~30x increase), which gave us the high throughput we needed. Tuning default settings can substantially improve the performance of a system. We'll end this post with a quote:
Make It Work. Make It Right. Make It Fast.
— Kent Beck