For a startup, before scaling infrastructure either horizontally or vertically, we first need to make sure that current resources are being used properly and that the application configuration is not the performance bottleneck. The primary aim of the engineering team is to ensure that any system it designs and deploys uses minimal resources in its day-to-day running.
We faced exactly this problem: our deployed system served over a million users a day, with traffic arriving in sporadic bursts. Simply deploying more servers or scaling them up would not have been an optimal solution.
This blog post is about tuning Nginx to improve performance, i.e. to increase the RPS (requests/sec) of an HTTP API. Here, I describe the optimizations we made to the deployed system so that it could handle tens of thousands of requests per second without incurring a huge cost overhead.
Scenario
We needed to run an HTTP API (written in Python using Flask), proxied through Nginx, with high throughput. The content of the API changes only at an interval of one day.
optimization /ɒptɪmʌɪˈzeɪʃ(ə)n/ noun: the action of making the best or most effective use of a situation or resource.
We used Supervisor to run the WSGI server with the following configuration:
Number of Workers: CPU Count * 2 + 1
Bind to a Unix socket instead of an IP; it's slightly faster.
So finally, the supervisor command looked something like this:
gunicorn api:app --workers=5 --worker-class=meinheld.gmeinheld.MeinheldWorker --bind=unix:api.sock
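For completeness, a minimal Supervisor program entry wrapping that command could look like the sketch below; the program name and project directory are illustrative, not our actual setup, and --workers=5 matches the CPU count * 2 + 1 formula for a 2-core machine:
[program:api]
; the section name and directory are illustrative
directory=/srv/api
command=gunicorn api:app --workers=5 --worker-class=meinheld.gmeinheld.MeinheldWorker --bind=unix:api.sock
autostart=true
autorestart=true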
We tried optimizing Nginx config(s) and tested what worked best for us.
And to benchmark the API, we used wrk (20 threads, 200 open connections, for 20 seconds) with the following command:
wrk -t20 -c200 -d20s http://api.endpoint/resource
Default configuration
First, we load tested the API without any changes and got the following stats:
Running 20s test @ http://api.endpoint/resource
20 threads and 200 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 192.48ms 274.78ms 1.97s 87.18%
Req/Sec 85.57 29.20 202.00 72.83%
33329 requests in 20.03s, 29.59MB read
Socket errors: connect 0, read 0, write 0, timeout 85
Requests/sec: 1663.71
Transfer/sec: 1.48MB
Default config update
Let's update Nginx's default config, i.e. nginx.conf at /etc/nginx/nginx.conf:
worker_processes auto;
# or set this to the number of CPU cores; `grep processor /proc/cpuinfo | wc -l` shows the count. `auto` detects it implicitly.
worker_connections 1024;
# default is 768; find the optimum value for your server with `ulimit -n`
access_log off;
# to boost I/O on HDD we can disable access logs
# this prevents Nginx from logging every request to a file named `access.log`
keepalive_timeout 15;
# default is 65;
# the server will close idle keep-alive connections after this time (in seconds)
gzip on;
# gzip must be enabled for the directives below to take effect (many distros ship it on by default)
gzip_vary on;
gzip_proxied any;
gzip_comp_level 2;
gzip_buffers 16 8k;
gzip_http_version 1.1;
gzip_min_length 256;
gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript;
# compression reduces the data that needs to be sent over the network
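Note that these directives do not all sit at the same level of nginx.conf: worker_processes is a top-level (main) directive, worker_connections lives inside the events block, and the rest belong in the http block. As a rough skeleton (following the stock Debian/Ubuntu layout; yours may differ):
worker_processes auto;
events {
    worker_connections 1024;
}
http {
    access_log off;
    keepalive_timeout 15;
    gzip on;
    # ...the gzip_* directives from above...
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}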
After making changes to the Nginx configuration, we should run a config test:
sudo nginx -t
If the test is successful, we are good to restart Nginx to see the changes:
sudo service nginx restart
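Alternatively, when only the config has changed, a reload applies it without dropping in-flight connections:
sudo nginx -s reload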
With this much configuration in place, we load tested the API and got the following result:
Running 20s test @ http://api.endpoint/resource
20 threads and 200 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 145.80ms 237.97ms 1.95s 89.51%
Req/Sec 107.99 41.34 202.00 66.09%
42898 requests in 20.03s, 39.03MB read
Socket errors: connect 0, read 0, write 0, timeout 46
Non-2xx or 3xx responses: 2
Requests/sec: 2141.48
Transfer/sec: 1.95MB
These configurations reduced the timeouts and increased the RPS (requests per second), but not by much: from ~1664 to ~2141.
Adding Nginx cache
Since, in our case, the content of the endpoint refreshes only at an interval of one day, this is a good situation in which to cache the API response.
But, with caching comes its invalidation… one of the two hard things.
There are only two hard things in Computer Science: cache invalidation and naming things.
— Phil Karlton
We opted for the minimal solution: purging the cache directory with a cron job after the content is updated in the downstream system, as sketched below.
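A minimal sketch, assuming the daily content update lands shortly after midnight and the cache lives in the directory we create in the next section (/data/nginx/cache); both the schedule and the path are ours to pick:
# crontab entry: purge the Nginx cache at 00:30, after the daily content refresh
30 0 * * * find /data/nginx/cache -mindepth 1 -delete
Deleting the files behind Nginx's back is blunt but workable here: the next request for each purged key is simply treated as a cache MISS and fetched from the upstream again.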
All the heavy lifting will now be done by Nginx, so we gotta make sure it's supercharged…
To add caching in Nginx, we need to add a few directives to our Nginx app's configuration file. Before that, we need to create a directory to store the cache data:
sudo mkdir -p /data/nginx/cache
Changes in Nginx app’s configuration:
proxy_cache_path /data/nginx/cache keys_zone=my_zone:10m inactive=1d;
server {
    ...
    location /api-endpoint/ {
        proxy_cache my_zone;
        # include the Authorization header in the key so clients never receive each other's cached responses
        proxy_cache_key "$host$request_uri$http_authorization";
        # cache 404s and redirects briefly, successful responses for a day
        proxy_cache_valid 404 302 1m;
        proxy_cache_valid 200 1d;
        add_header X-Cache-Status $upstream_cache_status;
    }
    ...
}
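Since we expose $upstream_cache_status via the X-Cache-Status header, it's easy to sanity-check the cache. With a plain GET against the endpoint, the first request should report a MISS and repeated ones a HIT:
curl -s -D - -o /dev/null http://api.endpoint/resource | grep -i x-cache-status
# 1st request: X-Cache-Status: MISS
# afterwards:  X-Cache-Status: HIT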
With this change in configuration, we load tested the API and got the following result:
Running 20s test @ http://api.endpoint/resource
20 threads and 200 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 6.88ms 5.44ms 88.91ms 81.36%
Req/Sec 1.59k 500.04 2.95k 62.50%
634405 requests in 20.06s, 589.86MB read
Requests/sec: 31624.93
Transfer/sec: 29.40MB
So we got nearly a 19x performance boost just by adding caching.
Nginx cache in RAM
Let's go one step further. Currently, our cache data is stored on disk; what if we store it in RAM instead? In our case, the response data is limited and the responses are not large.
So first, we need to create the directory where the RAM cache will be mounted:
sudo mkdir -p /data/nginx/ramcache
To mount the created directory in RAM with tmpfs, use the following command:
sudo mount -t tmpfs -o size=256M tmpfs /data/nginx/ramcache
This mounts /data/nginx/ramcache in RAM, allocating 256 MB to it.
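You can confirm the mount (and keep an eye on how much of the 256 MB the cache is using) with:
df -h /data/nginx/ramcache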
If you want to unmount it, simply run:
sudo umount /data/nginx/ramcache
And to re-create the cache directory in RAM automatically after a reboot, we need to update /etc/fstab. Add the following line to it:
tmpfs /data/nginx/ramcache tmpfs defaults,size=256M 0 0
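The new entry can be exercised without a reboot; mount -a mounts everything listed in fstab that isn't already mounted:
sudo mount -a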
Note: we also need to update the value of proxy_cache_path to point at the ramcache directory (/data/nginx/ramcache).
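With that, the proxy_cache_path directive from the previous section simply becomes:
proxy_cache_path /data/nginx/ramcache keys_zone=my_zone:10m inactive=1d;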
After updating the configuration, we load tested the API again and got the following result:
Running 20s test @ http://api.endpoint/resource
20 threads and 200 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 5.57ms 5.69ms 277.76ms 92.94%
Req/Sec 1.98k 403.94 4.55k 71.77%
789306 requests in 20.04s, 733.89MB read
Requests/sec: 39387.13
Transfer/sec: 36.62MB
Storing the cache in RAM gave a significant improvement: roughly 24x over the initial setup.
Buffered access logging
We store the access log of the proxy-passed application. Instead of writing every entry to disk immediately, we can keep the log in a buffer and write it to disk only:
if the next log line does not fit into the buffer
if the buffered data is older than specified by the flush parameter
This reduces the frequent writes that otherwise happen on every request. To do this, we simply need to add the buffer and flush parameters with appropriate values to the access_log directive:
location / {
    ...
    access_log /var/log/nginx/fast_api.log combined buffer=256k flush=10s;
    error_log /var/log/nginx/fast_api.err.log;
}
As per the above config, access log entries are first written to the buffer and flushed to disk only when the buffer reaches 256 KB or when the buffered data becomes older than 10 seconds. Note: here, combined is the log_format name.
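A quick way to see the buffering in action is to follow the log while sending traffic; entries should now land in batches roughly every 10 seconds (or whenever 256 KB accumulates) instead of one line per request:
tail -f /var/log/nginx/fast_api.log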
After performing load testing again, we got the following result:
Running 20s test @ http://api.endpoint/resource
20 threads and 200 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 4.21ms 3.19ms 84.83ms 83.84%
Req/Sec 2.53k 379.87 6.02k 77.05%
1009771 requests in 20.03s, 849.31MB read
Requests/sec: 50413.44
Transfer/sec: 42.40MB
This increased the RPS further, bringing the total improvement to about 30x over the initial stage.
Conclusion
In this blog post, we discussed the process of optimizing the Nginx config to improve RPS. The RPS was increased from ~1663 to ~50413 (an ~30x increase), which gave us the high throughput we needed. Tuning default settings can substantially improve the performance of a system. We'll end this post with a quote:
Make It Work. Make It Right. Make It Fast.
— Kent Beck