Matt Borja

NGINX UDP Health Check

About

Performs UDP health checks against a list of UDP services derived from an NGINX stream configuration file, reloading NGINX only when necessary. The script is designed to be idempotent: it does not re-enable or re-disable servers whose state is already reflected in the specified stream configuration file.

Installation

Tip: Never download and run scripts blindly or register them as cron jobs, especially as root. Just because you can doesn't mean you should!

Download (/usr/local/bin/udp-check)

# mkdir -p /usr/local/bin/udp-check && \
  curl -so /usr/local/bin/udp-check/udp-manage-all.sh 'https://gist.githubusercontent.com/rdev5/2f74db90b47cc43d028bb39e6f5855ab/raw/10_udp-manage-all.sh' && \
  curl -so /usr/local/bin/udp-check/udp-check.php 'https://gist.githubusercontent.com/rdev5/2f74db90b47cc43d028bb39e6f5855ab/raw/20_udp-check.php' && \
  chmod 0744 /usr/local/bin/udp-check/udp-manage-all.sh && \
  chmod 0744 /usr/local/bin/udp-check/udp-check.php

Setup Cron

Note: The UDP management script is designed to invoke itself every 10s to facilitate shorter health check intervals. See also CRON_INTERVAL_SECONDS.
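Cron cannot schedule anything finer than one minute, so sub-minute intervals have to be handled inside the job itself. The following is a minimal sketch of one way to do that, not the actual contents of udp-manage-all.sh. CRON_INTERVAL_SECONDS is the variable named above; its default of 10 and the helper names run_offsets and run_for_one_minute are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: repeat a command several times within one cron minute.
# The default below is an assumption based on the "every 10s" note.
CRON_INTERVAL_SECONDS="${CRON_INTERVAL_SECONDS:-10}"

# Print the offsets (seconds past the minute) at which a run would start.
run_offsets() {
  interval="$1"
  offset=0
  while [ "$offset" -lt 60 ]; do
    printf '%s\n' "$offset"
    offset=$((offset + interval))
  done
}

# Run the given command once per offset, sleeping between runs.
run_for_one_minute() {
  interval="$1"
  shift
  for offset in $(run_offsets "$interval"); do
    "$@"
    # Skip the sleep after the last run so the loop finishes
    # before cron starts the next invocation.
    if [ $((offset + interval)) -lt 60 ]; then
      sleep "$interval"
    fi
  done
}
```

A cron entry could then call something like run_for_one_minute "$CRON_INTERVAL_SECONDS" followed by the check command. Note that this simple loop does not subtract the time the check itself takes, so runs drift slightly later within the minute.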

# (crontab -l 2>/dev/null; echo '* * * * * /usr/local/bin/udp-check/udp-manage-all.sh') | crontab -

Verify Installation

To verify proper installation, watch for bursts of ICMP traffic to/from your UDP upstream servers with tcpdump:

# /usr/sbin/tcpdump -n ip proto \icmp

Development

This section describes script usage more in depth and may be used to bootstrap your own custom UDP monitoring solution.
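If you are building your own monitor, the first step is deriving the list of UDP services from a stream configuration file. The sketch below shows one possible way to do that with sed. It assumes the convention visible in the sample configuration in this README, where an active server is a plain "server ADDR;" line and a disabled one is the same line commented out with "#". The function name list_upstream_servers is hypothetical and is not part of udp-check.php.

```shell
#!/bin/sh
# Hypothetical helper: list upstream servers and their current state.
# Prints one line per server: "enabled ADDR" or "disabled ADDR".
# "server {" block openers are skipped because they contain no semicolon.
list_upstream_servers() {
  sed -n \
    -e 's/^[[:space:]]*server[[:space:]]\{1,\}\([^;[:space:]]*\).*;.*$/enabled \1/p' \
    -e 's/^[[:space:]]*#[[:space:]]*server[[:space:]]\{1,\}\([^;[:space:]]*\).*;.*$/disabled \1/p' \
    "$1"
}
```

Calling list_upstream_servers ./sample-streams.conf would give you the addresses to probe along with the state the file currently records for each.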

Testing

You can reproduce the following test by hand using a copy of sample-streams.conf.

$ ./udp-check.php ./sample-streams.conf && echo '!! NGINX reload required' || echo 'No reload required at this time'
PHP Warning:  Retrying for lock (attempt #1)... in ./udp-check.php on line 128
PHP Warning:  Retrying for lock (attempt #1)... in ./udp-check.php on line 128
PHP Notice:  Conducting 3 UDP checks took 1 seconds.
 in ./udp-check.php on line 297
PHP Notice:  127.0.0.2:33033 went offline in ./udp-check.php on line 184
PHP Notice:  127.0.0.3:33033 came online in ./udp-check.php on line 184
!! NGINX reload required

$ cat ./sample-streams.conf
server {
  listen 127.0.0.3:33033 udp;
  proxy_timeout 1s;
  proxy_pass localhost-33033;
}

upstream localhost-33033 {
  server 127.0.0.1:33033;
  # server 127.0.0.2:33033;
  server 127.0.0.3:33033;
}
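The configuration above shows the end result: an offline server is commented out and an online one is left active. As a rough illustration of that toggle, here is a sketch using sed. The helpers disable_server and enable_server are hypothetical and are not taken from udp-check.php, which may edit the file differently. Both print the modified configuration to stdout rather than editing in place.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the comment/uncomment toggle.

# Escape dots so an address such as 127.0.0.2:33033 matches literally.
escape_addr() {
  printf '%s' "$1" | sed 's/\./\\./g'
}

# Comment out an active "server ADDR;" line (mark it offline).
# Lines that are already commented out do not match, so this is idempotent.
disable_server() {
  pattern=$(escape_addr "$2")
  sed "s|^\([[:space:]]*\)server[[:space:]]\{1,\}${pattern};|\1# server $2;|" "$1"
}

# Uncomment a "# server ADDR;" line (mark it online).
# Lines that are already active do not match, so this is idempotent.
enable_server() {
  pattern=$(escape_addr "$2")
  sed "s|^\([[:space:]]*\)#[[:space:]]*server[[:space:]]\{1,\}${pattern};|\1server $2;|" "$1"
}
```

For example, disable_server ./sample-streams.conf 127.0.0.2:33033 prints the configuration with that server commented out. Comparing the output with the original file is one way to decide whether a reload is needed.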

Once you've verified that the script behaves as expected, you can try running it against a copy of your UDP upstream configuration files taken from NGINX.

The following example will:

  1. Iterate over all *.conf files in ./udp_upstreams
  2. Pass each file to udp-check.php in the background (concurrently), redirecting notices and warnings to /dev/null
  3. Recommend the appropriate NGINX reload action per configuration file based on any status changes detected

$ cp -r /etc/nginx/udp_upstreams .
$ find udp_upstreams/ -type f -name "*.conf" | xargs -I {} sh -c '((./udp-check.php {} >/dev/null 2>&1 && echo "NGINX reload required for {}" || echo "No reload required for {}") &)'
No reload required for udp_upstreams/appClusterA-33033.conf
No reload required for udp_upstreams/appClusterB-33034.conf
NGINX reload required for udp_upstreams/appClusterC-33035.conf
No reload required for udp_upstreams/appClusterD-33036.conf

Tip: Watch the ICMP traffic via /usr/sbin/tcpdump ip proto \icmp!

Exit Codes

  • 0 = Change in server health detected; NGINX should reload
  • 1 = Invalid usage, fork failure
  • 2 = No changes require reloading at this time
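
Because 0 means "reload" here, a plain && / || chain like the one used under Testing reports exit code 1 (an error) the same way as exit code 2 (no change). A small sketch of telling the three cases apart follows. The function name describe_exit_code and the labels it prints are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: map a udp-check.php exit code to an action,
# following the exit code list above.
describe_exit_code() {
  case "$1" in
    0) echo "reload" ;;     # change in server health detected
    1) echo "error" ;;      # invalid usage or fork failure
    2) echo "no-reload" ;;  # nothing changed
    *) echo "unknown" ;;    # not documented above
  esac
}
```

Typical use would be to run ./udp-check.php on a configuration file and then call describe_exit_code "$?" immediately afterwards.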

Production

To deploy in production, point the find command at the live configuration directory (e.g. /etc/nginx/udp_upstreams), wait for all checks to finish, and then perform a single NGINX reload if any check reported a change.

An example script (udp-manage-all.sh) has been provided to demonstrate this but requires the UPSTREAM_GLOB variable to be set appropriately.
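
For readers who want to see the shape of that flow without opening the script, the sketch below runs every check concurrently, waits for all of them, and reloads at most once. It is an assumption-laden illustration, not a copy of udp-manage-all.sh. CHECK_CMD, RELOAD_CMD, and check_all_and_reload are hypothetical names, and the configuration files are passed as arguments rather than read from UPSTREAM_GLOB.

```shell
#!/bin/sh
# Hypothetical sketch; udp-manage-all.sh may be structured differently.
# CHECK_CMD and RELOAD_CMD are placeholders introduced for illustration.
CHECK_CMD="${CHECK_CMD:-./udp-check.php}"
RELOAD_CMD="${RELOAD_CMD:-systemctl reload nginx}"

check_all_and_reload() {
  # "$@" is the list of stream configuration files to check.
  results=$(mktemp -d) || return 1
  i=0
  for conf in "$@"; do
    i=$((i + 1))
    # Run each check in the background and record its exit code in a file.
    ( $CHECK_CMD "$conf" >/dev/null 2>&1; echo "$?" > "$results/$i" ) &
  done
  wait  # block until every background check has finished

  reload_needed=no
  for f in "$results"/*; do
    [ -f "$f" ] || continue
    # Exit code 0 means a change in server health was detected.
    if [ "$(cat "$f")" = "0" ]; then
      reload_needed=yes
    fi
  done
  rm -rf "$results"

  # Reload once, no matter how many files reported a change.
  if [ "$reload_needed" = "yes" ]; then
    $RELOAD_CMD
  fi
}
```

An invocation might look like check_all_and_reload /etc/nginx/udp_upstreams/*.conf, run as root so that the reload command has the privileges it needs.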

Example:

# ((tail -f /var/log/messages | grep -i nginx) &)
# ./udp-manage-all.sh
Reload required for ./udp_upstreams/appClusterC-33035.conf
May 21 15:36:03: udp-manage-all.sh: UDP check detected change in health state for one or more servers. Reloading NGINX...
Reloading NGINX
May 21 15:36:03: Reloading The nginx HTTP and reverse proxy server.
May 21 15:36:03: Reloaded The nginx HTTP and reverse proxy server.
redirecting to systemctl reload nginx.service

Note: This script is run as root so that it has sufficient privileges to reload nginx.service.