Debugging a slow multisite WordPress install on GCP

I ran into a pesky issue with my multisite WordPress install: every few days it would slow to a crawl until I forced a restart. Here’s how I tracked down and addressed the problem. This post is relevant to anyone with a WordPress or WordPress multisite install, particularly one deployed from Bitnami’s packaged images.

Please check your system to make sure the folder paths in the commands below are correct for your setup. If you have trouble running any of the commands, try running them with sudo.

Command lines for inspecting a slow GCP instance

The first step is to inspect your instance: look through the logs and check memory use, storage capacity, and active database processes. This will give you some guidance on what to do next.
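For a quick read on memory pressure, the standard free and top utilities are a good first stop:

$ free -h
$ top -o %MEM

The -o %MEM flag sorts top by memory use, so a runaway Apache or MySQL process will float to the top of the list.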

Are you running out of disk space?

Run the following at the command prompt of your instance.

$ df -h

This will return a list of filesystems with available storage that looks like this:

Filesystem      Size  Used Avail Use% Mounted on
udev            1.8G     0  1.8G   0% /dev
tmpfs           370M   15M  356M   4% /run
/dev/sda1        30G  7.1G   21G  26% /
tmpfs           1.9G     0  1.9G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/sda15      124M  5.9M  118M   5% /boot/efi
tmpfs           370M     0  370M   0% /run/user/1003

If you’re using GCP, /dev/sda1 is where all your data resides. If its Use% is over 90%, you’re running out of disk space. To see what accounts for the consumption, run the following to list the top 20 files and folders by size:

$ du -a / | sort -n -r | head -n 20

If nothing looks out of the ordinary, the simple solution is to resize your disk in GCP.
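If you go that route, you can resize the disk with the gcloud CLI. A minimal sketch, assuming a hypothetical disk named wordpress-disk in zone us-central1-a (substitute your own disk name and zone):

$ gcloud compute disks resize wordpress-disk --zone=us-central1-a --size=50GB

On recent Debian-based images the root filesystem typically expands automatically on the next reboot; if yours doesn’t, growpart and resize2fs can grow the partition and filesystem by hand.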

If you’re receiving an error message in WordPress that suggests you are running out of space, but the filesystem report above shows that you have plenty of space, check to make sure you’re not running out of inodes (index nodes).

$ df -i
Filesystem      Inodes  IUsed   IFree IUse% Mounted on
udev            470960    293  470667    1% /dev
tmpfs           473285    382  472903    1% /run
/dev/sda1      1957888 114846 1843042    6% /
tmpfs           473285      1  473284    1% /dev/shm
tmpfs           473285      3  473282    1% /run/lock
tmpfs           473285     17  473268    1% /sys/fs/cgroup
/dev/sda15           0      0       0     - /boot/efi
tmpfs           473285     10  473275    1% /run/user/1003

Inodes are used by Linux to organize file storage on your disk: every file uses one inode. The total number of inodes is fixed when the filesystem is created (about two million on the 30G disk above; ext4 supports up to roughly four billion). You shouldn’t have anywhere near that many files, so if you’re running out of inodes, some process on your system is probably misbehaving. For example, a runaway log rotator may be creating new copies of backup files every day.

To find inode-intensive folders on your system, run the following; it counts the files in each directory and sorts the busiest directories to the bottom of the output:

$ find / -xdev -printf '%h\n' | sort | uniq -c | sort -k 1 -n

Did you find a folder of log files or diff histories that you’d like to clean out? To delete files older than a certain number of days, use the following command, replacing 365 with the number of days of files you’d like to keep in the folder.

$ find /inode/intensive/folder/* -mtime +365 -execdir rm -- '{}' \;
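A word of caution: rm is unforgiving. Before running the delete, you can preview exactly which files would match by swapping the -execdir action for -print:

$ find /inode/intensive/folder/* -mtime +365 -print | head -n 20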

Are your access logs showing a lot of unexpected activity for WordPress?

First, I would look at who is accessing your instance. The following command returns a list of IP addresses sorted by how many log entries each one has. If it doesn’t return any results, check that you’ve entered the correct location of your Apache logs.

$ grep -oE "[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+" /opt/bitnami/apache2/logs/access_log | sort | uniq -c | sort -n

Do you notice any IP addresses with an unusually large number of entries in your access logs? Run the following command to see what they’re up to, replacing the IP address in this example with one from your list.

$ grep "188.166.217.63" /opt/bitnami/apache2/logs/access_log 

In my case, the above IP address is making a POST request to the same file several times a second. Not normal. Out of curiosity, I’ll use an IP lookup tool to see which ISP the IP belongs to and to confirm it isn’t coming from me or another system under my management.
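If an address is clearly abusive and you want it gone immediately, you can drop its traffic with a one-off firewall rule while you set up the more systematic fail2ban approach described below. A sketch using iptables, with the offending IP from my logs:

$ sudo iptables -A INPUT -s 188.166.217.63 -j DROP

Note that a rule added this way won’t survive a reboot unless you persist it (for example, with the iptables-persistent package).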

Next, I would look at which files are being accessed the most on the instance. Run the following to get a summary report of requested paths, sorted from least to most accessed (the final awk filters out paths with fewer than five hits):

$ awk '{print $7}' /opt/bitnami/apache2/logs/access_log | sort | uniq -c | sort -n | awk 'int($1)>=5'

The result will look something like this:

   1410 /wp-admin/admin-ajax.php?action=as_async_request_queue_runner&nonce=807e186cbb
   3170 /robots.txt
  10560 /
  22598 /wp-admin/admin-ajax.php?action=as_async_request_queue_runner&nonce=757733f968
  23733 /xmlrpc.php
  33645 /wp-admin/admin-ajax.php?action=as_async_request_queue_runner&nonce=2daa729493
 520991 /wp-admin/admin-ajax.php

While all the entries for wp-admin/admin-ajax.php seem concerning, don’t sweat it quite yet. Most pages you load in the admin section of your site make frequent calls to this file.

The xmlrpc.php file is used by outside applications to interact with your website. Some will say you should disable it since it is liable to abuse: a hacker might use it for brute-force login attempts. Disabling it, however, limits the functionality of WordPress. For example, you won’t be able to use the WordPress app to manage your site.
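For reference, if you did decide to shut it off, blocking the file at the web server is typically done with a small Apache directive like the one below (a sketch; on a Bitnami stack it would go in one of the configuration files under /opt/bitnami/apache2/conf that is included by the main config):

<Files xmlrpc.php>
  Require all denied
</Files>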

Instead of disabling xmlrpc.php, I recommend you programmatically monitor and block IP addresses that abuse your site using fail2ban. More on how to install fail2ban on a Google Cloud Platform instance below.

Is your WordPress database taking a long time to process a query?

If you’re not running out of disk space or running out of inodes, let’s take a look at your WordPress database.

To see what’s going on with your WordPress MySQL database, run the following at the command line. And, yes, there is no space between the -p parameter flag and your database password.

$ mysqladmin -u root -pPASSWORD proc stat

The result is a table that will look something like this:

+-----+--------------+-----------+-------------------+---------+------+-----------------+------------------+-------+
| Id  | User         | Host      | db                | Command | Time | State           | Info             | Prog. |
+-----+--------------+-----------+-------------------+---------+------+-----------------+------------------+-------+
| 1   | system user  |           |                   | Daemon  |      | InnoDB purge... |                  | 0.000 |
| 2   | system user  |           |                   | Daemon  |      | InnoDB purge... |                  | 0.000 |
| 3   | system user  |           |                   | Daemon  |      | InnoDB purge... |                  | 0.000 |
| 4   | system user  |           |                   | Daemon  |      | InnoDB purge... |                  | 0.000 |
| 5   | system user  |           |                   | Daemon  |      | InnoDB shutd... |                  | 0.000 |
| 266 | bn_wordpress | localhost | bitnami_wordpress | Sleep   | 0    |                 |                  | 0.000 |
| 267 | bn_wordpress | localhost | bitnami_wordpress | Sleep   | 0    |                 |                  | 0.000 |
| 268 | root         | localhost |                   | Query   | 0    | Init            | show processlist | 0.000 |
+-----+--------------+-----------+-------------------+---------+------+-----------------+------------------+-------+

Under the Info column you will see the queries that are currently running. Check the Time column on any row with a SELECT, DELETE, UPDATE, or INSERT query under Info; the number there indicates how many seconds the query has been running. If a query is taking a long time, we’ll want to investigate it further.
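Rather than rerunning the command by hand, you can ask mysqladmin to repeat it on an interval. A sketch using the -i (sleep) flag to refresh the process list every five seconds (press Ctrl+C to stop):

$ mysqladmin -u root -pPASSWORD -i 5 proc

A query that keeps reappearing with a climbing Time value is your prime suspect.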

Let’s take a look at the size of your tables; if they’re huge, you’re going to run into performance trouble. To log in to your database, run the following, replacing PASSWORD with your password (again, no space after -p) and bitnami_wordpress with your database name. Not sure what your database name is? Rerun that last command and grab the name from under the db column.

$ mysql -u root -pPASSWORD bitnami_wordpress

Once you’ve logged in, run the following at the > prompt to list your database tables sorted by size. Remember to replace bitnami_wordpress with the name of your database.

> SELECT TABLE_NAME, table_rows, data_length, index_length, 
round(((data_length + index_length) / 1024 / 1024),2) "Size in MB"
FROM information_schema.TABLES WHERE table_schema = "bitnami_wordpress"
ORDER BY (data_length + index_length) ASC; 

The output will look something like this (I’ve truncated it for the sake of brevity):

+-------------------------------------------------+------------+-------------+--------------+------------+
| TABLE_NAME                                      | table_rows | data_length | index_length | Size in MB |
+-------------------------------------------------+------------+-------------+--------------+------------+
| wp_8_wpforms_tasks_meta                         |          3 |       16384 |            0 |       0.02 |
| wp_5_mailpoet_forms                             |          0 |       16384 |            0 |       0.02 |
| wp_4_options                                    |        909 |     5292032 |       114688 |       5.16 |
| wp_3_options                                    |        601 |     5292032 |      1081344 |       6.08 |
| wp_3_actionscheduler_actions                    |       2432 |     9764864 |       212992 |       9.52 |
+-------------------------------------------------+------------+-------------+--------------+------------+

I’m going to discuss in some detail how to deal with a large “action scheduler” table. If the offender is another table (related to, say, posts or comments), you can follow the same instructions to clear it out, but please note that you will be LOSING all the data in that table. It’s best to log in to your WordPress Dashboard and manage or delete posts and comments through the interface instead.

If you have huge Action Scheduler-related tables (wp_actionscheduler_actions or wp_actionscheduler_logs) and you saw earlier that your database is taking a long time to run queries on them, you may want to empty those tables to restore performance.

To empty a table, run the following at the MySQL prompt, replacing wp_9_actionscheduler_actions with the table name you identified above.

> DELETE FROM wp_9_actionscheduler_actions; 

It may take some time on a very large table to run the above process, so just be patient. Once it is complete, run the following to reclaim the space from the deleted rows.

> OPTIMIZE TABLE wp_9_actionscheduler_actions;
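If you’d rather not wipe a table wholesale, a more surgical option is to delete only old, finished actions and keep anything still pending. A sketch, assuming the standard Action Scheduler schema with its status and scheduled_date_gmt columns:

> DELETE FROM wp_9_actionscheduler_actions
  WHERE status IN ('complete', 'failed', 'canceled')
  AND scheduled_date_gmt < NOW() - INTERVAL 30 DAY;

Follow it with the same OPTIMIZE TABLE statement to reclaim the space.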

Finally, rerun the query that lists the tables by size to confirm that the tables you emptied are back to an acceptable size.

Why are Action Scheduler tables in my WordPress database so big?

Here’s what I think is going on and how I resolved it. Action Scheduler is a job-queue library packaged with many plugins. I discovered that in a multisite context, a plugin’s bundled copy of Action Scheduler can generate logged errors every minute on sites that are not using the plugin. This may particularly affect sites set up after the plugin was installed and activated network-wide.

If you don’t want to track down which plugins are using Action Scheduler, I recommend deactivating all or most of your plugins at the network level (at My Sites->Network Admin->Plugins) and then reactivating each plugin on each individual site where you know you use it and have it configured.

Alternatively, if you want to see which plugins are creating the excess log entries, install the Action Scheduler plugin by Automattic and activate it network-wide. Once it’s activated, go to the WP Dashboard of one of your newer sites and navigate to Tools->Scheduled Actions. The page may take a while to load (if it loads at all). If it does, the Hook column will give you a sense of which plugins are making use of Action Scheduler.
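If the page won’t load at all, you can pull the same information straight from the database. At the MySQL prompt from earlier, a query like this (again assuming the standard Action Scheduler schema) shows which hooks dominate the table:

> SELECT hook, status, COUNT(*) AS actions
  FROM wp_9_actionscheduler_actions
  GROUP BY hook, status
  ORDER BY actions DESC
  LIMIT 10;

Hook names usually carry the plugin’s prefix, which points you straight to the culprit.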

Install, configure, and verify fail2ban for WordPress

If you are hosting a WordPress install on a service like Google Cloud, I highly recommend installing fail2ban. If you’re using WordPress, you know it is a popular platform, and hackers know that too. Like bees to flowers, they’ll send a drone of bot scripts your way to probe for vulnerabilities, even if you keep your install up to date. You’ll be constantly subjected to bots brute-forcing your setup.

Fail2ban blocks IP addresses that repeatedly fail to log in, which will cut off significant bot traffic, including traffic to your WordPress XML-RPC interface.

Here is how to install fail2ban on a multisite WordPress GCP instance running Debian Linux.

$ apt update
$ apt install fail2ban
$ cp /etc/fail2ban/jail.conf /etc/fail2ban/jail.local
$ nano /etc/fail2ban/jail.local

Change the following parameters in jail.local. Feel free to experiment with these values; you may want to loosen them (for example, raise maxretry) if too many legitimate users are getting locked out.

# "bantime" is the number of seconds that a host is banned.
bantime  = 360m
# A host is banned if it has generated "maxretry" during the last "findtime"
# seconds.
findtime  = 10m
# "maxretry" is the number of failures before a host get banned.
maxretry = 3

Add the following to the bottom of jail.local:

# WordPress jail
[wordpress]
enabled = true
port = http,https
filter = wordpress
logpath = /opt/bitnami/apache2/logs/access_log

Save and close jail.local. Next run:

$ nano /etc/fail2ban/filter.d/wordpress.conf

Add the following contents and then save and close.

[Definition]
failregex = ^<HOST> .* "POST .*wp-login.php
            ^<HOST> .* "POST .*xmlrpc.php

Finally, you will need to start or reload fail2ban to load up the new WordPress configuration we just created. Run:

$ fail2ban-client reload 
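If the fail2ban service isn’t running yet, the reload won’t have anything to talk to; start and enable it first via systemd:

$ sudo systemctl enable --now fail2ban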

To see if your changes are now working, you can either follow along with the fail2ban log to watch bots attempt to access your site with:

$ tail -f /var/log/fail2ban.log

The -f parameter tells tail to show the last 10 lines of the log and then print any additional entries as they are written.

Or you can run this command:

$ fail2ban-client status wordpress

It will return a status message like the one below, which includes information on how your filter is performing, how many IP addresses are currently banned, and which addresses are on the banned list.

Status for the jail: wordpress
|- Filter
|  |- Currently failed: 14
|  |- Total failed:     6484
|  `- File list:        /opt/bitnami/apache2/logs/access_log
`- Actions
   |- Currently banned: 19
   |- Total banned:     1110
   `- Banned IP list:   213.152.187.210 185.119.81.98 104.254.90.203 185.119.81.105 185.156.175.35 184.75.221.35 146.70.115.219 194.187.251.163 178.254.50.202 74.205.112.88 119.8.229.155 34.90.100.150 213.152.186.24 185.119.81.108 199.249.230.12 213.152.161.40 34.87.94.148 15.206.111.210 188.225.18.210
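One last tip: if you (or a legitimate user) ever end up on the banned list, you can lift the ban manually, replacing the example IP with the address you want to release:

$ fail2ban-client set wordpress unbanip 213.152.187.210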