Strange Character in Text - elaborate explanation in blog post

Podfeet Speedup and Fixing MySQL Character Encoding

Strange Character in Text
Strange Character in Text

I mentioned last week that had slowed down to a crawl, but that I didn’t know why. I thought it might be interesting to understand a little bit about how the site works and what Bart and I tried to do and how William Reveal is now our hero. The good news for you is that I don’t have a particularly deep knowledge about what we did so I can’t get super nerdy on you. Some nerdy, just not super nerdy.

A Little Bit of Background, like 43% of all websites on the Internet, is run on WordPress. Every word I type, along with every comment you make on the site, are stored in a MySQL database. WordPress takes the data from the database and displays it on your web browser, and uses the settings I’ve given it to control things like how big the titles of posts are, what color links are, and whether there’s a sidebar or top menus.

WordPress and MySQL, along with the webserver software and operating system are all running on a virtual server that is currently hosted by a company called DigitalOcean.

Move the Database

When things started to slow down on, I suggested to Bart that maybe I should throw money at the problem and increase the capacity of the virtual server at DigitalOcean. I was paying for 1 virtual CPU (vCPU) with 2GB of RAM, 50GB of SSD storage, and a transfer max of 2TB per month. That was pretty inexpensive at $10/month. For $5/month I could double the number of CPUs, increase the storage to 60GB and have 3TB of data transfer.

When Bart and I talked about it, we weren’t entirely sure where the problem was coming from but he suggested another path to consider. Remember I said that the MySQL database was hosted on the same server as WordPress? He explained that I could move the database to a dedicated, managed database server. This sounded like witchcraft to me at first. I didn’t understand how this could even be possible.

In a screen sharing session, Bart taught me how this works and it’s incredibly simple. First, I bought a managed database server for $15/month from DigitalOcean. The database server has 1 vCPU, 1GB of RAM, and 10GB of disk. My database is under 500MB so that’ll do. Next, Bart explained how to tell WordPress to point at the new database server.

Buried in the giant pile of files that control WordPress is a very simple file called wp-config.php. In this file, which is literally 28 lines long including comments, you tell WordPress the location of the database. For coming up on 17 years, the location of the database has been “localhost”. Bart explained that I simply had to change localhost to the path for the database in a new location.

I couldn’t do that yet though, because I needed to actually move the database itself to the new managed database server. In MySQL, there’s a dump command that really means export. The method we planned to follow was to dump the database from the local webserver and then import it into the managed database server. And this is where things took a very surprising turn.

Encoding is Fun (Not)

Bart is cautious, which comes from years of experience being burned, so he suggested I make a copy of my wp-config file and save it with the word “orig” on the end. We made a second copy of the config file and put “new” on the end. We edited the new config file to point at the new database server instead of localhost. That allowed us to swap the two files in and out as we tested things to see how well they were working. This one step was genius because I ended up swapping back and forth from orig to new about 38 times in the last week. In other words, things didn’t go well at all.

I’m probably going to make a dog’s dinner of this next part of the explanation, but I’ve got to take a swing at it, or else I can’t tell the rest of the story. In computing, we use something called character encoding to assign numbers to graphical characters. Only through the process of character encoding can we store, transmit and transform text using digital computers (…).

For reasons that are baffling to me, there seem to be as many different flavors of character encoding as there are stars in the night sky. The one that seems to be in vogue these days is called UTF8. But there’s also Latin1. And there’s one with Swedish in it, and I remember seeing a Slovakian character encoding in my adventures with Bart.

The reason I’m bringing up character encoding is that if you have a character encoding mismatch in your database, weird characters show up on your blog. When we exported the database from my webserver and imported it into the database server, characters like apostrophe and en-dash got very weird. Apostrophe was replaced with “’” and en-dash was replaced with “–”. This told us the character encoding had was borked, but how and where did it occur?

Bart and I worked with it for a while, trying to compare what the database looked like on my webserver and what it looked like on the database server and Bart was unable to figure out how to fix the problem. Luckily, because of the caution he took upfront, I was able to quickly toggle back and forth between the two locations for the database just by switching which wp-config file I used. Bart was distressed that he hadn’t solved it on the first try, but I wasn’t terribly worried.

Then things got even slower

CPU Load drops dramatically After Power Down and Double vCPUs
CPU Load After Power Down and Double vCPUs

While time was passing until we could schedule our next play date, basically fell over in a heap. When I was working on the show notes late last week, I was getting up to 40 second page load times! You can imagine how fun that was. It’s also why I warned you that I was aware things weren’t looking good.

In the DigitalOcean web interface, I can see a few graphs, including one on CPU load. I could see that my server had been sitting at 100% for quite some time. Going back in time, I could see that this had been happening intermittently over the past few weeks.

I decided to go back to my original idea and throw money at the problem by doubling the number of virtual CPUs. This is much scarier than it sounds. I had to POWER DOWN in order to resize it. I am proud to report that I searched for how to do that, powered my own webserver down, did the resizing, and then powered it back up all by myself! My theory was that with 40 second page load times, the audience would not notice the difference.

I put a graph in the show notes that illustrates how after the power down and resizing, was no longer peaking at 100%. I was surprised though to see after a while that it would go up to 100% but it wasn’t getting stuck there.

Before we went down any of these paths, I’d worked hard to eliminate unneeded plugins and even tested turning them all off and turning half back on and measuring speed, but no single plugin appeared to be the root cause. When I was doing these tests, I installed a plugin called Query Monitor that gives me a red banner when I’m logged into my site showing me how long the page has taken to load. Gone were the 40-second waits, and it was down to more like 6-7 seconds. Still not great, but I could actually get work done.

Time to Send Up the Bat-Signal

I asked Bart what our next step would be and he said it was time to send up the Bat-Signal to the NosillaCastaways to see if anyone was an expert in MySQL and all this character encoding nonsense who would be willing to step in. He was at the end of his knowledge and couldn’t figure out what to do next.

I posted the problem and request for help into our Slack on the Programming By Stealth channel, figuring that’s where the nerdiest of nerds hang out. The most awesome William Reveal volunteered to step in to see if he could save the day.

I’m happy to report that he did indeed save the day. But it wasn’t one of those, “Hey dummy, all you need to do is…” moments. He spent two solid hours on a Zoom call with me to start. I loved every minute of it because William is everything I want in someone helping me. He let me drive, he explained what he was asking me to do very clearly, and he let me interrupt him endlessly with clarifying questions. He even listened to my suggestions and in two hours, one of my suggestions was actually helpful!

At the end of two hours, we had tested a lot of ways to try to fix the database but had not succeeded. We parted ways to give him time to noodle some more. He had me export the database from my webserver and from the new database server so that he could run experiments on his own.

At 3 in the morning that night he woke up and realized that he needed to do some other work for a client, and after he finished that he just couldn’t stand it and took to the Googles to try to find a solution. He told me that he tried a lot of methods he found on the interwebs to eliminate the weird characters on export so that they wouldn’t show up in the new database, but they kept failing. And then he found one that worked.

The basic problem with the character encoding of my database is that it’s a combination of UTF8 and Latin1, but we need it to be all UTF8. You would think you could issue a command to convert it straight to UTF8 but all methods to do that failed. The solution on which William finally settled converts the database table for the blot posts to Latin1, and then converts it to UTF8. For the nerdiest of you, I included the command in the show notes that finally worked for us.

UPDATE wp_posts SET post_content = @txt where char_length(post_content) = length(@txt := convert(binary convert(post_content using latin1) using utf8));

William also did something else brilliant here. Instead of importing the database to the new server and then changing my wp-config to point to the new server and then searching for funny characters, he had a better method. He found a post by its ID that had a funny character in it ahead of time, and then we could look at just that one post in the database to see if it had disappeared.

select post_content from wp_posts where id=8;

After dumping the database to a file, we ran the command on the file and then asked MySQL to show us the post with id=8, which happens to be my About page, and I clapped with glee to see that the crazy characters were gone!

I then mentioned that the miscreant characters were also in the titles. We searched for an id of one with the problem and ran a similar MySQL command to fix the title table.

UPDATE wp_posts SET post_title = @txt where char_length(post_title) = length(@txt := convert(binary convert(post_title using latin1) using utf8));

Again I clapped with glee to see them go away.

Then I mentioned to William that the comments are also probably riddled with the problem. Evidently, comments are in a bit of a different form than titles or posts because William had to do some jiggery-pokery to construct a somewhat different command, but in the end, he was victorious.

UPDATE wp_comments SET comment_content = @txt where char_length(comment_content) = length(@txt := convert(binary convert(comment_content using latin1) using utf8));

We uploaded the newly cleansed UTF8 version of my database to the new database server, I switched over to the new wp-config file, and is now up and running with the webserver and the database on two different servers, which I still think isi weird.

Bottom Line

CPU Load After Double vCPUs and New Database Server
CPU Load After Double vCPUs and New Database Server

Bart and I chatted after William had vanquished the evil characters, and he said that even though my doubling of vCPUs had helped a lot, he could still see the server hitting the peak for periods of time, so he thinks the database move was still the right way to go. The main thing we hope to get from this is faster performance and I sure hope that turns out to be the case.

While I’m delighted that is faster (if only for my own sanity while I’m writing), in the bigger picture the experience reinforced the fact that the NosillaCastaway community is truly a magical group of people. William Reveal stepped up to help all of us, dedicated a great deal of his own time and I now consider him a friend. The most important thing he did here was to give Bart time. Without William, Bart would still have been beating his head against this when he could, instead, be teaching us programming and how to stay safe on the Internet. Thank you, William, for giving all of us this gift.

2 thoughts on “Podfeet Speedup and Fixing MySQL Character Encoding

  1. Joseph Nilo - February 20, 2022

    I got tired of my slow WordPress site and decided to design my own site with Python / Django, also hosted on Digital Ocean. Ported all content over with minimal fuss. Now I’ve got a much zippier, JavaScript-free site with complete control.

  2. Steve Sheridan - February 25, 2022

    This is a test message to see if it is retained as Allison moves her website droplet.

Leave a Reply

Your email address will not be published. Required fields are marked *

Scroll to top