In 2007 I was driving my son Kyle and a few of his high school friends to Knott’s Berry Farm when his friend Hank started telling us a story about Napoleon. I suggested to Hank that he had the wrong audience here, and that his passion for European history would make a terrific podcast. He took my advice, teamed up with his Advanced Placement (AP) European History teacher, and recorded 20 episodes that follow the standardized curriculum in the official AP Euro history book.
I created the podcast and a WordPress site called Hank’s History Hour at http://hankshistoryhour.com. Since 2007 this podcast has been an enormous help as a study aid for AP Euro students, both in the US and in American schools abroad. It’s a funny little site and podcast because it doesn’t have an ever-increasing listenership; instead it gets an entirely new audience once a year, and only in the month of May, right before the AP test.
Hank’s History Hour doesn’t require any maintenance or upkeep because nothing changes other than the wonderful set of new comments that come in each year: kids praising Hank for his awesomeness in helping them pass the test. He gets a lot of marriage proposals too.
I say that it doesn’t require maintenance, and you’ve probably already realized the folly of that statement. According to recent web analytics, WordPress now powers 23 to 25 percent of the websites in the world. And what have we learned happens when something gets super popular? People try to hack it. And hack WordPress they do. And hack Hank’s History Hour they did. You may have heard Bart mention, on very rare occasions, that the best way to keep yourself safe and secure is to run updates. I go into podfeet.com constantly, and I’m always on the lookout for those little red numbers that remind me to update WordPress and my plugins.
But I totally forgot to pay attention to Hank’s History Hour. Since WordPress 4.3 it updates itself, which is marvelous, but I fell asleep at the wheel when Hank’s site was still at 4.2. It gets worse, though: it wasn’t just WordPress that got hacked. Since this is a low-traffic website, we put it on a shared hosting plan. The plan I bought doesn’t offer secure FTP (SFTP) access, only regular old unencrypted FTP, and it looks like we had an FTP hack as well. Bart says this one wasn’t my fault, because on shared hosting, if anyone’s site gets hacked, the bad guys can just slide sideways over to your site and mess it up.
I appreciated Bart throwing me a bone here, but when I looked back at my FTP password, it wasn’t up to today’s standards. It had uppercase and lowercase letters, a number, and a special character, but it wasn’t very long and it wasn’t random at all. For all I know, I’m the one who let the bad guys in and infected everyone else on the server. I hung my head in shame.
By the time I took a look at the files using my FTP client, Transmit, there were thousands of weird files on the server. I took a screenshot where you can see around 70 files on screen in a grid view, and I have an arrow pointing to the scroll bar so you can see how teeny it is. There may be tens of thousands of icky files in there for all I know. And this image is only from the root folder, before digging any deeper.
At this point this has been a story of woe and a tragedy, but we’re going to take a turn now. This will become a tale of adventure and discovery and ultimately victory.
I consulted with Bart and we found a good path out of this mess, but it wasn’t as straightforward as you might hope. Many of you are yelling into your devices, “Allison, PLEASE tell us you ran backups!” Well, of course I had backups. And they were all corrupted. We didn’t know that at first, but figuring it out turned out to be interesting and fun too.
The first thing I did was to install MAMP on my computer. I’ve talked about it before and we’re actually using it in Bart’s Programming By Stealth series on Chit Chat Across the Pond, but let’s back up and review to make sure we’re all on the same page. To run WordPress, you need four things:
- An operating system
- A webserver (which is also software)
- A database
- A programming language
When you pay for a hosting service, you normally get that stack in a form called LAMP: Linux, Apache, MySQL, and PHP, where Linux is the OS, Apache is the web server, MySQL is the database, and PHP is the programming language. You can also rent a server that uses WAMP, where Windows is the OS. If you want to run WordPress on your Mac, you run MAMP, where Mac OS X is your operating system.
This sounds scary and complicated, but it’s actually pretty easy to install and use. You can download MAMP for free from mamp.info, run the installer, push a button to launch the servers, and you’re in business. Download and install WordPress and you’re ready to go.
Now let’s talk about those backups. There are two kinds of backups for a website like this. There’s the backup of the database, which holds the words you’ve written, the changes you’ve made to your theme settings, the users you’ve added, and the comments you’ve received. There’s also the backup of the files, which includes the WordPress files themselves (which are easily reproducible), but also the theme files and anything you’ve uploaded, like images, videos, and, in Hank’s case, the audio files for the podcast.
Unfortunately, all of my backups of Hank’s site were recent, made after the site had already been compromised, which meant they were filled with spam and garbage, to put it politely. We imported my backup of the database into the shiny new local WordPress installation on MAMP and it was a mess. Even finding every piece of ick in the database would have been a nightmare.
That’s when we discovered that you can export your content from WordPress as XML, a very human-readable text file format. The XML contains all of the truly essential content, like the blog posts and the comments. I shudder to think how giant that file would be for podfeet.com, with 953 blog posts and 3,049 comments spanning over a decade, but in Hank’s case the file was “only” 7,000 or so lines long. If I could scrape all of the bad stuff out of this file, we could import it into the new, clean WordPress installation and it would populate a fresh database for us.
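To make the idea of the export a little more concrete, here is a minimal Python sketch (not something we actually ran on Hank’s site) that reads a WordPress export with the standard library’s `xml.etree.ElementTree` and lists each item’s title alongside how many comments came with it. It assumes the export uses the WordPress eXtended RSS (WXR) 1.2 namespace, `http://wordpress.org/export/1.2/`; older exports may use a different version number in that URL. The function name `summarize_export` and the tiny sample file are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespaces used by a WordPress export (WXR) file.
# Assumption: WXR version 1.2; older exports may put 1.0 or 1.1 in the wp URI.
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "wp": "http://wordpress.org/export/1.2/",
}


def summarize_export(xml_text: str) -> list[tuple[str, int]]:
    """List (title, comment count) for every item in the export."""
    root = ET.fromstring(xml_text)
    summary = []
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        # Comments are stored as wp:comment children of the post they belong to.
        comments = item.findall("wp:comment", NS)
        summary.append((title, len(comments)))
    return summary


# Hypothetical, heavily trimmed export with one post and two comments.
SAMPLE = """<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:wp="http://wordpress.org/export/1.2/">
  <channel>
    <title>Example site</title>
    <item>
      <title>Episode 1</title>
      <content:encoded><![CDATA[<p>Show notes.</p>]]></content:encoded>
      <wp:post_type>post</wp:post_type>
      <wp:comment>
        <wp:comment_content><![CDATA[Thanks, this saved me!]]></wp:comment_content>
      </wp:comment>
      <wp:comment>
        <wp:comment_content><![CDATA[Great episode]]></wp:comment_content>
      </wp:comment>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    for title, count in summarize_export(SAMPLE):
        print(f"{title}: {count} comment(s)")
```

This only counts what is in the file; it says nothing about whether a post or comment is spam, which is the part that still needed human eyes.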
I decided that my penance for falling asleep on Hank’s History Hour would be to read and edit…every single line of that 7,000-line XML file. I know you’ll find this crazy, but I actually enjoyed it. Bart recently told us on Chit Chat Across the Pond about a beautiful, open source text editor called Atom from atom.io. When I opened this spam-filled XML file in Atom, I could easily see the standard RSS elements like “title”, “description”, and “date” highlighted in red, and the text in bright green. This made it much easier to find the crud.
I started to recognize a pattern in how the bad guys had injected their code into the data. Every one of Hank’s blog posts has a paragraph of text, an empty line, and then a line with the audio file location and a Subscribe in iTunes link. Every single post of Hank’s in the polluted XML file also contained a section that included an HTML script tag, and I knew I hadn’t put that there on purpose. There were a couple of other pieces of glop repeated in every post, like links to other websites, but they were super easy to spot.
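I did this cleanup by eye, but a pattern this regular is something a script could at least point at. The Python sketch below is a hypothetical helper, not what I used: it pulls each post’s body out of the `content:encoded` element of a WordPress export and flags any `<script` tag, plus any link to a domain outside an allow-list you supply. The function name `find_suspicious`, the regular expressions, and the sample post are all assumptions made up for illustration. Pattern-matching HTML with regexes is crude and will miss obfuscated injections, so treat the output as a to-do list for manual review rather than a verdict.

```python
import re
import xml.etree.ElementTree as ET

# Namespace for the post body element in a WordPress export (WXR) file.
NS = {"content": "http://purl.org/rss/1.0/modules/content/"}

# Illustrative patterns only (assumptions); real injections can be disguised in ways these miss.
SCRIPT_RE = re.compile(r"<script\b", re.IGNORECASE)
LINK_RE = re.compile(r'href=["\']https?://([^/"\']+)', re.IGNORECASE)


def find_suspicious(xml_text: str, allowed_domains: set[str]) -> list[tuple[str, str]]:
    """Return (post title, reason) pairs for anything worth a manual look."""
    root = ET.fromstring(xml_text)
    findings = []
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        # The post body is usually wrapped in CDATA, so the parser hands it
        # back as plain text that still contains the raw HTML markup.
        body = item.findtext("content:encoded", default="", namespaces=NS)
        if SCRIPT_RE.search(body):
            findings.append((title, "script tag"))
        for domain in LINK_RE.findall(body):
            if domain.lower() not in allowed_domains:
                findings.append((title, f"link to {domain}"))
    return findings


# Hypothetical one-post export with an injected script and a spam link.
SAMPLE = """<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <title>Episode 1</title>
      <content:encoded><![CDATA[<p>Show notes.</p>
<p><a href="http://hankshistoryhour.com/audio/episode1.mp3">Listen</a>
<a href="https://itunes.apple.com/podcast/example">Subscribe in iTunes</a></p>
<script src="http://bad.example/x.js"></script>
<a href="http://spam.example/offer">cheap stuff</a>]]></content:encoded>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    allowed = {"hankshistoryhour.com", "itunes.apple.com"}
    for title, reason in find_suspicious(SAMPLE, allowed):
        print(f"{title}: {reason}")
```

Run against the sample, it reports the script tag and the spam.example link while leaving the audio and iTunes links alone. The allow-list approach fits a small site like this one, where the set of legitimate link destinations is short and known.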
I said I had fun doing this, and even though I found 44 pieces of glop code in the file, I also got to read all of the amazing comments about how Hank had helped these kids survive the AP test. Seeing the thanks in the way only teenagers talk, like “I love you man!” and “I want to marry you Hank!”, was a real joy while I was cleaning.
Once I got all this done, I imported the cleaned XML back into WordPress and it worked. WordPress makes websites look unique through the use of themes. With a theme you can decide whether you want sidebars, what color link text is, and whether you have a banner on top; basically everything other than the real content itself is managed by the theme. It turns out, though, that the theme I had used for Hank’s History Hour back in 2007 was no longer available, and it was far too dangerous to just download it from the cesspool I had online.
Bart has been using a WordPress theme called Customizr for lets-talk.ie, and he suggested I might like it because it has a pretty nice GUI for modifying the look and feel of the site. I am pretty darn proud of myself because, almost without any assistance at all, I was able to rebuild Hank’s History Hour into a site whose look and feel is familiar from the original but is now modern and beautiful.
Speaking of modern, I had a revelation. If you’re building a website today, you would of course want a theme that takes into account the fact that a huge percentage of your traffic will come from mobile devices. When I built Hank’s History Hour, it was the same year the iPhone was introduced! Needless to say, Hank’s History Hour was not responsive to the very audience who would be listening to the show! Now it uses what they call a “responsive design,” so it mutates to look as good as possible on every screen size.
You’ll notice one piece of the story is missing and that’s how I got rid of the old site. Stay tuned for another story I am entitling, “This is Not Your Father’s GoDaddy”.
I’m thrilled that we were able to recover Hank’s History Hour, and even more excited that the site is cooler than it ever was before. Hank’s happy, his parents who sponsor this are happy, and Bart is happy that I learned so much! Go check it out at hankshistoryhour.com.