I’ve been given flak in the past for my belief that an engineering degree gives you an advantage in the tech space, but I’m going to say it again. First of all, before you write your hate mail, I do believe it’s possible to gain the tools necessary to be quite good in the field without an engineering degree, and many people DO gain that ability. However, I still believe that because engineering schools pound certain ways of looking at things into their students, especially around how to do a controlled experiment, engineers get a leg up on diagnosing problems.
So why am I bringing this up again? This week Steve and I solved an interesting problem and it was fun how we solved it. A couple of years ago we switched from Time Warner Cable over to Verizon FiOS. I had no problems with Time Warner; they just didn’t offer the speeds of FiOS. About six months ago I noticed a problem with our connectivity: we lose our Internet connection whenever the phone rings. This wouldn’t be an issue for normal people, but if you’re on a Skype call and the phone rings, you’re gone. Not exactly what a podcaster is looking for. Luckily it never happened when I was on anyone else’s show, but a few times when Bart and I have been chatting I’ve lost him for a brief while, maybe 10-15 seconds, while my Internet took a coffee break. I should point out that Skype recovers very gracefully when this happens; we don’t have to reconnect because it does it automatically. Pretty impressive actually.
I called Verizon a few months ago and they asked me to check the frequency band for our wireless phones. Sure enough it was 5.8GHz, which means they’re likely interfering with the 5GHz band of our wireless router. I did have to confess to them that I had circumvented their router and was using my AirPort Extreme to provide wireless and wired DHCP access to my network. They didn’t freak out, which was good, but they didn’t offer a solution other than getting rid of our phones. Just replacing them didn’t seem smart, because without isolating the problem, how would you be sure you were solving it?
This week Steve and I started noodling on the problem. We have two base station phone units in the house, one downstairs in the kitchen that talks to the downstairs phones, and one upstairs right next to my wireless router in my office that controls the phone in Steve’s office. In addition there’s a 3rd base station unit in one of the upstairs bedrooms that only services itself.
To do our experiments, we have to make the phone ring. This introduces another problem: we can’t use the home phones because they’re part of the problem, right? We have to use a cell phone, but our cell phones rely on our AT&T Microcell to get a good signal inside the house. That meant we had to turn off wifi on the cell phone before making the call to be sure that we had a good test.
We also needed a reliable way to tell if the network was getting interrupted. It’s hard to notice if you’re just surfing the net or reading email. Even watching a video online isn’t a good test because videos buffer ahead and might not show an interruption. Well, remember last week when we talked about using screen sharing to see the Mac Mini as it runs our Drobo to Drobo backups? It turns out that if the phone rings, that screen sharing connection gets dropped and we get a “connecting” message with a giant spinning wheel on screen. This also provided another piece of information. It’s not just that we’re getting disconnected from the Internet when the phone rings; our wifi is completely stopped for those few seconds, because screen sharing runs on our intranet and isn’t externally facing. Very interesting.
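If you don’t have a screen share handy, a small script could play the same role as the spinning wheel. Here’s a minimal sketch of the idea, not what we actually ran: it repeatedly tries to open a TCP connection to a machine on the local network and reports the windows when that fails. The address 192.168.1.50 is a placeholder, port 5900 is the usual screen sharing (VNC) port, and the function names are made up for illustration.

```python
import socket
import time


def probe(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def find_outages(samples):
    """Turn (timestamp, ok) samples into a list of (start, end) outage windows.

    An outage starts at the first failed probe and ends at the next
    successful one. An outage still open when the samples run out is
    closed at the last timestamp seen.
    """
    outages = []
    start = None
    last_time = None
    for timestamp, ok in samples:
        last_time = timestamp
        if not ok and start is None:
            start = timestamp
        elif ok and start is not None:
            outages.append((start, timestamp))
            start = None
    if start is not None:
        outages.append((start, last_time))
    return outages


def monitor(host, port, duration=60.0, interval=0.5):
    """Probe host:port every interval seconds for roughly duration seconds."""
    samples = []
    began = time.monotonic()
    while time.monotonic() - began < duration:
        samples.append((time.monotonic() - began, probe(host, port)))
        time.sleep(interval)
    return samples


if __name__ == "__main__":
    # Placeholder address of a machine on the local network (an assumption).
    samples = monitor("192.168.1.50", 5900)
    for start, end in find_outages(samples):
        print(f"LAN unreachable from {start:.1f}s to {end:.1f}s")
```

The important design choice is the same one the screen share gave us for free: probe something on the local network rather than a website, so a failure points at the wifi itself and not at the Internet connection.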
Now that we have a reliable way to measure whether the wifi has been stopped, we can start changing things. To do controlled experiments, you have to change one thing at a time and then retest or you’ll never know which of your changes fixed the problem.
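For the programmers in the audience, the one-change-at-a-time rule can be sketched in a few lines. This is only an illustration under made-up assumptions: the three devices, the `simulated_fault` function, and the hidden culprit are all hypothetical stand-ins for "ring the phone and watch the screen share". The sketch also resets to the baseline before every trial, which is one way to guarantee that each result reflects exactly one change.

```python
def one_change_at_a_time(baseline, changes, problem_occurs):
    """Apply each change alone to the baseline and record whether the problem remains.

    baseline: dict describing the normal setup.
    changes: dict mapping a label to a dict of overrides for that one change.
    problem_occurs: function that takes a setup dict and returns True if
        the fault still shows up under that setup.
    Returns a dict mapping each label to True (problem remains) or False.
    """
    if not problem_occurs(baseline):
        raise ValueError("Problem does not reproduce at baseline; nothing to isolate.")
    results = {}
    for label, overrides in changes.items():
        trial = dict(baseline)   # start from the untouched baseline every time
        trial.update(overrides)  # apply exactly one change
        results[label] = problem_occurs(trial)
    return results


# Hypothetical setup: three made-up devices, all plugged in at baseline.
baseline = {"device_a": True, "device_b": True, "device_c": True}

changes = {
    "unplug device_a": {"device_a": False},
    "unplug device_b": {"device_b": False},
    "unplug device_c": {"device_c": False},
}


def simulated_fault(setup):
    # Stand-in for a real test. In this simulation device_c is the culprit,
    # so the fault appears whenever device_c is plugged in.
    return setup["device_c"]


results = one_change_at_a_time(baseline, changes, simulated_fault)
for label, still_broken in results.items():
    print(f"{label}: {'still broken' if still_broken else 'fixed'}")
```

Checking that the problem reproduces at baseline first matters as much as the loop: if the fault isn’t there to begin with, every trial would look like a fix.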
Steve suggested that perhaps the phone base station right next to the wireless router was the problem, so our first test was to swap the base station into Steve’s office, putting the satellite phone in my office. Steve rang our home line from his cell phone while I watched my screen share to the Mac Mini, and again we were disconnected. That eliminated the proximity of the base station to the wireless router as the root cause.
Next he suggested maybe it’s the fact that we’ve even got the phones upstairs that’s the problem, or perhaps it’s the phone system itself that’s flawed in the way it’s handling the signal. We unplugged both the upstairs base station and satellite phone and rang the house again. And again my screen share was interrupted.
The good news is that this meant it wasn’t the upstairs phone system, so replacing it would have been a waste of money. Now Steve suggested we unplug the base station and units downstairs. We both sort of groaned because the base station is mounted to the wall and it’s kind of a hassle to disconnect.
Then I remembered that 3rd base station in the spare bedroom. That one was pretty old, so maybe it was made before proper shielding was implemented in phones? Plus, as I told Steve, let’s test that one first because I WANT it to be the root cause since it’s kind of cruddy. Not very engineering disciplined of me but hey, some emotion gets in sometimes, right? We unplugged the 3rd base unit, rang the phone…and my screen share stayed connected! Off that phone goes to Goodwill, and we won’t even replace it because what self-respecting guest would need to use a landline anyway?
The bottom line is that by doing controlled experiments, we were able to determine the root cause of a network-based problem. Had we run around unplugging random devices, ignored how the cell phone was connected, or just bought new equipment, we likely never would have found the root cause. Again, I do believe non-engineering-trained people can learn to do this, but it’s ingrained in our whole way of thinking over at Chez Sheridan.