WordPress.com was down for about half an hour

Whew, my blog was down for about half an hour. Very frustrating. It demonstrates one of the weaknesses of being on a beta service (WordPress.com is still in development). Yesterday we were over at F5 meeting with the guys who make their hardware and services. They demonstrated to us how to build a service that’ll stay up no matter what happens to your servers.

Service uptimes are very important, and even more important when you are betting your business on them.

When you are looking at a blog tool for your business, you better make sure it has failover to another data center in a different city. Does anyone offer that kind of blog service for a reasonable price?


Filed under: Blog Stuff @ 5:49 am | 33 Comments

33 Comments

  1. Eric D. Burdo Says:

    Not exactly a blog service, but check out TextDrive.com

    Started by Dean Allen (the developer of TextPattern). Their redundancy is crazy, and the prices are really pretty decent.

    Matt (from WP) is on the team at TextDrive…

  2. John Says:

    Since that time the entire East coast of North America lost its electricity, maybe another continent would be a better bet. A good opportunity here for Anglo-American cooperation :-)

    John Evans

  3. Waffle Says:

    It’s darn frustrating that WordPress is doing this down thingy so often. It’s like the 2nd time this week now… Can I complain on a free service… LoL

  4. scobleizer Says:

    Waffle: well, if you can’t complain, then they shouldn’t expect businesses to build on top of their service. And, if businesses can’t build on top of their service, what’s the reason that Matt Mullenweg just quit his job to do this full time?

  5. Roger Benningfield Says:

    “When you are looking at a blog tool for your business, you better make sure it has failover to another data center in a different city.”

    Robert: Normal businesses have their primary web presences sitting on overloaded, shared servers in the corner of a datacenter somewhere. To them, thirty minutes of downtime is a really good week.

  6. Ron M. Says:

    Why would anyone want results from any other search engine? Who cares about openness and “choice” when the best search engine blows them all away.

    Yahoo and Microsoft maps just don’t match the simplicity and quality of Google’s maps. There’s a reason they are #1.

  7. scobleizer Says:

    Ron: that’s a nice story, but I don’t agree with you. Zvents won’t take Google maps off of their site even if Virtual Earth is way better. Why? Cause of the advertising hook.

  8. Joe Pruitt Says:

    Robert,

    Thanks for stopping by yesterday, we had a great time! We really believe that a new kind of network is coming where the applications being developed (whether web, xml, web services, etc) are designed to communicate with the network to facilitate their health, security, and responsiveness.

    You didn’t mention a link to our site so I’ll post it here for any readers interested:

    http://www.f5.com
    http://devcentral.f5.com

    Roger, as for 30 minutes of downtime being a really good week, I’m not sure what “normal business” you are referring to, but that could equate to anywhere from thousands to millions of dollars of lost revenue. Not so much for free services, but think about a financial service or any form of eCommerce site. If their site is not running at optimal performance (sub-second response) or is simply down, a customer will find somewhere else to do their business, and that will not only lose them revenue in the immediate term, but also all future income from that user.

    And this doesn’t boil down to just websites; services are very important as well. As we move into SOA infrastructures and Web services start to gain more of a presence, their availability, responsiveness, and security become critical to their success.

    Bottom line: If you are using a service, demand multi-site fault-tolerance. If you are running a service, make sure you are prepared for traffic spikes and have disaster recovery plans.

    -Joe

  9. Jeremy Wright Says:

    Joe, 30 minutes of downtime in a month is well within acceptance levels for most companies. Also, fault-tolerance and failover isn’t enough to guarantee even 3 9s of uptime.

    I’m not saying you don’t know your stuff, but if Robert thinks that failover is enough to keep things up to 5 or 6 9s (there’s no such thing as a 100% system), he needs to step into the real world.

    Also, someone should explain to him how costs grow exponentially with each 9 added. If failover or fault tolerance were enough, most companies would just need 3 servers (2 paired in one DC, the backup in a second DC) and they’d “automatically” have 4-5 9s of uptime.

    Again, Joe, not that you don’t know your stuff. Looking at F5, it looks like you do. And it’s likely not your fault Robert walked away with a simplified view of uptime management (it was likely only a few hours you shared, after all).

    I just don’t like to see things so over-simplified, or to see Robert griping about a free, beta service, being down for half an hour … Especially after the outages he’s coming from, or the outages his friends are experiencing on services they PAY for, and which have been public for a year or two.

    Best of luck with F5 Joe :)
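
The "9s" in the comment above translate directly into a downtime budget. A minimal sketch of that arithmetic, assuming a 30-day month (the function names are illustrative and not taken from any monitoring tool):

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumption: a 30-day month


def availability_for_nines(nines: int) -> float:
    """Availability fraction for a count of 9s, e.g. 3 -> 0.999."""
    return 1 - 10 ** (-nines)


def downtime_minutes_per_month(nines: int) -> float:
    """Downtime budget, in minutes per month, allowed by that many 9s."""
    return MINUTES_PER_MONTH * 10 ** (-nines)


def achieved_availability(downtime_minutes: float) -> float:
    """Availability actually achieved in a month with the given downtime."""
    return 1 - downtime_minutes / MINUTES_PER_MONTH


if __name__ == "__main__":
    for nines in (3, 4, 5):
        budget = downtime_minutes_per_month(nines)
        print(f"{nines} nines: {budget:.2f} minutes of downtime per month")
    print(f"30 minutes down: {achieved_availability(30):.4%} available")
```

Under that assumption, three 9s allows roughly 43 minutes of downtime a month, four 9s about 4.3 minutes, and five 9s under half a minute, so a single 30-minute outage lands between three and four 9s for that month.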

  10. scobleizer Says:

    Jeremy: I know my sharing here was simplistic. There’s only so much you can say in a couple of hundred words. And I know it’s expensive. I paid UserLand’s bills for a while, remember (and we didn’t have failover or any of the fancy dancy things that F5 lets you do).

    But, when you build a business, you better think about these things. You might make the decision to go with a low-cost provider (like I have) but that decision may have consequences for you down the line.

  11. Jaseone Says:

    So just out of curiosity what happens when the F5 itself goes down?

    You stated yourself that WordPress.com is in beta and the software it runs on is only in alpha (if that) so why exactly are you running a high profile blog on it that you expect 100% uptime for?

  12. Jeremy Wright Says:

    Robert, glad you appreciate the complexities of managing enterprise-level solutions :)

    It’s not a simple world, and the level of availability any company provides should be a strategic decision (just like the level of company any customer goes with). Right now, availability likely isn’t all that important to the WordPress.com folk, since it’s in beta and all.

    It’s expensive as well. At b5media, we’re just about to go to a load-balanced solution (40% increase in costs). Next we’ll go to a failover-based one (another 60% increase in costs). Finally we’ll go to a closer-to-enterprise one (another 50% increase).

    This stuff is expensive. Sometimes you need to make a choice between your business existing and your business being alive. It’s what happens when you boot-strap.
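
To see how the staged increases Jeremy lists add up, here is a small sketch. It assumes each percentage applies to the cost of the previous stage (compounding), which the comment does not state, and it uses a hypothetical baseline of 100 cost units:

```python
from typing import List


def staged_costs(base: float, increases_pct: List[float]) -> List[float]:
    """Return the cost after each stage, compounding every increase.

    Assumption: each percentage is relative to the previous stage's cost.
    """
    costs = [base]
    for pct in increases_pct:
        costs.append(costs[-1] * (1 + pct / 100))
    return costs


if __name__ == "__main__":
    # Hypothetical baseline of 100 units; percentages are from the comment.
    stages = ["current", "load-balanced", "failover", "closer-to-enterprise"]
    for name, cost in zip(stages, staged_costs(100.0, [40, 60, 50])):
        print(f"{name}: {cost:.0f}")
```

Under the compounding assumption the final setup costs about 3.4 times the baseline. If the percentages were instead all relative to the original cost, the total would be 2.5 times.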

  13. scobleizer Says:

    Jaseone: why am I using WordPress? Because they won a contest for OPML generation that I ran a few weeks ago. Also, because I want to learn about the bleeding edge of blog services so that I can talk legitimately about the usefulness of those. I also have a Typepad blog (my book blog). My son has a Google Blogger blog. My wife has an MSN Spaces blog. So, we’re covering much of the waterfront.

  14. Robert Says:

    What happened to your DABU blog? - I was hoping that someone like you would put it to the test! I know the domain name is still there, but you have not posted any content!

  15. James Fee Says:

    Robert, how has Typepad been working? I know they have some problems with growth over there, but you haven’t mentioned how that service is doing.

  16. Jeremy Wright Says:

    James, lots and lots and lots of complaints everywhere about TypePad.

  17. Luca Says:

    Hi Robert, I’m going OT … but do you have a standard WordPress account?
    How can you customize the template of your WordPress blog?

  18. James Fee Says:

    Jeremy, I’ve heard that, but not from Robert. He’s called out Weblogs, but I haven’t heard a peep about Typebad.

  19. Rich Miller Says:

    Jeremy’s right about the costs associated with five 9s and six 9s of uptime. But I’ve tracked the web hosting industry for a number of years, and there are any number of providers who will do better than a half-hour a month. Some of them are affordable. We track hosting company uptime at Netcraft, and here’s an example of performance data for top hosts for September (sorry, long url):

    http://tinyurl.com/e3pnt

    The October data will be out in the next couple days. Some of these providers offer reasonably priced accounts. But they’re not hooked into the world of blogging, because bloggers are a price-sensitive bunch.

  20. Jeremy Wright Says:

    Yeah, most of the above are dedicated, managed or managed colo facilities. I’m a huge fan of Rackspace :)

  21. Joe Pruitt Says:

    Jaseone,

    You asked what happens when one of our devices goes down. Well, that depends on how fault tolerant you have made the system. Of course, if you have a single device fronting a single datacenter, then that’s not the most optimal solution. Within a data center we recommend you deploy devices in redundant pairs so that if for some reason one device fails, the other will take over the workload of the first device seamlessly.

    That gets you to single data center redundancy, but what if the whole data center goes down (wide area power outages) or your data center experiences huge unanticipated spikes in traffic that fill its available bandwidth? In this case you really need an intelligent multi-datacenter solution based on DNS (which we offer, by the way). DNS is distributed in nature, so it is very simple to set up a grid of these devices across different data centers to pick up the load. A failure on a single one of these devices will not impact routing, as the others will automatically pick up the work of the failed unit.

    Let’s say you are hosting a site http://www.foo.com. A client does a DNS lookup on that domain to find the correct routable ip address to connect to. If you have an intelligent DNS system it will be able to return an address for the most acceptable datacenter.

    -Joe
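
A toy model of the DNS-based selection Joe describes: given the health and load of each datacenter, answer the lookup with the address of the most acceptable one. This is a sketch of the general idea only, not F5's product logic. The `DataCenter` fields, the `resolve` function, the load threshold, the site names, and the addresses (taken from the reserved documentation ranges) are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCenter:
    name: str      # hypothetical site label
    ip: str        # illustrative address from a documentation range
    healthy: bool  # result of some out-of-band health check
    load: float    # 0.0 (idle) to 1.0 (saturated)


def resolve(datacenters: List[DataCenter], max_load: float = 0.9) -> Optional[str]:
    """Pick the address to return for a lookup, or None if every site is down."""
    healthy = [dc for dc in datacenters if dc.healthy]
    # Prefer healthy sites with spare capacity; otherwise accept any healthy site.
    candidates = [dc for dc in healthy if dc.load < max_load] or healthy
    if not candidates:
        return None
    return min(candidates, key=lambda dc: dc.load).ip


if __name__ == "__main__":
    sites = [
        DataCenter("site-a", "192.0.2.10", healthy=True, load=0.35),
        DataCenter("site-b", "198.51.100.10", healthy=True, load=0.60),
    ]
    print(resolve(sites))     # least-loaded healthy site
    sites[0].healthy = False  # simulate a datacenter-wide outage
    print(resolve(sites))     # traffic shifts to the remaining site
```

A real deployment would also have to account for DNS caching and record TTLs, which delay how quickly clients see a changed answer.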

  22. Matt Says:

    Sorry for the downtime, we were adding additional DB servers and syncing them took a few minutes longer than expected. However the upside is that now we have fully redundant copies of all your data spread across several servers, so upgrades like this are going to be fewer and fewer over the coming weeks.

  23. Matt Says:

    Joe, we should get together sometime. :)

  24. Jeremy Wright Says:

    Joe, but I’m sure you know that true high availability is really more than DNS + redundant servers. Clustering, sequential and differential backup systems, mirroring…

    High availability starts with the hardware (RAID, multiple power supplies, multiple power systems, multiple cooling systems, independent NICs) on the server and goes up to per-configuration availability (fault tolerance and failover), through load balancing and DNS to multi-DC setups, backup DNS setups, off-site, non-live setups and a whole host of other things.

    I’ve designed systems that have (knock on wood) never gone down (primary patient care systems). But the reality was that to get a 1-server application to 7 9s of availability (with a full failover so that total downtime would be nanoseconds, which was still a lot) cost upwards of $150K.

    This stuff isn’t cheap. It’s fun (:D), but it isn’t cheap.

  25. Jeremy Wright Says:

    Also, a typical maintenance window for what Matt’s describing in just about any company would be about 2 hours. 30 minutes ain’t bad :)

  26. Jeff Browning Says:

    Robert, thanks again for stopping by yesterday. We really enjoyed your visit and great discussion.

    This is a fascinating dialogue. It also highlights how the role of a network is changing quickly. The notion of how many “9’s” a company needs is an interesting - and critical - discussion. Need varies based upon the business requirements and budget tolerance.

    But, here’s a different angle on the cost factor. What if you could build an application that ensures total uptime during datacenter/app updates while automating the process through app and network integration? The cost savings in CLI/management effort is significant with total error reduction (i.e. downtime).

    Or, if the device is smart enough to read and understand the datastream and sanitize it to ensure that sensitive information never leaves the datacenter, what’s that worth? (think credit card numbers? SS#s?) It’s kind of like those Mastercard ads… Cost of servers? $$$… Cost of network hardware? $$$… Avoiding the costs of telling your 30,000 customers that you *may* have leaked their credit card numbers? Priceless. ;-)

    We’ve got an iRule on DevCentral that does this. (http://devcentral.f5.com).

    We’re getting to the point where the value of applications running on smart network devices can more than cover the cost of the network gear (and servers, for that matter).

    Uptime and fault tolerance are the foundation to deploying any web app or service. Using more advanced features (APIs, rules, etc.) offers a completely different way of looking at cost/value/business criticality.

    - Jeff

  27. Jeremy Wright Says:

    Jeff, agreed, once you have the infrastructure to actually support “more than enough 9s” of uptime, building an app framework that makes maintenance upgrades and the like seamless is a fantastic idea :)

  28. wasker Says:

    Robert, for God’s sake, go and buy yourself hosting and forget about hosted blogging services. Take a look at GoDaddy: helluva traffic for only $3.95/mo. My blog is hosted there. WordPress setup takes only a couple of minutes. Why bother with something else?

  29. Justin Baeder Says:

    My understanding is that WP.com is not intended to be a business solution, so it doesn’t really bother me that it should be down for half an hour when people want to use it.

    Annoying? Yes. Terrible? No. No one’s business depends on WP.com being up 99.9% of the time. Nor does a business depend on having their blog up all of the time.

    Also, I assume/hope this maintenance was scheduled, so users knew to expect a 30-min outage. When I used other hosting services, I appreciated their notifications of scheduled downtime. If your service isn’t free, I’d expect scheduled maintenance to be done late Saturday night or early Sunday morning, when it would have the least impact on traffic.

    On another note, I think demanding redundant, fault-tolerant, 100%-uptime, RAIDed, load-balanced servers for one’s weblog is akin to demanding that your coffee be served in a double-walled platinum carafe: It might be nice, but it’s really not worth the price. We’re talking blogs here, not e-commerce.

    Each user needs to determine how much uptime is really worth for them. With the vast majority of hosts, the vast majority of bloggers will never experience unscheduled downtime. The difference in downtime between a $5 a month normal host and a $99 a month uber-redundant host will be imperceptible.

  30. Guzzard Says:

    One word “APACHE”.

  31. Jeremy Wright Says:

    Guzzard: WP runs on Apache.

  32. Matt Gerlach Says:

    The blogs hosted at http://www.msmvps.com are down at least once a day, for at least 15-20 minutes. It all started around the day at PDC when the electricity went out (I believe the servers that host us are located in LA).

    I feel bad for Susan Bradley (http://msmvps.com/bradley/) she is really trying to get us to a better server, and this keeps happening. She is doing this, from what I believe, all on her own dime. I thank her for everything, and feel bad when people complain about the server down time.

  33. /pd Says:

    Have blog software and services reached the “industrial strength” threshold? Where DR and redundancy are coupled with HAS??

    I don’t think so. Most companies don’t even have a blog, let alone servers and hosting services for them.

    We the community are crying “wolf” just because it is something (blog hosting) we are paying for and want service and uptime. When blogs become mission critical, then one will find apps and servers being gelled together to obtain “best of class” attribution. Till then, sit back and expect that we will have “noise” failures for interim moments of time!!
