WordPress.com was down for about half an hour

Whew, my blog was down for about half an hour. Very frustrating. It demonstrates one of the weaknesses of being on a beta service (WordPress.com is still in development). Yesterday we were over at F5 meeting with the guys who make their hardware and services. They demonstrated to us how to build a service that’ll stay up no matter what happens to your servers.

Service uptimes are very important, and even more important when you are betting your business on them.

When you are looking at a blog tool for your business, you better make sure it has failover to another data center in a different city. Does anyone offer that kind of blog service for a reasonable price?

Comments

  1. Not exactly a blog service, but check out TextDrive.com

    Started by Dean Allen (the developer of TextPattern). Their redundancy is crazy, and the prices are really pretty decent.

    Matt (from WP) is on the team at TextDrive…

  3. Since that time the entire East coast of North America lost its electricity, maybe another continent would be a better bet. A good opportunity here for Anglo-American cooperation :-)

    John Evans

  5. It’s darn frustrating that WordPress is doing this down thingy so often. It’s like the 2nd time this week now… Can I complain on a free service… LoL

  7. Waffle: well, if you can’t complain, then they shouldn’t expect businesses to build on top of their service. And, if businesses can’t build on top of their service, what’s the reason that Matt Mullenweg just quit his job to do this full time?

  9. “When you are looking at a blog tool for your business, you better make sure it has failover to another data center in a different city.”

    Robert: Normal businesses have their primary web presences sitting on overloaded, shared servers in the corner of a datacenter somewhere. To them, thirty minutes of downtime is a really good week.

  11. Why would anyone want results from any other search engine? Who cares about openness and “choice” when the best search engine blows them all away.

    Yahoo and Microsoft maps just don’t match the simplicity and quality of Google’s maps. There’s a reason they are #1.

  13. Ron: that’s a nice story, but I don’t agree with you. Zvents won’t take Google maps off of their site even if Virtual Earth is way better. Why? Cause of the advertising hook.

  15. Robert,

    Thanks for stopping by yesterday, we had a great time! We really believe that a new kind of network is coming where the applications being developed (whether web, xml, web services, etc) are designed to communicate with the network to facilitate their health, security, and responsiveness.

    You didn’t mention a link to our site so I’ll post it here for any readers interested:

    http://www.f5.com
    http://devcentral.f5.com

    Roger, as for 30 minutes of downtime being a really good week, I’m not sure what “normal business” you are referring to, but that could equate to anywhere from thousands to millions of dollars of lost revenue. Not so much for free services, but think about a financial service or any form of eCommerce site. If their site is not running at optimal performance (sub-second response) or is simply down, a customer will find somewhere else to do their business, and that will not only lose them revenue in the immediate term, but also all future income from that user.

    And this doesn’t boil down to just websites, services are very important as well. As we move into SOA infrastructures and WebServices start to gain more of a presence then their availability, responsiveness, and security are critical to their success.

    Bottom line: If you are using a service, demand multi-site fault-tolerance. If you are running a service, make sure you are prepared for spikes in traffic and have disaster recovery plans.

    -Joe

  17. Joe, 30 minutes of downtime in a month is well within acceptance levels for most companies. Also, fault-tolerance and failover isn’t enough to guarantee even 3 9s of uptime.

    I’m not saying you don’t know your stuff, but if Robert thinks that failover is enough to keep things up to 5 or 6 9s (there’s no such thing as a 100% system), he needs to step into the real world.

    Also, someone should explain to him how costs grow exponentially with each 9 added. If failover or fault tolerance were enough, most companies would just need 3 servers (2 paired in one DC, the backup in a second DC) and they’d “automatically” have 4-5 9s of uptime.

    Again, Joe, not that you don’t know your stuff. Looking at F5, it looks like you do. And it’s likely not your fault Robert walked away with a simplified view of uptime management (it was likely only a few hours you shared, after all).

    I just don’t like to see things so over-simplified, or to see Robert griping about a free, beta service, being down for half an hour … Especially after the outages he’s coming from, or the outages his friends are experiencing on services they PAY for, and which have been public for a year or two.

    Best of luck with F5 Joe :)
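
The arithmetic behind the “nines” in the comment above can be sketched roughly as follows. This is only an illustration: the function names are made up for the example, and it assumes a 30-day month. Under that assumption, three 9s allows about 43 minutes of downtime a month, so a single 30-minute outage still fits inside three 9s but is well outside four.

```python
# Illustrative arithmetic only; names and the 30-day month are assumptions.
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes in a 30-day month


def allowed_downtime_minutes(nines: int, period_minutes: float = MINUTES_PER_MONTH) -> float:
    """Minutes of downtime permitted per period at the given number of nines."""
    unavailability = 10 ** -nines  # e.g. 3 nines -> 0.001
    return period_minutes * unavailability


def availability_percent(downtime_minutes: float, period_minutes: float = MINUTES_PER_MONTH) -> float:
    """Availability, as a percentage, for a given amount of downtime in the period."""
    return 100.0 * (1 - downtime_minutes / period_minutes)


if __name__ == "__main__":
    for n in range(2, 7):
        print(f"{n} nines: {allowed_downtime_minutes(n):.3f} minutes/month allowed")
    print(f"30 minutes down in a month: {availability_percent(30):.3f}% available")
```

Each extra 9 cuts the downtime budget by a factor of ten, which is one way to see why the cost of each step climbs so steeply.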

  19. Jeremy: I know my sharing here was simplistic. There’s only so much you can say in a couple of hundred words. And I know it’s expensive. I paid UserLand’s bills for a while, remember (and we didn’t have failover or any of the fancy dancy things that F5 lets you do).

    But, when you build a business, you better think about these things. You might make the decision to go with a low-cost provider (like I have) but that decision may have consequences for you down the line.

  21. So just out of curiosity what happens when the F5 itself goes down?

    You stated yourself that WordPress.com is in beta and the software it runs on is only in alpha (if that) so why exactly are you running a high profile blog on it that you expect 100% uptime for?

  23. Robert, glad you appreciate the complexities of managing enterprise-level solutions :)

    It’s not a simple world and the level of availability any company provides should be a strategic one (just like the level of company any customer goes with). Right now, availability likely isn’t all that important to the WordPress.com folk, since it’s in beta and all.

    It’s expensive as well. At b5media, we’re just about to go to a load-balanced solution (40% increase in costs). Next we’ll go to a failover-based one (another 60% increase in costs). Finally we’ll go to a closer-to-enterprise one (another 50% increase).

    This stuff is expensive. Sometimes you need to make a choice between your business existing and your business being alive. It’s what happens when you boot-strap.
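
To see how those three increases stack up, here is a small sketch. It assumes each percentage is applied on top of the previous stage's cost, which the comment does not actually state; the stage labels and function name are just for illustration.

```python
from itertools import accumulate
from operator import mul

# Stage labels and percentages taken from the comment above; treating them
# as compounding (each on top of the previous stage) is an assumption.
STAGES = [
    ("load-balanced", 0.40),
    ("failover-based", 0.60),
    ("closer-to-enterprise", 0.50),
]


def cumulative_multipliers(increases):
    """Running cost multiplier relative to the starting cost after each stage."""
    return list(accumulate((1 + p for p in increases), mul))


if __name__ == "__main__":
    multipliers = cumulative_multipliers(p for _, p in STAGES)
    for (name, _), m in zip(STAGES, multipliers):
        print(f"{name}: {m:.2f}x the original cost")
```

Under the compounding assumption the final setup costs about 3.36 times the starting point. If the percentages were instead all measured against the original baseline, the total would be 2.5 times.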

  25. Jaseone: why am I using WordPress? Because they won a contest for OPML generation that I ran a few weeks ago. Also, because I want to learn about the bleeding edge of blog services so that I can talk legitimately about the usefulness of those. I also have a Typepad blog (my book blog). My son has a Google Blogger blog. My wife has an MSN Spaces blog. So, we’re covering much of the waterfront.

  27. What happened to your DABU blog? – I was hoping that someone like you would put it to the test! I know the domain name is still there, but you have not posted any content!

  29. Robert, how has Typepad been working? I know they have some problems with growth over there, but you haven’t mentioned how that service is doing.

  31. Hi Robert, I’m going OT … but have you a standard wordpress account?
    How can you customize the template of your wordpress blog?

  33. Jeremy’s right about the costs associated with five 9s and six 9s of uptime. But I’ve tracked the web hosting industry for a number of years, and there are any number of providers who will do better than a half-hour a month. Some of them are affordable. We track hosting company uptime at Netcraft, and here’s an example of performance data for top hosts for September (sorry, long url):

    http://tinyurl.com/e3pnt

    The October data will be out in the next couple days. Some of these providers offer reasonably priced accounts. But they’re not hooked into the world of blogging, because bloggers are a price-sensitive bunch.

  35. Jaseone,

    You asked what happens when one of our devices goes down. Well, that depends on how fault tolerant you have made the system. Of course, if you have a single device fronting a single datacenter, then that’s not the most optimal solution. Within a data center we recommend you deploy devices in redundant pairs so that if for some reason one device fails, the other will take over the workload of the first device seamlessly.

    That gets you to single data center redundancy, but what if the whole data center goes down (wide area power outages) or your data center experiences huge unanticipated spikes in traffic that fill its available bandwidth? In this case you really need an intelligent multi-datacenter solution based on DNS (which we offer, by the way). DNS is distributed in nature, so it is very simple to set up a grid of these devices across different data centers to pick up the load. Failure of a single one of these devices will not impact routing, as the others will automatically pick up the work of the failed unit.

    Let’s say you are hosting a site http://www.foo.com. A client does a DNS lookup on that domain to find the correct routable ip address to connect to. If you have an intelligent DNS system it will be able to return an address for the most acceptable datacenter.

    -Joe
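
The DNS-based selection described in the comment above can be sketched in a few lines. This is not F5's implementation, just a toy model of the idea: the `DataCenter` fields, the `pick_address` function, and the "least loaded healthy site wins" rule are all assumptions made for the example, and the IP addresses come from the ranges reserved for documentation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DataCenter:
    name: str
    address: str   # routable IP returned in the DNS answer
    healthy: bool  # outcome of the most recent health check
    load: float    # fraction of capacity in use, 0.0 to 1.0


def pick_address(datacenters: Sequence[DataCenter]) -> Optional[str]:
    """Return the address of the least-loaded healthy datacenter, or None if none qualify."""
    candidates = [dc for dc in datacenters if dc.healthy and dc.load < 1.0]
    if not candidates:
        return None
    return min(candidates, key=lambda dc: dc.load).address


if __name__ == "__main__":
    sites = [
        DataCenter("seattle", "192.0.2.10", healthy=False, load=0.20),    # site is down
        DataCenter("new-york", "198.51.100.10", healthy=True, load=0.85),
        DataCenter("london", "203.0.113.10", healthy=True, load=0.40),
    ]
    print(pick_address(sites))  # the answer a client looking up www.foo.com would get
```

A real system would also have to deal with DNS caching and TTLs, since clients keep using an answer for a while after a site fails.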

  36. Sorry for the downtime, we were adding additional DB servers and syncing them took a few minutes longer than expected. However the upside is that now we have fully redundant copies of all your data spread across several servers, so upgrades like this are going to be fewer and fewer over the coming weeks.

  39. Joe, but I’m sure you know that true high availability is really more than DNS + redundant servers. Clustering, sequential and differential backup systems, mirroring…

    High availability starts with the hardware (RAID, multiple power supplies, multiple power systems, multiple cooling systems, independent NICs) on the server and goes up to per-configuration availability (fault tolerance and failover), through load balancing and DNS to multi-DC setups, backup DNS setups, off-site, non-live setups and a whole host of other things.

    I’ve designed systems that have (knock on wood) never gone down (primary patient care systems). But the reality was that to get a 1-server application to 7 9s of availability (with a full failover so that total downtime would be nanoseconds, which was still a lot) cost upwards of 150K$.

    This stuff isn’t cheap. It’s fun (:D), but it isn’t cheap.

  41. Robert, thanks again for stopping by yesterday. We really enjoyed your visit and great discussion.

    This is a fascinating dialogue. It also highlights how the role of a network is changing quickly. The notion of how many “9s” a company needs is an interesting – and critical – discussion. Need varies based upon the business requirements and budget tolerance.

    But, here’s a different angle on the cost factor. What if you could build an application that ensures total uptime during datacenter/app updates while automating the process through app and network integration? The cost savings in CLI/management effort is significant with total error reduction (i.e. downtime).

    Or, if the device is smart enough to read and understand the datastream and sanitize it to ensure that sensitive information never leaves the datacenter, what’s that worth? (think credit card numbers? SS#s?) It’s kind of like those Mastercard ads… Cost of servers? $$$… Cost of network hardware? $$$… Avoiding the costs of telling your 30,000 customers that you *may* have leaked their credit card numbers? Priceless. ;-)

    We’ve got an iRule on DevCentral that does this. (http://devcentral.f5.com).

    We’re getting to the point where the value of applications running on smart network devices can more than cover the cost of the network gear (and servers, for that matter).

    Uptime and fault tolerance are the foundation to deploying any web app or service. Using more advanced features (APIs, rules, etc.) offer a completely different way of looking at cost/value/business criticality.

    - Jeff
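
As a rough idea of what “sanitizing the datastream” could look like, here is a Python sketch. It is not the iRule mentioned above (iRules run on the F5 device itself); it just shows one common approach, assumed here for illustration: find digit runs that look like card numbers, confirm them with the Luhn checksum, and mask all but the last four digits. The pattern and function names are hypothetical.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask_card_numbers(text: str) -> str:
    """Mask likely card numbers in text, keeping only the last four digits."""
    def _mask(match):
        raw = match.group(0)
        digits = re.sub(r"\D", "", raw)
        if luhn_ok(digits):
            return "X" * (len(digits) - 4) + digits[-4:]
        return raw  # not a valid card number, leave untouched

    return CANDIDATE.sub(_mask, text)


if __name__ == "__main__":
    # 4111 1111 1111 1111 is a widely used test card number.
    print(mask_card_numbers("card 4111 1111 1111 1111 on file"))
```

The Luhn check keeps the filter from mangling order numbers and other long digit strings that merely resemble card numbers.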

  43. Jeff, agreed, once you have the infrastructure to actually support “more than enough 9s” of uptime, building an app framework that makes maintenance upgrades and the like seamless is a fantastic idea :)

  45. Robert, for God’s sake, go and buy yourself some hosting and forget about hosted blogging services. Take a look at GoDaddy: helluva traffic for only $3.95/mo. My blog is hosted there. WordPress setup takes only a couple of minutes. Why bother with something else?

  47. My understanding is that WP.com is not intended to be a business solution, so it doesn’t really bother me that it should be down for half an hour when people want to use it.

    Annoying? Yes. Terrible? No. No one’s business depends on WP.com being up 99.9% of the time. Nor does a business depend on having their blog up all of the time.

    Also, I assume/hope this maintenance was scheduled, so users knew to expect a 30-min outage. When I used other hosting services, I appreciated their notifications of scheduled downtime. If your service isn’t free, I’d expect scheduled maintenance to be done late Saturday night or early Sunday morning, when it would have the least impact on traffic.

    On another note, I think demanding redundant, fault-tolerant, 100%-uptime, RAIDed, load-balanced servers for one’s weblog is akin to demanding that your coffee be served in a double-walled platinum carafe: It might be nice, but it’s really not worth the price. We’re talking blogs here, not e-commerce.

    Each user needs to determine how much uptime is really worth for them. With the vast majority of hosts, the vast majority of bloggers will never experience unscheduled downtime. The difference in downtime between a $5 a month normal host and a $99 a month uber-redundant host will be imperceptible.

  49. The blogs hosted at http://www.msmvps.com are down at least once a day, for at least 15-20 minutes. It all started around the day at PDC when the electricity went out (I believe the servers that host us are located in LA).

    I feel bad for Susan Bradley (http://msmvps.com/bradley/). She is really trying to get us to a better server, and this keeps happening. She is doing this, from what I believe, all on her own dime. I thank her for everything, and feel bad when people complain about the server downtime.

  51. Have blog software and services reached the “industrial strength” threshold? Where DR and redundancy are coupled with HAS??

    I don’t think so. Most companies don’t even have a blog, let alone servers and hosting services for them.

    We the community are crying “wolf” just because it is something (blog hosting) we are paying for and want service and uptime. When blogs become mission critical, then one will find apps and servers being jelled together to obtain “best of class” attribution. Till then, sit back and expect that we will have “noise” failure for interim moments of time!!
