Daily Archives: August 20, 2006

I need Excel help, off to Amazon’s Mechanical Turk

Update: I already got the cleaned file, thanks to several of my readers! Appreciate the help!

OK, I downloaded the latest changes.xml file from weblogs.com. If you don’t know what weblogs.com is, it’s a service that most weblog tools “ping” to let it know that someone has just published.

In the early days of blogging Dave Winer and other bloggers would watch this page like a hawk since it would display when new people had just posted. Remember, when I started blogging there were only a couple of hundred bloggers with only a few dozen posts a day. You could read this page just like many of us read TechMeme or TailRank now.

Anyway, I just downloaded the last hour and there were more than 60,000 entries in that file. Whew! OK, I used brute force and cleaned up just the “As.” Brute force means I went through and deleted them by hand, without using any macros or scripts.

It’s taking too long to do it by hand (60,000 URLs is too many) and, anyway, it’d be fun to redo this test over and over to see if the numbers of blogs done from each service change depending on the day of week and time of day.

Anyway, here’s what I need done. This is a perfect job for Amazon’s Mechanical Turk. That service lets you spec out a small job and get someone with a little extra time to do it for you for a reasonable fee.

On the other hand, I’ll also ask here. Here’s what I need:

1) Take my Excel .XLS file (I’ll clean it up and put it into a column for you) and delete all the URLs that don’t come from blogspot.com; wordpress.com; livejournal.com; spaces.live.com; typepad.com.

That’s it. Easy, huh? Should take one of the programmer types here a few minutes to write an Excel macro to do that. If you’d rather me just hand you a comma-delimited text file, I can do that too. Or, you can just go get the file yourself from weblogs.com (it’s an XML file) and clean it up yourself. I just need the URLs, I don’t care about anything else.
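For the programmer types, here is a minimal sketch of that cleanup in Python rather than an Excel macro. It assumes the weblogs.com file is a list of `<weblog>` elements that each carry a `url` attribute; I haven't checked that against the current file, so treat the element and attribute names as assumptions. The function names (`service_for`, `filter_urls`, `count_by_service`) are made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlparse

# The five services named in the request above.
SERVICES = ("blogspot.com", "wordpress.com", "livejournal.com",
            "spaces.live.com", "typepad.com")


def service_for(url):
    """Return the service domain that hosts this URL, or None if it is not one of SERVICES."""
    host = (urlparse(url).hostname or "").lower()
    for domain in SERVICES:
        # Match the bare domain or any subdomain of it, but not look-alike hosts.
        if host == domain or host.endswith("." + domain):
            return domain
    return None


def filter_urls(xml_text):
    """Keep only the URLs hosted on one of SERVICES.

    Assumes <weblog url="..."> elements; adjust if the real file differs.
    """
    root = ET.fromstring(xml_text)
    urls = (element.get("url") for element in root.iter("weblog"))
    return [u for u in urls if u and service_for(u)]


def count_by_service(urls):
    """Tally how many of the kept URLs belong to each service."""
    return Counter(service_for(u) for u in urls)
```

Running `count_by_service(filter_urls(text))` on files pulled at different hours or days would give the per-service numbers to compare over time.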

Is Microsoft really the largest blog vendor?

Microsofties take it at face value that they host the most blogs. They even love shoving it in your face. Yesterday someone who works on the Windows Live team was taunting me with “influentials don’t matter, we got to be #1 and we don’t care that there aren’t any influential bloggers using our stuff.”

I was asking them why so few bloggers at BlogHer or Gnomedex use Windows Live Spaces, which is Microsoft’s blog and photo sharing service.

Today I see that George Moore, General Manager of Windows Live, just told a crowd in New Zealand that Windows Live is “now the largest blogging service on the planet.” At least according to Richard MacManus, who I’ve found to accurately report past events, and who is at TechED in New Zealand.

So, that made me itch and when I have an itch I want to scratch it.

Here’s what’s itching me:

1) Is Windows Live Spaces really used as a blog service very often?
2) Is Microsoft only counting when it’s used as a blog service, or is it counting all uses of Windows Live Spaces?
3) Do other services actually have more “real” blogs? At least percentagewise?

Now, I know that WordPress.com (currently the service that most of the “in crowd” is recommending) only has about 300,000 blogs. Microsoft is claiming 72 million blogs.

So, over the next few hours I’m gonna do some analysis and see if I can find out how much overcounting there’s going on (there is SOME overcounting, based on my initial looks at http://spaces.live.com and http://www.weblogs.com — I see a whole bunch of things there that don’t look like blogs at all).

First, let’s define what a blog is, at least enough to count for this purpose.

1) Have original content. Spam blogs that are copied off of somewhere else don’t count.
2) Have at least 500 words of new text-based content every month. Things that look like Flickr streams aren’t blogs, sorry.
3) Have at least two posts in the past 30 days. If you aren’t posting, you’re not blogging.
4) I don’t care if you have comments, have trackbacks, have blogrolls, or any of that.
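Rules 2 and 3 are mechanical enough to sketch in code. This is only an illustration under a few assumptions: the `Post` record and the `counts_as_blog` function are hypothetical names, "500 words every month" is approximated as 500 words across the past 30 days, and rule 1 (original content) is passed in as a flag because it still needs a human to judge it.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Post:
    published: date   # day the post went up
    word_count: int   # words of new text in the post


def counts_as_blog(posts, today, is_original=True):
    """Apply the counting rules above to one site's posts."""
    cutoff = today - timedelta(days=30)
    recent = [p for p in posts if cutoff <= p.published <= today]
    enough_posts = len(recent) >= 2                            # rule 3
    enough_words = sum(p.word_count for p in recent) >= 500    # rule 2
    return is_original and enough_posts and enough_words       # rule 1 is the flag
```

Comments, trackbacks, and blogrolls are deliberately left out, per rule 4.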

Here’s my methodology.
1) I’m going to pull the last hour’s worth of content that was published to each of the services, as reported to weblogs.com as of 3:52 p.m. today (before I post this so no one has time to monkey with the results).
2) I’m also going to visit the home pages of http://spaces.live.com and www.blogger.com and www.wordpress.com and www.typepad.com and report on what percentage of the entries on their “most recently published” pages are actually blogs.

Add all those percentages together and find an average. Then apply that average to the reported number of blogs on each service and see if Microsoft is still #1.
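The arithmetic is simple enough to sketch. This assumes the averaging is done per service (the weblogs.com sample and the home-page sample for that service) and that the average is then multiplied by the service's reported total. The function name and the numbers in the usage line are placeholders for illustration, not measurements.

```python
def estimated_real_blogs(reported_total, sample_fractions):
    """Scale a service's reported blog count by the average share of
    sampled entries that passed the "is it a blog?" rules.

    sample_fractions holds one value between 0 and 1 per sample, for
    example one from the weblogs.com hour and one from the home page.
    """
    if not sample_fractions:
        raise ValueError("need at least one sample")
    average = sum(sample_fractions) / len(sample_fractions)
    return reported_total * average


# Placeholder inputs only: a service reporting 1,000 blogs, with half of one
# sample and a quarter of the other looking like real blogs.
estimate = estimated_real_blogs(1000, [0.5, 0.25])
```

Ranking the services by this estimate instead of by the reported total is what would show whether the #1 claim holds up.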

Does that sound like a good methodology? Any changes you’d make?

One thing that’ll be interesting is to compare the percentages today with percentages on, say, Wednesday since I’d expect more “everyday people” to be blogging today, while on Wednesday I’d expect to see more corporate bloggers, which, my thesis is, will skew more away from Windows Live Spaces.

What do you think?

What results do you expect to see from such an exercise?

Disclaimers: Maryam, my wife, uses Windows Live Spaces. I use WordPress.com. Our book blog, Naked Conversations, is on Typepad. My son used to be on Google Blogger, but he is now on WordPress.com too.

Google Writelys home new version of online word processor

Steve Newson talked about Boing Boing’s discussion of a new Google Writely (word processor for the Web) and then said something I found interesting cause I was thinking it too: “For now I’m going to stick with Live Writer…”


Cause I don’t use word processors much anymore, and when I do (to print out a fax cover page, or something like that) I just fire up my copy of Word 2003. Now, would I write a book in Writely? Maybe, it sure would have made collaborating with Shel Israel easier. But, really, what I wanted to do was just stay in my blog tool anyway.

The only reason we used Word was cause our publisher told us to. Hey, they were paying us money so we weren’t gonna argue with them.

One little aside: did you notice that Newsome didn’t link to Boing Boing? Damn, how did I find it? Easy, I subscribe to Boing Boing, by his talking about Boing Boing I just made sure to check out the Boing Boing feed and away we go. No link necessary.

Oh, there are some of you who don’t subscribe to Boing Boing? What kind of freaks are you? Heheh.

Anyway, back to Writely, why do I like offline editors better? They just feel better for editing blogs. Plus you get a separate icon on the Taskbar, so you can switch between browsing and editing without making a mistake (how many of us have stupidly clicked on something to watch our blog post disappear as the browser refreshes?)

I also like being offline in case Internet connectivity goes away. I’ve had it happen more than once that something goes wrong and my Web browser decides to refresh for some reason, wiping out a post.

It’s why offline is so good.

James says PodTech site sucks: I agree

We’re in the middle of a site redesign. PodTech’s site sucks, James Robertson says. I agree.

You’ll note I have been linking to everything BUT PodTech lately. Why? Cause PodTech will need to earn links just like everyone else.

The home page is a disaster. I can’t figure it out either. Funny enough, even inside a startup there are “portal vs blog” disagreements and just last week I’ve been involved in some meetings that reminded me a lot of executive review meetings at Microsoft — even in a startup you have to convince people that your way is the best. It’s why at Google and at Microsoft they measure measure measure everything. If you wanna go into Marissa Mayer’s office and tell her she’s wrong you BETTER have the proof to back up your theories.

I want a simple aggregator view on the home page. Something like the one on Share Your OPML. It’s ugly, yes (that can be fixed with the help of a good designer like Bryan Bell) but it works and it lets me fish through tons of posts. 

Ugh, I just noticed the aggregator is busted there, gotta give Dave a call. But, you can see the format anyway.

It’s my thesis that people will scroll almost infinitely. Just give them high-quality stuff. At Microsoft they did research and found most people won’t click on the “next” button. But, they will scroll. You’ll notice that the search engine at live.com doesn’t ever end. If I remember the research right they are finding that people look at something like five times more information if it just keeps scrolling than if they have to click next.

But some of the team believe that everything needs to fit into one screenful. They want a portal model.

“But won’t people have trouble finding our other pages if there isn’t a link to everything we do?”

“Um, no, what do you think Google does?”

Google doesn’t use graphics. Doesn’t have long columns of text next to each other (Google goes vertical when it needs to present more information than fits in one screen — at least most of the time, Google News breaks that mold a little bit, but not really — even there it goes on to scroll baby scroll). Doesn’t do much of anything other than six simple text links.

How boring! Especially when compared to Yahoo’s home page, right? That uses graphics, tons of links, more graphics. More links.

Intuition would tell you that Yahoo’s page works better, right? Well, let’s examine some of the facts.

Compare Yahoo to Google’s stock chart. NOT BORING!!!

I’m gonna win this argument cause of that chart. Simple is better. Text links are better than graphic color crap (my eye filters that stuff out as advertising, doesn’t yours?). People click more on simple blue-underlined text than on color crap. That’s why Google accidentally found its AdSense business model: they stopped to WATCH what the users actually DO, not what they WANTED the users to do.

One place a little bit of color does help, though, is the top news flag on MSNBC. I love that site. Why? Cause every few hours some editor in Redmond sits down, picks a story, then designs a photo flag for it with a headline and several links.

So, let’s compromise, I told the team. Do an MSNBC-style flag, with an aggregator underneath it. That way they get their old-school editorial control and portal instincts fed, and I get my river of text and links, which makes users happy (anyone notice DIGG? Text and links baby! Oh, and with the usual rounded corner graphics). I’ll bet we have to pay some design house tens of thousands of dollars to come up with some rounded corner graphics. Heheh.

DIGG is growing faster percentagewise than any other site I know of. If you argue against Digg, you BETTER have those measurements to back up your claim. And saying “Yahoo has more viewers” is NOT a good argument. That’s like listening to DEC back in 1977 when the CEO there said the world doesn’t need personal computers.

Growth is more important, especially to a startup, because that’s where the opportunity to kick the old school in the groin exists.

Anyway, until we make the home page much more Google-like James will continue to be right: our home page sucks.

I’ll let you know when the redesign comes up. Until then, go watch Ze Frank or Rocketboom.

We still have the scars… (Jeremy wins award for killer business card)

Donna Bogatin, over on ZDNet, asks what’s wrong with a little party? Or even a big one?

It’s cause we still have the scars from when we all partied in 1999 and then got laid off in 2000-02. So, we deal with those scars by being snarky about the 2006 series of parties.

It’s a defense mechanism so that if we get laid off again we can say “well, at least we saw it coming this time.” Of course, we’re working our behinds off trying to make sure that it doesn’t happen again.

Greg goes further and asks why are party mentions on top of TechMeme and isn’t there something more important to cover? He, too, makes it sound like TechMeme is done by human beings. It’s not. It’s done by the linking behavior of bloggers. If bloggers link to something it gets on TechMeme. It’s that simple.

And, of course when you have a party with 700 of the world’s best-known geeks it’s going to cause discussion on blogs. Duh.

I did want to point out that Jeremy Wright’s business card was custom done just for the TechCrunch party. That’s killer. He ordered them from Printing for Less (the ePrinting company we visited in Livingston, MT. I predict that PFL will be in BusinessWeek within six months; it’s a remarkable business, but I’m holding back what I learned from its CEO for my first show. Andrew is now my business hero; you’ll find out why on that show). I’ll have to add that tip from Jeremy to my business card best practices. I’ll never forget where I got this card from Jeremy. He says it only costs $20 to do a set of cards for a big event. I also like that on the back he has some attitude and puts “kickass bloggers” to describe B5’s network. I’m gonna get some of my own cards done (hey, Hugh, wanna do me a PodTech card?)

Thanks to Irina Slutsky for keeping up the Flickr stream so that I know what she’s doing without having to bug her. Hey, I wonder if she will get hazard pay for dealing with Zombies? Can’t wait to see the interview with Linden Lab’s founder!