Why Google is walling off its news garden

Mike Arrington notes that Google is walling off its news garden and keeping other services from spidering it.

Mike is right to call this hypocrisy: Google makes money off other people’s work, yet wants some exclusivity for itself.

Imagine if they did this with Blogger. To tell you the truth, I’m shocked Google hasn’t behaved like this earlier.

Here’s why walled gardens are important to companies (and why we hate them).

Let’s say we were going to develop a competitor to Google News or TechMeme.

Now, I’ve been reading quite a few feeds. Probably about 1/4 as many as the Google News algorithm, but enough to understand how to build a competitor.

There are a few stages to figuring out what is important news.

1) Parse the various parts of a post and put it into buckets. For instance, look at this item about GigaOm getting a new COO. There’s the headline and the text. Then let’s look at the component parts. There’s who wrote it: “CEO Smack.” There’s the subject matter: “GigaOm” and “new COO.” Then there’s the body, where you can probably find a variety of other important terms: “San Francisco,” “Blog,” “Om Malik,” “Growth,” “Paul Walborsky,” “sales, operations, conferences,” “Hercules Technology Growth Capital,” and a link to TechCrunch.
2) Look at the inbound links. In this case there are none, but this is a ranking mechanism. More links means more important news.
3) Look at the comments. Are there any? How fast were they received? How many?
4) Apply human news judgment to the source of the news. (Is CEO Smack more or less important than, say, News.com or TechCrunch?)
5) Look at how many times this story has been written about on blogs elsewhere in the past few hours.
6) Look for “news” verbs like “just released” or “new” or “beta” or “exclusive.”
7) Look for “news nouns” like “Microsoft” “Google” “Apple” “Facebook.” (Or whatever company you wanted to track).
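
The seven stages above can be sketched as a single scoring function. This is only a rough illustration, not how Google News or TechMeme actually rank stories: the `Post` fields, the weights, the 30-minute "fast comment" cutoff, and the `SOURCE_WEIGHT` table are all made-up assumptions, and a real system would tune them against data.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Stages 6 and 7: term lists taken from the examples in the post.
NEWS_VERBS = ("just released", "new", "beta", "exclusive")
NEWS_NOUNS = ("microsoft", "google", "apple", "facebook")

# Stage 4: hypothetical editorial weights, chosen by a human.
SOURCE_WEIGHT = {"TechCrunch": 3.0, "News.com": 3.0, "CEO Smack": 1.0}


@dataclass
class Post:
    """Stage 1: the 'buckets' a post is parsed into (assumed fields)."""
    source: str
    headline: str
    body: str
    inbound_links: int = 0                           # stage 2
    comments: int = 0                                # stage 3
    minutes_to_first_comment: Optional[float] = None  # stage 3
    related_posts_last_hours: int = 0                # stage 5


def count_terms(text: str, terms) -> int:
    """Count how many of the terms appear as whole words or phrases."""
    lowered = text.lower()
    return sum(
        1 for term in terms
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    )


def score(post: Post) -> float:
    """Combine the signals into one number; higher means bigger news."""
    text = f"{post.headline} {post.body}"
    total = 0.0
    total += 2.0 * post.inbound_links
    total += 0.5 * post.comments
    fast = post.minutes_to_first_comment
    if fast is not None and fast < 30:
        total += 1.0  # comments arrived quickly
    total += SOURCE_WEIGHT.get(post.source, 1.0)
    total += 1.5 * post.related_posts_last_hours
    total += 1.0 * count_terms(text, NEWS_VERBS)
    total += 1.0 * count_terms(text, NEWS_NOUNS)
    return total


quiet = Post("CEO Smack", "GigaOm gets a new COO",
             "Om Malik's blog in San Francisco hires Paul Walborsky.")
busy = Post("TechCrunch", "Google just released a new beta", "",
            inbound_links=3, comments=10, minutes_to_first_comment=5,
            related_posts_last_hours=4)
ranked = sorted([quiet, busy], key=score, reverse=True)
```

Whole-word matching is used so that “new” does not fire on “news”; beyond that, the term matching here is deliberately crude.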

Are there other things to study? Not many.

So, how do you get a better display of news than anyone else?

1) Get readers to vote. AKA Digg.
2) Get readers to add comments. AKA the new Google News comments.
3) Track readers’ clicking behavior. If you know everyone is clicking on Paris Hilton stories instead of some new software for Facebook, wouldn’t that be valuable to you?
4) Get readers to send in their own news.
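
As a rough sketch of how those reader signals might be folded into a ranking: the `ReaderSignals` fields, the weights, and the log damping below are illustrative assumptions, not anything Digg or Google is known to use. The base score stands in for whatever the algorithmic stage produced.

```python
import math
from dataclasses import dataclass


@dataclass
class ReaderSignals:
    """Signals only the site that owns the display can see (assumed fields)."""
    votes: int = 0
    comments: int = 0
    clicks: int = 0
    impressions: int = 0
    reader_submitted: bool = False


def reader_boost(sig: ReaderSignals) -> float:
    """Turn raw reader behavior into a score bonus.

    Counts are log-damped so one runaway story does not swamp the rest.
    """
    click_rate = sig.clicks / sig.impressions if sig.impressions else 0.0
    return (
        1.0 * math.log1p(sig.votes)
        + 0.5 * math.log1p(sig.comments)
        + 5.0 * click_rate
        + (0.5 if sig.reader_submitted else 0.0)
    )


def rerank(stories):
    """stories: list of (title, base_score, ReaderSignals); best first."""
    ordered = sorted(
        stories,
        key=lambda story: story[1] + reader_boost(story[2]),
        reverse=True,
    )
    return [title for title, _, _ in ordered]


stories = [
    ("New software for Facebook", 3.0, ReaderSignals()),
    ("Paris Hilton story", 1.0,
     ReaderSignals(votes=50, clicks=400, impressions=1000)),
]
front_page = rerank(stories)
```

With these made-up numbers the Paris Hilton story overtakes the Facebook one on reader behavior alone, which is exactly the kind of data a competitor cannot see from the outside.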

Are there many other ways to get a better display?

Now, how do you keep a better display? Easy. Wall it off. Keep your competitors from using the stuff that makes your display special. That way they’ll have to figure out a way to get it on their own rather than just spidering your display results and using that as a bootstrap to build upon.

Personally, the more I look at it, the more I understand what Google’s doing.

How about you?

The reason we hate them? Exactly that. We can’t bootstrap off of them and build something better.

Oh, and we don’t like it that companies are making profits off of our work. It +is+ our work that is building TechMeme and Google News, isn’t it? Yes. So why not share the profits back with the people who are helping make the system special?

Evil or not? That is the question.

Comments

  1. For the sole reason that most of the manual work there was done by others who don’t share this cake, it is evil.

  3. There is no doubt that the core value of a service like Google News is the quality of the content and therefore the profits should be shared.

    There are 2 issues, however. Issue #1: Do you define profits as the money Google is making, or the money Google is making plus the money you are making through the traffic they send you? Issue #2: There is no simple and mainstream way to syndicate and track advertisements.

    The problem you are referring to is identical to the problem of “can feed readers make money?” and one of the reasons why people still publish (stupid) partial feeds.

    Jason C. had some ideas on how to build a system to syndicate ads in feed readers and share profits. Something like this needs to emerge and be as simple as AdWords for this problem to go away.

    (In this specific case, I thought Google was giving some money to the sources for using their information and images, and that some of the constraints they are putting on are just due to those legal contracts…but I might be wrong)

    -Edwin

  5. Edwin: in some cases the site doing the work isn’t getting ANYTHING.

    For instance, I know that TechMeme studies my link blog behavior but it neither gives me credit for that, nor does it send any traffic my way. But it definitely is profiting off of my work.

  7. Scoble,

    TechMeme makes money off you, you make money off us, we make money off them.

    Every post is about your videos, then we watch them and that pays your bills.

    You profit off the work of other people, people who build stuff. They need to advertise it and you provide a cheap mechanism.

    I liked your “Parse the various parts of a post and put it into buckets.” That made me LOL. Buckets. That’s the sort of thing I tell my CEO to try and get him to understand. Yes, programming is like LEGO… haha, no, programming is like doing 3D sudoku puzzles. Fun.

    monk.e.boy

  9. Robert, I personally study your link blog. It’s in my Google Reader and I learn a lot about tech news trends from it. Almost certainly, I apply that information to what I do.

    The idea of attribution is very tricky. Do you know that the vast majority of the time, when a blogger learns something on Techmeme and blogs about it, Techmeme won’t get credit for the discovery? That’s very similar to the situation you alleged. Is that wrong? I’m not sure. BTW, you’re probably among the best bloggers as far as crediting Techmeme goes, but I bet you still don’t do it half the time.

    Anyhow, for the record, I should state that lately, Techmeme has NOT used your link blog.

  11. There’s an easy solution.

    Have a few companies put aside a fund to start a propaganda war against the Google spider, and have the main concept be a Googlebot blocker disguised as a spam protector on SourceForge, then use the promo cash to popularize the hell out of it.

    Have the user-agent list auto-update. That way, if Google tries to hack a new user agent for the Googlebot to bypass the list, it will be freshly blocked again.

    Google’s results will become irrelevant after about a month and the stock will plummet to zero.

    Microsoft and other competitors in search have more than enough money to do something like this. Just look at what they did with Get the Facts.

  13. More importantly, when Google’s bot can no longer search half the internet because most people have adopted this bot spam blocker, it will have taught them a valuable lesson in being pilfering jackasses.

  15. If some competitor does this, and writes a new “spam filter” FOSS Apache module conveniently packaged as RPM and .deb, remember to use APR_HOOK_FIRST so that your module hooks the Googlebot user-agent before anything else has a chance to handle it. Tell people to load that module first in httpd.conf; otherwise Google could make a counter Apache module, popularize it, and have that load first.

    Also consider that every shared server you have people load this on will kill 100 or more sites from being indexed in Google, so advertise it on webhosting forums and server admin forums first.
    http://www.webhostingtalk.com/
    There’s a good one. It’s run by the same company as hotscripts.com

  17. The overall post is very well done; you have summarized everything. Coincidentally, at the time of the Google release, I am developing a similar product that is aimed at large corporates and provides a similar service.

    This is what I see wrong with what Google is doing, and why it is likely to go nowhere.

    From everything I’ve seen, there is no customization. They are simply asking people who already commented to comment further, but comment on what? How are those questions being decided? Who is the individual(s) that decides, ‘we need to know this, we’ll ask it to the experts’? How are the questions being qualified? Then there’s the whole can of worms of, okay, the ‘expert’ answered this, now I want them to answer this…..and on and on.

    What works with a comment section like Digg’s, or the one that I’m commenting on now, is that it is interactive with the users. Google is creating 3 areas and putting a wall around one.

    The initial article
    The readers
    The experts

    By adding the experts into the mix, there is a whole slew of intangibles that Google, I believe, is not thinking deeply enough about.

    All they are seeing is ‘this is different’ ‘this will separate us from the rest.’

  19. I thought they didn’t crawl their own news site so that they wouldn’t have links to their news site from their news site.

    That’s like asking why they don’t let you spider search results. Why don’t they just open up my Gmail for spidering too?

    If I searched on Yahoo for some news, and got something on Google that pointed to the original article, I would be frustrated and confused. I think they’re just trying not to dilute search results with repetitive content.

    It would be kind of like looking at someone’s Facebook page, and seeing someone else talk about that someone and Facebook.

  21. “Evil” is tossed around a bit lightly. That’s a pretty strong word that has no relevance here. You’re diminishing the truth of what “evil” really is, to be honest.

  23. Umm, how is Google making money off the news?

    There aren’t any ads on Google News. Sure, they might indirectly make money off somebody visiting news and then searching the web and clicking an ad… but then they’re making money off of search ads.. NOT news.

  25. Ryan- no, they’re not making money off *just* search ads- they’re making a hell of a lot of money off syndicated ads, which have nothing to do with search. To the extent that they can drive traffic to offsite news properties that syndicate their ads (and there are a LOT of them), they do make money off news.

    It’s simply driving traffic- that’s how Google makes money, and the key to that is AdSense, not search ads.

  27. @15, that’s pretty harsh. I thought this was a good post.

    “Oh, and we don’t like it that companies are making profits off of our work. It +is+ our work that is building TechMeme and Google News, isn’t it?”

    http://www.internetnews.com/xSP/article.php/3487041

    “All machines run on a stripped-down Linux kernel. The distribution is Red Hat, but Hoelzle said Google doesn’t use much of the distro. Moreover, Google has created its own patches for things that haven’t been fixed in the original kernel.”

    So where’s Linus’s royalty check?
    He, after all, coded more of Google than Larry Page and Sergey Brin combined.
    Pay up Google. What if Linus had walled his stuff off from you???

    They’re pissants Scoble. All of them. Very wealthy and fearful pissants. Sorry.

  29. Google News is aggregation, like Techmeme…Google News Commentary is content, like Digg (their content is votes & comments). Digg has an API that lets you get their content. Google News Commentary is closed-off content. Big difference.

  31. Well, Google News Commentary isn’t actually anything yet…which makes it a bit early to tear it apart.

  33. Reuters’ copyright notice, for example, states “Users may download and print extracts of content from this website for their own personal and non-commercial use only.”

    I suspect Google must have agreements with Reuters and other news organizations that allow Google to crawl and present their content on news.google.com. I suspect the walled garden is required by some of these agreements.

  35. Robert,

    There are many metrics for value in data collection: “signal to noise”, ROI, etc.

    I think Google is attempting to create a comment strategy that improves the comment mechanism. Only those with significant knowledge on the story should get a forum. That’s a capability that comes with scale… Is Google evil for attempting to leverage scale? Probably not. If they leverage scale to unfairly compete with a start-up then the consumer loses.

    Your post clearly makes the case that there’s room for improved automated news search that leverages intelligence over market position.

    The letters to the editor at the New York Times are often more influential than those at the Spudville Times. Scale matters… it funds editorial judgement.
    The Spudville Times will be driven from the market before the NY Times folds…

    What amazes us all is how far Scoble has scaled in collecting and filtering news. You must get very tired at times.

  37. Monetization aside, don’t you think that this is an interesting take on the news?

    When everyone can comment on a given news item (which they can on their own blog), it buries the opinions of the people within the story who may want to “set the record straight” or have story elements that add an insider opinion. I am not saying I disagree with you; I just think that this is an interesting way to look at news.

  39. Danny, your point doesn’t make sense in regards to this post.

    IF the whole point of Google News is to drive traffic to the news sites that have AdSense… then it wouldn’t make any sense to wall it off. They’d be eliminating potential traffic to those sites.
