Why do search engines lie?

Here, do a search for Memetrackers (Google, MSN, Yahoo). Now, why are none of their numbers accurate? Google says there are 713 results but can only display 62. MSN says there are 101 but can only display 100. Yahoo says there are 368 but can only display 44.

Why aren’t there any truth-in-advertising laws for search engines?

Update: the numbers are changing. Google now says there are 699 results, but can only display 692 (this is after you tell it to display all duplicates).

Oh, and no engine will display more than about 1,000 results, so if one says there are 42,000,000 results, there’s no way to verify whether that number is accurate.
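Here is a minimal sketch of why a huge headline number can't be checked. It assumes the 1,000-result ceiling that Google's own error message reports when the `start` URL parameter is pushed past it, as quoted in the comments. The helper names (`results_page_url`, `verifiable_results`) and the `MAX_SERVED` constant are hypothetical. Only the `q`, `num`, and `start` parameters come from a real Google results URL of this era, and none of this describes a documented API.

```python
from urllib.parse import urlencode

# Assumed cap, taken from Google's error message quoted in the comments,
# not from any documented API.
MAX_SERVED = 1000

def results_page_url(query, start, num=100):
    """Hypothetical helper: build a results-page URL like the one in the comments."""
    if start >= MAX_SERVED:
        # Mirrors the "does not serve more than 1000 results" refusal.
        raise ValueError(f"results starting from {start} are past the {MAX_SERVED}-result cap")
    params = {"q": query, "num": num, "start": start}
    return "http://www.google.com/search?" + urlencode(params)

def verifiable_results(claimed_total):
    """How many of the claimed results a reader could actually page through."""
    return min(claimed_total, MAX_SERVED)

print(verifiable_results(42_000_000))          # 1000
print(results_page_url("Scobleizer", 900))
```

Whatever the engine claims, `verifiable_results(42_000_000)` comes out at 1,000. Everything past that point has to be taken on faith.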

Update 2: Yahoo is actually accurate once you tell it to display all duplicates. It says 429 results and displays 429 results. So, Yahoo wins! (although I wish they’d all be a little clearer up front).
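One explanation for why the headline number drifts, offered by a commenter further down, is that engines extrapolate the total instead of counting every match. An exact count can cost about as much as finding every match. The sketch below shows the difference under stated assumptions: a toy in-memory index where each document is a set of terms, and a simple random-sample estimator. The names `exact_count` and `estimated_count` and the synthetic corpus are hypothetical. They are not how any particular engine actually works.

```python
import random

def exact_count(docs, term):
    # Exact: look at every document, so cost grows with the size of the index.
    return sum(1 for terms in docs if term in terms)

def estimated_count(docs, term, sample_size, seed=0):
    # Extrapolated: check a random sample and scale its hit rate up to the
    # whole index. Cheap, but only approximately right.
    rng = random.Random(seed)
    sample = rng.sample(docs, sample_size)
    hits = sum(1 for terms in sample if term in terms)
    return round(hits / sample_size * len(docs))

# Synthetic index: 100,000 documents, every 50th one mentions "memetrackers".
docs = [{"memetrackers"} if i % 50 == 0 else {"other"} for i in range(100_000)]

print(exact_count(docs, "memetrackers"))            # 2000
print(estimated_count(docs, "memetrackers", 1000))  # an extrapolated estimate; varies with seed
```

Change `seed` or `sample_size` and the estimate moves while the exact count stays put. That is at least a plausible parallel to the same query reporting 713 results one day and 699 the next.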

Comments

  1. David Utter says:

    @JG: >Could it be that Google has internet filters not only on China, but on the US, too? Why is it that the UK has twice the index size?

    Maybe G UK has all those Scientology pages it made Google remove in the US. :)

  3. Joe Anderson says:

    Yet studies show that, out of Google, MSN and Yahoo!, Google is the best.

  5. Yakov says:

    The major problem with searching is that the result I’m looking for is often not on the first page! From the user’s viewpoint, why would it matter whether a search engine returns 1 million or 1 billion results?

  7. John Moody says:

    And that’s why AltaVista died – number of results is meaningless. It’s the quality of the first page of results that ultimately matters. That’s why Google is on top of search right now – because a year ago, their first page of results was vastly better than Yahoo, MSN, etc. Even though the other guys have largely caught up, Google got there early enough to establish themselves as “the” search provider.

  9. scobleizer says:

    John: totally agreed.

  11. JG says:

    To all who answered my “* *” wildcard query post: Yes, I buy the BigDaddy explanation. But that still raises the question: why do I only see 8-11 billion results in the U.S.? I’ve run this wildcard query from home, from work, from the east coast, and from the west coast. In the 8-9 months since I started running it, I have never seen even half of the 25 billion web pages supposedly in the Google index. Why does the UK get to see 18 billion? Why does Brasil get to see all 25 billion?

    Some of you above have said that size does not matter, that nobody cares about the long tail, so it doesn’t matter whether Google’s index shows me 8 billion or 18 billion or whatever. I could debate that, but I don’t want to go too far off topic. Let’s just say that even if the long-tail argument holds for any one query (i.e. nobody scrolls past 10 documents, much less 30, anyway), it probably does not hold for the entire set of possible queries.

    What I mean is that for your query, you might never need to look at document #10,575. But for somebody else’s differently-worded query, that same document will be ranked 3rd. And if that document is not in the index, because Google U.S. is only showing us 8 billion of the 25 billion web pages, then for this latter query, someone is not getting the information they need.

    So there are two long tails here: (1) the long tail for one query, and (2) the long tail of all possible queries. I would argue that (2) is much more important, if not vital. By showing only 8+ billion pages in the U.S., Google is robbing us of the top-10 results for all those queries.

    The final problem with this whole “non-truth in advertising” thing is that, when I do the wildcard search, there is no notice at the bottom of the page saying “Because of the DMCA, pages have been removed from your search.” After all, if my wildcard search really is returning all 8 billion pages from the non-BigDaddy server, then some of those will already have been removed, right? So Google has removed pages and has not told me that it has removed them.

    That’s a big problem. That’s a big trust issue.

  13. meeeemm says:

    google
    Results 1 – 10 of about 830 for memetrackers. (0.19 seconds)

  15. [...] If you want to gain market share in search, you have to deliver results to the customer that meet their needs in a better way.  Speed alone is no longer an issue – cutting 0.2 seconds in half won’t make a difference to the customer.  Number of results was never an issue, which is why no one’s using AltaVista anymore.  (And as Robert Scoble pointed out yesterday, the number of results usually isn’t quite right anyway!)  No, you’re going to have to figure out what I’m really looking for, then wow me with how you deliver it. [...]

  16. drmike says:

    Google
    Results 1 – 10 of about 192 for memetrackers. (0.06 seconds)

    As stated above, it’s all about which datacenter you’re at. You can compare results between the different datacenters here

  18. [...] Yesterday Robert Scoble did a post on Why do search engines lie? [...]

  19. Tetra says:

    I think I just got dumber reading this blog entry. Who cares about results past the first couple of pages?

  21. Innocent Bystander says:

    Kinda depends on the goal of the search.

    It would make no sense for the search on ebay or amazon to show you something they can’t sell you – so they show you something kind of similar that lots of people buy.

    It’s not hard to argue that this is better than vanilla dumb search.

  23. [...] Google finds a mere 16 links (which in reality means 11 links; see Why do search engines lie?). Many of the links are also irrelevant: the first points to conBLOG, the second to Bob’s weblog, but always to the home page rather than to a specific article. All in all, the results are useless. [...]

  24. PeteCashmore says:

    Google ranks my memetrackers post first, hence Google wins. Simple. :)

  26. [...] Scoble finds another reason not to trust search engines. Remember: they’re businesses, just like every other .com or publishing company, not impartial public services only present to serve the greater good. This is something we’ve learnt from the Google / China debacle, but it doesn’t seem to have sunk in yet. [...]

  27. anonymous hero says:

    scoble,

    you are not an engineer (it appears), stop with this inane babble. it’s like fielding support calls from users – ‘why dont my intarweb work like i want to?!’. there’s nothing wrong with being non-technical, but i’d have thought that you would develop some self-restraint to prevent embarrassing yourself in public.

    depending on the implementation, there is a big difference in cost between returning a result containing the first X matches ordered by Y (e.g. relevancy) and returning the exactly correct number of matches. the latter may cost as much as actually finding *all* of the matches. generally, when this is the case and you don’t *have to* be absolutely precise, you do statistical extrapolation, which will by definition be imprecise. that is what you are seeing.

  29. [...] Scoble’s been on a bit of a tear lately with his thesis that search engines lie. The implication being that Google, in particular, is intentionally inflating its numbers. What I found most disturbing was his perhaps light-hearted musing: Why aren’t there any truth in advertising laws for search engines? [...]

  30. Sam Dipiazza says:

    You need to make sure you have all results limiting preferences turned off. When I set Google to return results in any language, with safe search turned off and personalized results turned off, I get 25,270,000,000 hits for * * at http://www.google.com.

  32. [...] Scoble: Why do search engines lie? “Here, do a search for Memetrackers (Google, MSN, Yahoo). Now, why are none of their numbers accurate? Google says there are 713 results, but can only display 62. MSN says there are 101, but only can display 100. Yahoo says there are 368, but only can display 44.” January 31, 2006. Corporate Black Hat SEO Terminology – when is a cloaker not a cloaker? [...]

  33. Vinay says:

    You can only see (and verify) the first 1000 results.
    If you change the URL’s HTTP GET parameters like this

    http://www.google.com/search?q=Scobleizer&num=100&hl=en&lr=&start=10000&sa=N

    and try to get results over 1000, it says

    “Sorry, Google does not serve more than 1000 results for any query. (You asked for results starting from 10000.)”

    So, it actually doesn’t matter. I don’t even look at that number.

  35. Anonymous says:

    “42,000,000 results”

    These refer to web pages, so I don’t see any problem with the results Google provides.

  38. [...] and then i found this, a post on the same [...]

  39. toby33 says:

    Google is often too complex to understand, and that is a problem. The results do not normally add up, and how do we know there are 44m sites? SEO solutions are becoming a necessity even for small businesses, which is interesting, to say the least. With businesses spending less time on their business and its customers and more time on understanding the complexity of search engines, something is going to have to break. We will see what it is.