Why do search engines lie?

Here, do a search for Memetrackers (Google, MSN, Yahoo). Now, why are none of their numbers accurate? Google says there are 713 results, but can only display 62. MSN says there are 101, but can only display 100. Yahoo says there are 368, but can only display 44.

Why aren’t there any truth in advertising laws for search engines?

Update: the numbers are changing. Google now says there are 699 results, but can only display 692 (this is after you tell it to display all duplicates).

Oh, and no engine can display more than about 1,000 results, so if they say there are 42,000,000 results there’s no way to verify whether that number is accurate or not.

Update 2: Yahoo is actually accurate once you tell it to display all duplicates. It says 429 results and displays 429 results. So, Yahoo wins! (although I wish they’d all be a little clearer up front).

90 thoughts on “Why do search engines lie?”

  1. Google is often too complex to understand, and that is a problem. The results do not normally add up, and how do we know there are 44m sites? SEO solutions are becoming a necessity for even small businesses, which is interesting, to say the least. With businesses spending less time on their business and its customers and more time on understanding the complexity of search engines, something is going to have to break. We will see what it is.

  3. “42,000,000 results”

    These refer to web pages, so I don’t find any problem with the kind of results provided by Google.

  5. You need to make sure you have all result-limiting preferences turned off. When I set Google to return results in any language, with safe search turned off and personalized results turned off, I get 25,270,000,000 hits for * * at http://www.google.com.

  7. scoble,

    you are not an engineer (it appears), stop with this inane babble. it’s like fielding support calls from users – ‘why dont my intarweb work like i want to?!’. there’s nothing wrong with being non-technical, but i’d have thought that you would develop some self restraint to prevent embarrassing yourself in public.

    depending on the implementation, there is a big difference in cost between returning a result containing first X matches ordered by Y (eg relevancy) and returning the exactly correct number of matches. the latter may have a cost equivalent to actually finding *all* of the matches. generally, when this is the case and you don’t *have to* be absolutely precise, you do statistical extrapolation, which will by definition be imprecise. that is what you are seeing.

  9. Kinda depends on the goal of the search.

    It would make no sense for the search on ebay or amazon to show you something they can’t sell you – so they show you something kind of similar that lots of people buy.

    It’s not hard to argue that this is better than vanilla dumb search.

  11. I think I just got dumber reading this blog entry. Who cares about results past the first couple of pages?

  12. Google
    Results 1 – 10 of about 192 for memetrackers. (0.06 seconds)

    As stated above, it’s all about which datacenter you’re at. You can compare results between the different datacenters here

  14. To all who answered my “* *” wildcard query post: Yes, I buy the BigDaddy explanation. But that still raises the question: Why do I only see 8-11 billion results in the U.S.? I’ve done this wildcard query from home, from work, from the east coast, from the west coast. Never in the past 8-9 months since I started doing the query have I come across even half of the 25 billion web pages supposedly in the Google index. Why does the UK get to see 18 billion? Why does Brazil get to see all 25 billion?

    Some of you above have said that size does not matter, that nobody cares about the long tail, so it doesn’t matter if Google’s index only shows me 8 billion or 18 billion or whatever. Well, I could debate that, but I don’t want to go too far off topic. Let’s just say that, even if that long-tail argument is true for any one query (i.e. nobody scrolls past 10 documents, much less 30, anyway), it is probably not true for the entire set of possible queries.

    What I mean is that for your query, you might never need to look at document #10,575. But for somebody else’s differently-worded query, that same document will be ranked 3rd. And if that document is not in the index, because Google U.S. is only showing us 8 billion of the 25 billion web pages, then for this latter query, someone is not getting the information they need.

    So there are two long tails here: (1) is the long tail for one query. (2) is the long tail of all possible queries. I would argue that (2) is much more important, if not vital. By only showing 8+ billion pages in the U.S., Google is robbing us of the top-10 results to all those queries.

    The final problem with this whole “non-truth in advertising” thing is that, when I do the wildcard search, there is no notice at the bottom of the page saying “Because of the DMCA, pages have been removed from your search”. After all, if my wildcard search really is returning all 8 billion pages from the non-BigDaddy server, then some of those will have already been removed, right? So Google has removed pages, and has not told me that it has removed them.

    That’s a big problem. That’s a big trust issue.

  16. And that’s why AltaVista died – number of results is meaningless. It’s the quality of the first page of results that ultimately matters. That’s why Google is on top of search right now – because a year ago, their first page of results was vastly better than those of Yahoo, MSN, etc. Even though the other guys have largely caught up, Google got there early enough to establish themselves as “the” search provider.

  18. The major problem of searching is that the result that I look for is often not on the first page! From the user’s viewpoint, why would it matter whether a search engine returns 1 million or 1 billion results?

  20. @JG: >Could it be that Google has internet filters not only on China, but on the US, too? Why is it that the UK has twice the index size?

    Maybe G UK has all those Scientology pages it made Google remove in the US. :)

Comments are closed.
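
Comment 7’s point about cost (returning the first X matches is cheap, an exact count may mean finding every match, so engines extrapolate) can be illustrated with a toy sketch. This is not how Google, MSN, or Yahoo actually compute their numbers. The in-memory “index”, the function names, and the choice of plain random sampling are all assumptions made for illustration.

```python
import random


def exact_count(docs, term):
    """Exact answer: touches every document in the (toy) index."""
    return sum(1 for doc in docs if term in doc)


def first_k_matches(docs, term, k):
    """First k matches: stops scanning as soon as k hits are found."""
    matches = []
    for doc in docs:
        if term in doc:
            matches.append(doc)
            if len(matches) == k:
                break
    return matches


def estimated_count(docs, term, sample_size, rng):
    """Extrapolated answer: count hits in a random sample of documents,
    then scale the hit rate up to the size of the whole index."""
    sample = rng.sample(docs, sample_size)
    hits = sum(1 for doc in sample if term in doc)
    return round(hits * len(docs) / sample_size)


if __name__ == "__main__":
    # Hypothetical index: each document is a set of terms, and every
    # 100th document mentions "memetrackers".
    docs = [
        {"memetrackers", "blog"} if i % 100 == 0 else {"blog"}
        for i in range(100_000)
    ]
    rng = random.Random(42)  # fixed seed so repeated runs agree

    print("exact:    ", exact_count(docs, "memetrackers"))
    print("estimated:", estimated_count(docs, "memetrackers", 5_000, rng))
    print("shown:    ", len(first_k_matches(docs, "memetrackers", 10)))
```

The estimate only inspects a fraction of the documents, so it will usually land near the exact count without matching it, which is the kind of mismatch the post complains about between “results found” and results actually displayed.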