Real-time systems hurting long-term knowledge?

Whew, OK, now that I’m off of FriendFeed and Twitter I can start talking about what I learned while I was addicted to those systems.

One thing is that knowledge is suffering over there. See, here on a blog, it is easy to find old posts. Just go to Google and search. What would you like me to find? The Chinese earthquake? Google has it.

Now, quick, find the first 20 tweets or FriendFeed items about the Chinese earthquake. It’s impossible. I’m an advanced searcher and I can’t find them, even using the cool Twitter Search engine.

On April 19th, 2009, I asked a question about mountain bikes on Twitter. Hundreds of people answered on both Twitter and FriendFeed. The answers on Twitter? Try to bundle them all up and post them here in my comments. You can’t. They are effectively gone forever, and all that knowledge is inaccessible. Yes, the FriendFeed thread remains, but it only contains answers that were posted on FriendFeed, in that thread. There were others, but those answers are now gone and can’t be found.

The other night Jeremiah Owyang told me that thought leaders should avoid spending a lot of time on Twitter or FriendFeed because that time will be mostly wasted. If you want to reach normal people, he argued, they know how to use Google.

And if you want to get into Google, the best tool, by far, is a blog. Yes, FriendFeed is pretty darn good too (it had better be; it was started by a handful of superstars who left Google to start that company), but it isn’t as good as a blog, and, Jeremiah argues, my thoughts were lost in the crowd most of the time anyway.

And that’s on FriendFeed, which has a decent search engine (although it remains pretty darn incomplete). Here, try to find all items containing the word Obama that were written in Washington, DC on November 4th, 2008. Oh, you can’t do that on FriendFeed, and on Twitter Search you can’t pull out the important ones. The location information is also horribly inaccurate, because it’s based on the tweeter’s home location rather than on where a tweet was actually sent from.

Here’s an easy search: find the original tweet from the guy who took the picture of the plane that landed in the Hudson. I can do it on FriendFeed after a few tries, but on Twitter Search? Give me a break. Over on Google? One click, but you gotta click through a blog or a journalistic report to get there. Real-time search is horrid at saving our knowledge and making it accessible.
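To make the contrast concrete, here is a minimal Python sketch. It assumes, as one commenter suggests below, that today's real-time search keeps only a fixed sliding window of recent posts, and compares that with an index that keeps everything and simply uses freshness as a ranking signal. The `Post` type, the 7-day window and the 7-day half-life are made-up illustrations, not how Twitter or FriendFeed actually work.

```python
import math
from dataclasses import dataclass


@dataclass
class Post:
    text: str
    age_days: float  # how many days ago the post was made


def sliding_window_search(posts, term, window_days=7.0):
    """Hypothetical real-time search: matches older than the window are simply gone."""
    term = term.lower()
    return [p for p in posts if term in p.text.lower() and p.age_days <= window_days]


def freshness_ranked_search(posts, term, half_life_days=7.0):
    """Hypothetical alternative: keep every match, but rank newer posts higher.

    Each post's freshness score halves every `half_life_days`; nothing is dropped.
    A real engine would combine this with relevance and authority signals.
    """
    term = term.lower()
    matches = [p for p in posts if term in p.text.lower()]

    def freshness(p):
        return math.exp(-math.log(2) * p.age_days / half_life_days)

    return sorted(matches, key=freshness, reverse=True)


if __name__ == "__main__":
    posts = [
        Post("Mountain bike advice: get a hardtail", age_days=2),
        Post("Mountain bike answers from back in April", age_days=60),
        Post("Earthquake coverage", age_days=1),
    ]
    print([p.text for p in sliding_window_search(posts, "mountain bike")])
    print([p.text for p in freshness_ranked_search(posts, "mountain bike")])
```

With the sliding window, the April answers vanish from the results entirely; with freshness ranking they just sink below the newer ones, so the knowledge is still there when you go looking for it.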

This is a HUGE opportunity for Facebook, which has more than 10x as many users as Twitter and 100x as many as FriendFeed.

Or, it’s how Google will get back into the social networking business and lock everyone out.

What do you say, Larry Page?

About Robert Scoble

As Startup Liaison for Rackspace, the Open Cloud Computing Company, I travel the world with Rocky Barbanica looking for what's happening on the bleeding edge of technology and report that here.

64 thoughts on “Real-time systems hurting long-term knowledge?”

  1. I never thought of Twitter and FriendFeed as knowledge bases. More static forms of information, like webpages, blogs, wikis, etc., are. Twitter and FriendFeed are a great portal and source for finding out opinions and what people are talking about, but ideally, after a subject has been talked about for a bit, someone should summarize the conversation or research the matter and turn it into actual archivable, re-findable knowledge.

    That's why we still have school books, right? Twitter and FriendFeed are the teachers during class.

  2. Books and newspapers still work for me. I'd rather spend time with a good article than read the excerpt on Twitter or FF.

  3. Excellent points, however there is more to this story :)

    The problems you mention are very true and due IMO to the “sliding window” nature of current real-time search. By that I mean the current search window is some fixed interval that keeps sliding forward and dropping off old entries. The end result is that real-time search is at present good only for things happening RIGHT NOW and even relatively recent stuff is quickly forgotten.

    For the mathematically inclined, one can say

    Real_Time_Search(t) = dSearch(t)/dt

    where dt is effectively the sliding window over which we are searching.

    There is no real reason to do it this way, apart, perhaps, from the expediency of initial implementations. Real-time search should NOT forget older entries. It should definitely treat freshness as a MAJOR ranking signal, but it should keep the old stuff.

    This way, assuming it is implemented well, real-time search should, with time, start converging to general search à la Google. That is actually the flip side of the above equation.

  4. Quality article Robert. Perhaps real-time integration into a blog is the best solution to the problem. I have a brand-new blog into which I have inserted an 'Asides' category where I can post quick, Twitter-like updates without writing an entire post. This option is OK, but it's not real-time like FriendFeed from a comment/feedback standpoint.

    It would be great to have sort of a FriendFeed-in-your-sidebar approach that you could access via your CMS. This would allow you to interact in real time on your own blog while the data could be stored and archived in your own MySQL database. Yes, I'm picking apart the idea in my own head as I write, but the general idea is there.

  5. Remember the meaning of the word “twitter”. Human communication is not meant for eternity. So you can´t find it later via search. It´s a bit like in the past. We lived even without search…

  7. Another method to be found by Google is to post well-named Flickr and YouTube submissions. Check out Google search results for “safari firebug”, “reuters app”, “steampunk maker faire”, “angora ridge fire”, … In each case a Flickr photo or YouTube video is in the top five results.

  8. Scoble – I agree. I originally built Cullect.com to parse Twitter feeds, but after a year, I've found those messages so ephemeral as to have no long term value. Since Cullect was designed to find 'importance' in feeds over the long term it quickly becomes clear how much noise these services generate.

  9. Why not just combine authoritative sources like Google with Twitter (using Twitter as a way to attach optional commentary or to rank content better), as demonstrated by tweetnews, which uses Yahoo! BOSS and Twitter Search?

    http://tweetnews.me

    I really like this model and am really surprised it hasn't become the norm or taken over search.twitter.com. I just don't see my mom or dad searching Twitter directly, as the results are noisy and unreliable. This model, though, fits the way they already search.

  10. Jesse, the Google search on your tweet stream is a very poor substitute, as it still has more holes than a Swiss cheese. So far, the only reasonably simple and sound archiving solution I've found is piping single-user Twitter RSS feeds into Tumblr (tweet frequency can't be too high, or Tumblr will shut it down).

    Of course, that won't solve Robert's wish for time-bound snapshots that can be easily resurfaced: Google itself, even with the recent “Recency Operator” improvements (last 24 hours/week/month/year), still has a way to go before you can say “give me date range from: … until: …”

    Sounds to me like there are plenty of business opportunities in providing archiving around given real-time queries. I.e., Robert can find what he wants on a given day or recent sequence of days SHORTLY AFTER the event, and then say: “Hoover up” all of this info and archive it for later… e.g. for the recent “140conf” or “140tc” etc.

    That implies taking timely action, because Twitter seems in no mood to let you search back further than about 30 days (at best; during daytime loads it's often only about 7 days anymore). They may of course already be selling access to the full range to corporate researchers for a lot of money.

  11. OK, cool. I wasn't singling out Ourdoings, for instance. It looks great. I was just trying to make a general point to Robert that blogs may have changed and morphed, but remain the central hub of one's net identity. Actually, services like yours only help blogs by giving them more reach and connectivity with interesting products like yours. :-) A couple of days ago I got the sense that he felt blogs didn't matter anymore and that one should hang everything out on the branches, when I felt the main trunk was still key. Just a philosophical thought. Thanks for your info, though!

  12. Robert,
    There is an old saying “Don't put all your eggs in one basket.”

    With all things human, these pearls of wisdom can be broadly applied. As with our virtual interactions with each other we must balance the use of all technologies.

    As you have documented so many times, instant communications such as Twitter and Friendfeed have great advantages. But as you bring up there are restrictions or draw-backs of these technologies.

    Could you imagine trying to catalog, quantify and qualify just your own Tweet/FriendFeed “thought channel” over a period of 10 years?

    Just the massive amount of data would be overwhelming.

    So, I guess if there are thoughts you want to preserve for historic reference you might want to sit down and give those thoughts the time and effort they deserve to be preserved.

    Although chatter may lead to knowledge and action, as demonstrated with the recent Iranian elections, when you have a lot of chatter you have a lot of noise as well, and the good thoughts will most likely get drowned out and not archived very well.

    One problem with our “gotta get it now” and “24 hour news cycle” is that people really don't take the long view of humanity and knowledge. Ask yourself this… How will you ensure that your best thoughts and actions will be remembered in 100 years?

  13. This is something I have been thinking about recently, so the coincidence control center is working overtime.

    The Twitter knowledge, the blog knowledge: it is all very cumbersome. Blogs in particular haven't really changed since, um, the early '80s. It's the same thing: dial in on my dad's 9600 baud modem to the local bulletin board (yeah, maybe you had to wait 30 minutes), read the message, type a message.

    How far back can we retrieve those conversations? I would bet that everything that has ever been tweeted is stored somewhere. I could be wrong, but it would not make sense for Twitter to destroy their content. It would be kind of like burning money.

    Back to basic blogging and such: the problem is that the original poster says something and people comment. Yes, if you scroll through and read the comments you can parse some knowledge from them. But why doesn't anyone have a closing space on a blog? The key point is this: the thought leaders post, let loose, and move on to the next thing; it is so fleeting. If they were truly interested, they would close the gate once they opened it. They would analyze the comments and deliver the meaning of what they learned, and how the shared, distributed knowledge changed them.

    That is the creative process, but blogging is mostly about posting something, letting the trolls take it, and moving on to the next. Yeah, it is retrievable, but is it usable?

  14. You are spot on. By publishing all this content of yours on a blog it will stay there forever and people will easily be able to find it in your archives via Google… you can still have your discussions / conversations on FriendFeed / Twitter but always make sure that it all starts with something you post on your blog.

  15. Let's take your mountain bike example: assume that the Twitter search problem is solved, so you can retrieve every relevant tweet in addition to every FriendFeed response. Now what do you have? Hundreds of suggestions, probably clumping into a small number of popular choices and a large number of outliers. So now what?

    Go buy the bike with the most suggestions? That will get you the most popular bike, but not necessarily the best one for you. This happens with camera recommendations all the time: a prosumer DSLR is not the best choice for many questioners, but it's what the camera enthusiasts use.

    Investigate every suggested model? Possible, I suppose, but not very practical.

    Facebook can improve this situation somewhat, in that your social graph should eventually allow you to express whose opinion you trust more than others'. So you'd have a weighting based on who made the suggestion. Nonetheless, this is still imperfect: it isn't realistic to have a single trust metric for every possible subject.

    What we lack is a measure of authoritativeness of the _answer_: both an inherent quality of the person plus an indication of how sure they are of the answer they are giving. Suggesting a bike based on their own experience of test-riding a dozen models six months ago is far more authoritative than suggesting the cool bike they saw on University Avenue yesterday. There is also a timeliness function: test-riding bikes 6 years ago is less useful than 6 months.

    How to generate this trust metric? Requiring people to do anything manually, like rating the strength of their answer, is doomed to fail. It is both annoying and easy to game. Measuring how many recent events in their lifestream are relevant to the question might be one way. If they post their bike-commute times each morning, they probably have much more interest in and knowledge of bicycles than most.

    Then again, the person who bikes every day isn't going to be the best source if you're only planning to use it twice a month. They're likely to suggest “too much bike”. You can still see how access to the lifestream would help though, by gauging how closely their activity in this area matches yours.

  16. OurDoings won't go out of business because it hasn't gone into business yet. I developed it and continue to build it as a side project. Also, the comments live on FriendFeed and Disqus, just like on this blog.

  17. Twitter is for short-term chatting, blogs are for long-term knowledge.

    If you are “part of the conversation” in Twitter, you MAY learn something, if not it is more or less gone, like any other conversation anywhere.

    If you want to share knowledge with people, Twitter is probably the worst tool. If you want to inform people about the latest happenings in your life, or ask a “random” bunch of people a question, Twitter MAY be useful.

  18. This is the problem that most people who use Twitter for information exchange have been pointing out from the beginning, and it applies to all communes (Twitter is a commune, not a collective or community, precisely because of the problems you mention: lack of context, lack of knowledge, many voices without structure).

    Granted, there are tools that will make Twitter search more useful, but we still have the issues of context and historical structure.

    I would say that with the new search tools released by Google (Timeline, Wonder Wheel, contextual search) we are getting closer to where we should be. Not near it, but closer. Alas, the problem is that the next step (a truly semantic web) is too far for most people to grasp today, and thus Twitter and friends continue to gain adopters.

  19. Good idea to look for analogies in the good old real world… To me, the comparison with a university department is the most fruitful one because, unlike most firms, a department exists to produce, publish and retrieve knowledge.

    Twitter, FriendFeed, etc. are like chatting and discussing at the office, or doing workshops, but with anyone listening who would like to. Blogs are like combining what you've learned with what you've come up with on your own and pinning it down.

    Everyone who has ever written a paper for publication or a serious blog post knows that there is a big difference between thinking you've got it after chatting and really pinning it down in a well-structured piece of text. So, I guess, blogging is more than just making knowledge retrievable. It is making it explicit and shining in the first place.

    So, Scobleizer, please stick to it!

  20. This is another reason I argued for blogs the other night. All that energy goes down a black hole. The blog is the ultimate hub of your personality, whether it's commercialized or not. What happens if Ourdoings goes out of business, for instance? Sure, you have the photos saved, but the comments, the context? I'm not saying stay off those services. They are great, but they need to somehow tie into one's blog and be searchable from it.
    Every problem is an opportunity for someone else. One could bring the most interesting or most important tweets into one's blog and tag them so they are searchable not only there but also on Google, and get some Google juice to boot.

  21. Robert,

    Welcome back.

    You are validating my “Noise to Knowledge” idea, which I came up with in the mid-1990s as computing systems started to become ubiquitously available to people. New technology was arriving (email, internet, word processing, spreadsheets) and old technology was being gradually usurped (newspapers, magazines, books, libraries, published standards, etc.).

    The idea is that information goes through three steps: Noise to Best Practice to Knowledge. Further, information flows between all three states. In the old days before computers, we learned how to manage the flow from Noise to Knowledge. For example, we brainstorm near the water cooler and think up some neat ideas (Noise). After returning to our desks we try out the best ones (or those we can remember) and say, “Yeah, this works.” That becomes a Best Practice. We start telling others, and it becomes the way everyone does it: Knowledge. Eventually, the Knowledge needs to be refreshed, and we go back to brainstorming for new ideas or look to current Best Practices to change it.

    Protocols for doing that before computing were understood, in general. After having computers, we evolved into becoming tool-centric. For example, email became (and maybe still is) not only our source of noise (daily mail) but also our source of best practice and knowledge (people individually storing old mail in folders for reference). This is bad as it's not scalable. But if a good email program is all you have, e.g. Outlook, then that's what you're going to use.

    I think the trick is to have great tools fit for purpose, and the tools for Noise are not the tools for Knowledge. Twitter is Noise. Google searching is probably a lot of noise, but Best Practice can easily be found. Knowledge can be discerned via Google, but not easily, and there will be many versions of the truth.

    I wrote all this up into a paper at the time for discussion with the IT Department to demonstrate why the tools they gave us were inadequate and would be misused. They were. And the IT Dept didn't change direction. At the time, intranet web servers (Apache) were viewed with great suspicion and declared “non-strategic”, as was the Internet (!). Oh well.

    I re-wrote the paper last year and posted on my blog.

  22. Seems to me that blogs should be more like a Friendfeed instance that I control, with support for real time comment updates. Why should good design be siloed into proprietary platforms?

  23. Neither the real-time web nor the traditional web (Google in this case) will replace the other. They will each depend on the other.

  24. Robert,

    I listen to music on Pandora but buy it on iTunes if I want to be able to listen to it at my leisure. Can't the real time internet and the regular internet co-exist in the same way?

    -chris

  25. What you are describing is the challenge knowledge-based industries have been struggling with forever: how do you capture email threads and short-term discussions and distill them into long-term knowledge that can benefit the larger organization? Accenture has a full-time staff to distill and archive long-term knowledge for each project. Are blogs the corporate equivalent of an archival department?

  26. True. I've been frustrated by this in the past, but I never thought of it as a serious problem–just as my lack of searching ability. It's true, though–tweets are like flashes of light that may reveal important information for a moment, but then disappear forever. Same with FriendFeed. As much as I love the real-time web, I can't handle it for long because I reach information overload pretty fast. Blogs aren't going anywhere.

  27. Interesting, but has your knowledge of these events been destroyed, or maimed? The knowledge that has been created is a kind of tacit knowledge that has always been difficult to put into words. Philosophers have expounded on this forever. “The way that can be told is not the way.” And so on and so forth: Descartes, Locke, Hegel, Kant…

    I think the tacit knowledge that you have gained in real time has been converted into explicit knowledge through slow-time blogging and daily actions.

    I'd say real-time conversations are more like oral conversations, and knowledge does not have to be written to be maintained and passed on. E.g., the Navajo language wasn't written until 50 years ago, but they still have lots of knowledge from long ago.

    Is it easy to get to the conversational real-time knowledge? Doesn't sound like it, and I am glad that you have pointed this out. Seems like a business opportunity for someone who can thread and archive it and then provide search at a later time…

  28. Heh, interesting to see more responses on FF than here ;)

    While I agree with you about Twitter's poor search functionality (I'm not a user of FF services), namely that you cannot filter your searches by date, theme, person, etc., let alone use location information (which, to be honest, is limited more by how people use either service), that is not to say it's a problem inherent to RT services.

    If the services moved beyond treating their data purely as a time series (which is what they are) and away from sorting in descending order by date posted, there might be some improvements there. The fact that Twitter (and FF?) offers private data streams complicates the matter greatly (e.g., how many Facebook indices can Google search when the data is kept completely behind the garden wall?).

    I think there's a lot more that could be done (Flickr's current search options are a good example). With time-series data that is immense in size and value, but with little “rank” or “peer review” and with linking obfuscated through shortened URLs, good search (both generally, for APIs and trend visualisation, and for specific users where private streams exist) would prove a great benefit to the company.

  29. Google's far from out of this game, for sure. I use it to find all my past Tweets on Twitter, for example. The site:twitter.com/Jesse is very useful for that. Twitter search is lacking. Hopefully Facebook is taking their time right now in order to do it right.

  30. You say: “now that I’m off of FriendFeed and Twitter I can start talking about what I learned while I was addicted to those systems”

    But that implies that you logged out of those and are no longer addicted. I would say both are false.

    Following the gist of your argument, it is well established that Twitter's search is very broken. I hear rumblings about them improving, and maybe they will go back and get it all, but I have concerns they may never get it. There is a place for both real-time and long-time discussions being made available where they best belong.
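Comment 15's idea of weighting answers, rather than just counting them, can be sketched in a few lines of Python. This is a hypothetical illustration: the `BASIS_WEIGHT` values, the 24-month half-life and the `activity_match` score are made-up assumptions, and the sketch presumes those signals have already been inferred from each person's lifestream, which is the genuinely hard part the comment describes.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical weights for how an answerer knows what they're recommending.
BASIS_WEIGHT = {"test_rode": 1.0, "owns": 0.8, "saw": 0.2}


@dataclass
class Suggestion:
    model: str
    basis: str             # key into BASIS_WEIGHT (assumed to be inferred, not self-rated)
    months_ago: float      # how long ago the underlying experience happened
    activity_match: float  # 0..1: how closely their activity resembles the asker's


def answer_weight(s, half_life_months=24.0):
    """Authority of one answer: evidence strength x timeliness x fit to the asker."""
    timeliness = 0.5 ** (s.months_ago / half_life_months)
    return BASIS_WEIGHT[s.basis] * timeliness * s.activity_match


def rank_models(suggestions):
    """Sum weighted answers per model instead of counting raw votes."""
    totals = defaultdict(float)
    for s in suggestions:
        totals[s.model] += answer_weight(s)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    suggestions = [
        Suggestion("Bike A", "saw", months_ago=0, activity_match=0.5),
        Suggestion("Bike A", "saw", months_ago=0, activity_match=0.5),
        Suggestion("Bike A", "saw", months_ago=0, activity_match=0.5),
        Suggestion("Bike B", "test_rode", months_ago=6, activity_match=0.9),
    ]
    for model, score in rank_models(suggestions):
        print(f"{model}: {score:.2f}")
```

Here three casual sightings of Bike A (0.30 total) lose to one recent test ride of Bike B (about 0.76), which mirrors the comment's point that the most popular answer isn't necessarily the best one. The exponential half-life also captures its timeliness remark: with a 24-month half-life, a test ride 6 months ago keeps about 84% of its weight, while one 6 years ago keeps only 12.5%.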

Comments are closed.