Wednesday, December 12, 2007
Simply click on the link below to install the WikiDashboard search engine plug-in for your Firefox browser. Use at your own risk!
Click here to install the WikiDashboard Search Engine Plugin
Tuesday, December 11, 2007
So tonight, while I was sitting in the BayCHI talk on Facebook, I hacked up a little bookmarklet here.
Buttons (bookmarklets) are links you add to your browser's Bookmarks Toolbar.
What do they do?
The button allows you to quickly jump to the WikiDashboard page of the Wikipedia page you're on.
How do I get them?
To install the bookmarklet, simply drag the link above to your toolbar.
To use it, simply click on the bookmark you just made when you're on a Wikipedia page (currently only English pages are supported), and it will bring you to the exact same page on WikiDashboard (but, obviously, now augmented with the social dashboard visualization). If we can't find the same page on WikiDashboard, then you will be brought to the home page of the WikiDashboard site.
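For the curious, the URL rewrite behind the button can be sketched roughly as follows. This is a Python model of the mapping only, not the bookmarklet itself (which runs as JavaScript in the browser). The host name `wikidashboard.parc.com`, the `/wiki/` path layout, and the function name `to_dashboard_url` are assumptions made for illustration, and the fallback here is decided locally rather than by the WikiDashboard server.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed host name, used for illustration only.
DASHBOARD_HOST = "wikidashboard.parc.com"
DASHBOARD_HOME = f"http://{DASHBOARD_HOST}/"


def to_dashboard_url(current_url: str) -> str:
    """Map an English Wikipedia article URL to the same path on the dashboard host.

    Anything that is not an English Wikipedia article URL falls back to the
    dashboard home page (a simplification of the behavior described above).
    """
    parts = urlsplit(current_url)
    if parts.hostname == "en.wikipedia.org" and parts.path.startswith("/wiki/"):
        # Keep path, query, and fragment; swap only the host.
        return urlunsplit(("http", DASHBOARD_HOST, parts.path, parts.query, parts.fragment))
    return DASHBOARD_HOME


if __name__ == "__main__":
    print(to_dashboard_url("http://en.wikipedia.org/wiki/Dunbar%27s_number"))
    print(to_dashboard_url("http://de.wikipedia.org/wiki/Berlin"))
```

Keeping the path untouched is what lets the button land on "the exact same page": only the host changes.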
Sunday, November 25, 2007
So what is Dunbar's number? Well, it's 150. That's the theoretical limit of the number of people that you can “know socially” in the sense that you know them as individuals and know something about their relations to one another (and you), and the reason why it's an interesting number is that it seems to be related to the size of that melon sitting on your shoulders.
As an undergraduate I joint-majored in anthropology and psychology, so I'm generally interested in brains, but even more so because it has been a bit of a mystery as to why ours (Homo sapiens) got so big. Second only to the heart, the brain consumes a massive amount of our energy intake, and those big heads make childbirth more problematic for us than for our primate relatives. Most people would probably tend to believe that bigger brains mean greater intellectual capacity, and that this somehow provides a substantial increase in our ability to, say, forage, to offset the costs of big brains. But as Dunbar notes in his Science article, there is no reason for a chimp to have a brain so much bigger than a squirrel's when they basically solve the same foraging problems.
It turns out that the neocortex ratio for various species is strongly correlated with the average size of the social group for members of that species (and for humans that number is 150 at the limit). Recent evidence suggests that it is more specifically correlated with pair-bonding. But even more importantly, it appears that increasing sociality increases reproductive success. So social cognition increases fitness.
This frames all sorts of interesting questions about social technologies. James Surowiecki's “The Wisdom of Crowds” argues that one can get more accurate and “wiser” judgments from large-scale aggregate behavior in things like electronic markets or voting systems. But if it were just the size of the herd that mattered, then cows would be wizards. The claim made by the comparative biologists studying brain size is that our ability to maintain awareness of, and reason about, complex social relations buys us something important. So, assuming that things like Twitter and Facebook (or the WikiDashboard) and the rest give us greater social awareness and reasoning--what exactly does it buy us?
Monday, November 12, 2007
Raluca Budiu, who is a post-doc working in our group, has conducted some very interesting research with us on how tagging appears to affect human information processing. She studied two techniques for producing tags: (1) the traditional type-to-tag interface of typing keywords into a free-form textbox after reading a passage or article; (2) a PARC-developed click2tag interface that allows users to click on keywords in the paragraph to tag the content.
The experiment used 20 subjects and 24 passages in a within-subject design. Participants first studied passages and tagged them, and then performed memory tests on what they had actually read and tagged. After tagging the content, they had to either (a) freely recall and type as many facts from the passages as possible; or (b) answer 6 true/false sentences in a recognition task.
As reported in the paper, the results suggest that:
- In the type-to-tag condition, users appear to elaborate on what they have just read and re-encode the knowledge with keywords that might be helpful for later use. This appears to help the free-recall task (a) above. In other words, users seem to engage in a top-down process that induces them to schematize what they have learned.
- In the click2tag condition, users appear to re-read the passages to pick out keywords from the sentences, and this appears to help them in the recognition task (b) above. In other words, users seem to use a bottom-up process that simply picks out the most important keywords from the passage.
Click here to download the technical report and pre-print (the highlights in the paper are mine).
Monday, November 5, 2007
The first week we will have Ross Mayfield from SocialText speaking to us about Wikis in the Enterprise. Following that, we have speakers from various places, ranging from startups to industrial research labs and academia. Topics will range from social practices of online communities and startup excitements to mashup techniques and academic studies. The talks will be recorded and published on the web. Here is the preliminary announcement:
PARC Forum -- special speaker series on going beyond web 2.0
Thursdays at 4 pm, Palo Alto, CA
more info at www.parc.com/forum
Upcoming confirmed speakers:
*November 15 -- Ross Mayfield, SocialText
*November 29 -- Garrett Camp, StumbleUpon
*December 6 -- Charlene Li, Forrester Research
*December 13 -- Guy Kawasaki, Truemors, Garage Ventures
*January 10 -- Bernardo Huberman, HP Labs
*January 17 -- Chris Anderson, Long Tail
*February 7 -- Premal Shah, Kiva.org
*February 21 -- Andrew McAfee, Harvard Business School
*March 20 -- Lisa Petrides, Amee Evans; OER Commons
*March 27 -- Ed Chi, PARC Augmented Social Cognition
To subscribe to future PARC Forum announcements and/or our e-newsletter,
please visit: www.parc.com/subscriptions.
Monday, October 29, 2007
'Collaborate', according to the American Heritage Dictionary, is "to work together, especially in a joint intellectual effort." The problem is that tagging features in many of the popular Web2.0 tools such as Flickr and YouTube are not really 'collaborative', since users aren't really working together per se. In YouTube, for example, only the uploader of the original video clip can specify and edit the tags for a video. Most of the time, in Flickr, people only tag their own photos. However, Flickr is somewhat more collaborative than YouTube because the default setting for any account is to allow contacts such as friends and family to also tag the photos.
Neither of these two systems seems that 'collaborative', because, to me, collaboration implies a shared artifact, a shared workspace, and shared work. On the other hand, 'social' is "living or disposed to live in companionship with others or in a community, rather than in isolation". In other words, simply existing and having some relation to others in a community. So for example, I would argue that in YouTube, we have social tagging but not collaborative tagging, because while users tag their uploaded videos in the context of an online social community, they do not collaborate to converge on a set of tags appropriate for that video.
In the Computer-Supported Cooperative Work (CSCW) field, the term 'collaborative' has especially come to imply a shared workspace. With shared workspaces, there are often some elements of coordination and conflict involved as well (and hopefully conflict resolution too). So in contrast to YouTube, the most 'collaborative' tagging system I know is the category tagging system in Wikipedia. Anyone can edit the category tags for an article. They can remove, add, discuss, and revert the use of any tag. In this case, the category tags are shared artifacts that anyone can edit inside a shared workspace. The work of tagging all 2 Million+ articles in Wikipedia is shared work among the community.
It's interesting to note that somewhere in between YouTube and Wikipedia tagging is the bookmarking system del.icio.us. In del.icio.us, there is a shared artifact (the tagged sites or URLs), and there is shared work of tagging all of the websites and pages out there on the Web. However, there is less of a notion of a shared workspace. My tags for a URL could be, and probably are, different from someone else's tags for the same URL. I also have the capability of searching within just my own del.icio.us space. So from least collaborative to most collaborative, we have YouTube, then del.icio.us, and then finally the category tagging system in Wikipedia.
A simple way to explain this is that one must be social in order to collaborate, but one need not be collaborative to be social. So in summary, I would argue that social tagging is a superset of collaborative tagging. But a social tagging system may not necessarily be a collaborative tagging system. We should change the definitions in Wikipedia to distinguish between these two types of systems.
Sunday, October 28, 2007
Why Social Information Foraging?
For more than a decade, researchers at PARC have studied information foraging and sensemaking at the level of the individual user, which has had some degree of influence on practice. Now, the Augmented Social Cognition Area (ASC) at PARC is pushing that research to social information foraging and sensemaking with a special focus on the Web. There are many reasons for this, including
- Recent catastrophic failures in decision-making attributed to a lack of cooperation in collecting and making sense of information. For instance, the Senate 9/11 report and the NASA Columbia report both focus on poor cooperation in finding, sharing, and making sense of information.
- Virtually all significant discoveries, inventions, and innovations are the result of collective activity that depends on standing on the shoulders of others.
- Recent gushing about "wisdom of the crowds" and similar phenomena points to the power of cooperative processes, but things can go wrong too.
- The Web is emerging (for better or worse) as the primary source of scientific and technical information in both professional and everyday life. The Pew Internet & American Life Project reports that the majority of online users turn first to the Internet for information about specific scientific topics, 87% of online users use the Internet as a research tool, and 62% use the Internet to check the reliability of facts.
There is a whole scientific literature about scientific literatures. Scientific literatures are interesting because they are examples of community networks of peers producing content that date back to at least the 18th century. Pamela Sandstrom, an anthropologist who also works in the library sciences, did a study of the information foraging behavior of scholars in behavioral ecology (a subfield of biology). Dr. Sandstrom used a variety of ethnographic and bibliometric techniques to try to get at scholarly information seeking.
One emergent pattern was the way that individual scholars arranged themselves to be information brokers. That is, they all contributed to their "core" field (behavioral ecology) but also maintained connections to peripheral fields (e.g., mathematics, population theory, psychology, etc.). Individuals could be viewed as "brokers" of information between peripheral fields (e.g., a new mathematical technique) and the core field (e.g., application of a new mathematical technique to modeling behavior).
Another emergent pattern was that individual scholars had different foraging strategies:
- Peripheral fields involve solitary foraging: 48% of the information resources used to write papers came from solitary deliberate search, information monitoring, browsing, or reading, and 61% of those resources were relevant to the periphery
- Core field involves social foraging: 30% of resources came from colleagues distributing or communicating information through informal channels (e-mail, pre-publications, face-to-face recommendations, etc.), and 69% of those resources were relevant to the core.
One of the big influences on our thinking in ASC is the work on Structural Hole Theory by Ronald S. Burt. The theory offers some insight about why the scholars discussed above might be motivated to arrange themselves to be brokers across different areas.
Burt's work is built around the analysis of social networks--network representations in which nodes represent people and links among nodes represent social relations, especially ones in which information might be communicated. Such networks tend to have a clumpy arrangement. Clusters of people tend to interact with one another and less so with other clusters. The gaps between such clusters are what Burt calls structural holes. Certain individuals can be identified as brokers or bridges across structural holes because they tend to have links that go from one tight cluster of people to another tight cluster (there is a specific network-based measurement called "network constraint" that does this mathematically).
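To make "network constraint" a little more concrete, here is a minimal sketch in Python. It uses the commonly cited form of Burt's measure, where the constraint on person i is the sum over contacts j of (p_ij + sum over q of p_iq * p_qj) squared, and p_ij is the share of i's ties invested in j. The sketch assumes unweighted, undirected ties (so p_ij is simply 1 divided by the number of i's contacts), and the toy network and the name `network_constraint` are made up for illustration. Lower constraint means more brokerage.

```python
def network_constraint(graph, i):
    """Burt-style constraint for node i in an unweighted, undirected graph.

    graph maps each node to the set of its neighbors.
    """
    def p(a, b):
        # Share of a's ties invested in b (equal shares for unweighted ties).
        return 1.0 / len(graph[a]) if b in graph[a] else 0.0

    total = 0.0
    for j in graph[i]:
        # Indirect investment in j through mutual contacts q.
        indirect = sum(p(i, q) * p(q, j) for q in graph[i] if q != j)
        total += (p(i, j) + indirect) ** 2
    return total


# Toy network: two tight triangles (A, B, C) and (D, E, F),
# with X linking A and D across the gap between them.
toy = {
    "A": {"B", "C", "X"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"E", "F", "X"},
    "E": {"D", "F"},
    "F": {"D", "E"},
    "X": {"A", "D"},
}

if __name__ == "__main__":
    for person in ("X", "B"):
        print(person, round(network_constraint(toy, person), 3))
```

In this toy network X, whose two contacts do not know each other, ends up with a lower constraint than B, whose contacts are all tied to one another, which is the sense in which X bridges a structural hole.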
Here's a summary of Burt's hypothesis about brokers and structural holes:
- There is greater homogeneity within than between social groups
- People whose social networks bridge the structural holes between groups have earlier access to a broader diversity of information
- People whose networks bridge the structural holes between groups have an advantage in detecting and developing rewarding opportunities
- Like an over-the-horizon radar in an airplane, brokerage across the structural holes between groups provides a vision of options otherwise unseen
- Idea value increased to the degree that individuals were measured as social brokers
- The salaries of individuals increased to the degree that they were measured as social brokers (factoring out such effects as job rank, role, location, age, education, and business unit).
- Managers who discussed issues with other managers were better paid, more likely to be evaluated positively, and more likely to be promoted.
Our relations and communications with others can be represented as a social network. Specific content flows and dissipates through these networks. In both science and business it looks like certain "brokerage" positions are sources of discovery and innovation--places where specific individuals get exposed to a greater diversity of ideas, and ideas that may yet be unseen by others in a core group.
More generally, this research shows that it is possible to find things about social information flows that can be specifically related to better information foraging and sense making.
Thursday, October 11, 2007
Research meets Web2.0: Augmented Social Cognition sheds light on Coordination, Trust, Wikipedia, and Social Tagging
Over the last few years, we've realized that many of the information environments are gradually turning people into social foragers and sharers. People spend much time in communities, and they are using these communities to share information with others, to communicate, to commiserate, and to establish bonds. This is the "Social Web". While not all is new, this style of enhanced collaboration is having an impact on people’s online lives, so we've formed a new research area here at PARC to go after these ideas in depth.
The “Augmented Social Cognition” area is trying to understand the enhancement of a group of people’s ability to remember, think, and reason. This has been taking the form of many Web2.0 systems like social networking sites, social tagging systems, blogs, and Wikis. In this talk, I will summarize examples of recent research on:
- how decreasing the interaction costs might change the number of people who participate in social tagging systems?
- how conflict and coordination have played out in Wikipedia?
- how social transparency might affect reader trust in Wikipedia?
Wednesday, October 10, 2007
These and other insights will be presented at the conference KM World & Intranets 2007 (November 6-8 in San Jose), by Ed Chi, PhD (manager of PARC's augmented social cognition research area) and Lawrence Lee, director of business development. If you're attending the conference, please visit PARC booth #313 — or if you're interested in attending, e-mail firstname.lastname@example.org for a free expo pass and conference discount code.
Here is the conference website: KM World and Intranet 2007
Friday, October 5, 2007
What’s amazing about this as a research area is that it starts to touch on deep classic philosophic questions like: What do we know about authority? What does it mean? Where does authority come from? What makes someone trust you? When you ask a question about the quality of any information, you have to answer these questions. Who is the person who wrote it? Why should I trust that person? Just because Encyclopedia Britannica hires a bunch of experts to write for them, why should I believe them? What makes them an authoritative figure on how bees build their beehives? What is it about their authority, just because they’re attached to some higher education institution, that makes you want to believe them more than someone else?
When the Augmented Social Cognition research group tried to answer these questions, we ended up with an internal debate about what we mean by “quality.” And I think we came up with a model for understanding quality. We realized that, in academia, much of authority and the assignment of trust actually comes from transparency. Why should I believe in calculus? Well, because the mathematics is built on a foundation of axioms and rule sets that you can follow, which you can look up and examine. You trust calculus because there is a transparency built into the system. You can come to your own conclusion about the quality of the information based upon an examination of the facts. This is the scientific method!
What’s interesting is that exactly the same argument is being applied to Wikipedia. It says to you: you should believe in the quality of the information in Wikipedia because it’s transparent. Anyone can look at the editing history and see who has edited an entry, whether they chose to sign their name after it, and what kind of edits they made in other parts of Wikipedia. Everything is transparent and completely traceable; you can examine Wikipedia back to the first word that was written. And Wikipedia is relying on the fact that it’s completely transparent to gain authority. There is nothing opaque about it. I think that’s why Wikipedia has become so successful. It’s because they stumbled upon some of these fundamental design principles and paradigms that make this work. They could have made the design decision where one can only examine the last 50 edits. Wikipedia could have come up with many other design choices that would not make the system completely transparent. Is it an accident that they ended up with a system that can be traced back to the first edits? I think not.
However (and that's a big however!), some people are still having trouble with the quality of information on Wikipedia even though it’s transparent. Why? One possibility is that they have an all-or-nothing attitude: if one article could be way off, why should I trust another article? They don't, and probably don't want to, examine the history of individual articles before deciding on their individual trustworthiness, perhaps because it's too hard and too time-consuming.
So one hypothesis is that readers don't have the right tools to easily examine and trace back the editing history. That's why the idea of the WikiDashboard might be a really powerful way for fixing these problems. Social dashboards of these kinds are visualizations or graphical depictions of editing histories that will make it much easier for people to look at the history of an article and make up their own minds about its trustworthiness. The tool will enable us to do fundamental research on testing the hypothesis that transparency is what enables trust.
One thing we have done is to actually run some experiments to understand if people are more willing to believe in information if you make the editing histories and activities more transparent. More on that in the next post.
Monday, September 10, 2007
WikiDashboard Tool (alpha-release)
We are pleased to announce the release of our first research prototype of a social dynamic analysis tool for Wikipedia called WikiDashboard. This post is a quick guide to the tool.
The idea is that if we provide social transparency and enable attribution of work to individual workers in Wikipedia, then this will eventually result in increased credibility and trust in the page content, and therefore higher levels of trust in Wikipedia.
You might ask "Why would increasing social transparency result in higher quality articles and increase trust?"
Indeed, the quality of the articles in Wikipedia has been debated heavily in the press [here, here, here, here, and let's not forget the Nature magazine debacle].
Wikipedia itself keeps track of these studies and openly discusses them here, which is a form of social transparency itself. However, even Wales himself
The opposite point of view, however, has not been debated or expressed nearly as much: precisely because anyone can edit anything, and anyone can examine the edit history and see who has made the edits, Wikipedia will become (or has already become) a reliable source of information. I think Michael Scott, the character on the popular TV show "The Office", puts it succinctly: "Wikipedia is the best thing ever. Anyone in the world, can write anything they want about any subject. So you know you are getting the best possible information."
While tongue-in-cheek, it brings up a valid point. Because the information is out there for anyone to examine and to question, incorrect information can be fixed and two disputed points of view can be examined side-by-side. In fact, this is precisely the academic process for ascertaining the truth. Scholars publish papers so that theories can be put forth and debated, facts can be examined, and ideas challenged. Without publication and without social transparency of attribution of ideas and facts to individual researchers, there would be no scientific progress. Therefore, it seems somewhat ironic that the History Department at Middlebury College has banned its students from citing Wikipedia sources.
Indeed, just very recently WikiScanner has brought the issue and idea of social transparency to the forefront. It helps people find out which organizations anonymous edits in Wikipedia are coming from. A week or two later, WikiRage appeared, helping to identify the hottest trends in Wikipedia.
On the academic side, we have seen interesting work from IBM called History Flow that visualizes the edits to article pages in Wikipedia, and the UCSC Wiki Trust Coloring Demo that demonstrated how trust could be visualized line-by-line. These are all examples of how being able to better understand editing history and editing patterns at a glance could dramatically help users uncover problems and judge the trustworthiness of content on Wikipedia.
These tools and other discussions [NYTimes, blogs, and slashdot discussion] are noticing that accountability and transparency appear to be at the heart of the process that helps generate quality articles.
Guide to our tool
The tool can be used just as if you're on the Wikipedia site itself. All of the functions (such as the article search function, and the edit and history tabs) work just as before. The site provides the dashboard for each page in Wikipedia, while proxying the rest of the content from Wikipedia.
Note that we only currently have edit data up until 2007/07/16, so more recent edits are not included in the charts. We're working to fix this.
See our guide for help on understanding the visualizations in the WikiDashboard.
Some Interesting Examples
We will use the 2008 presidential election as an example. In the figure below, we see that the activity on this page has been heating up lately:
2008 US Presidential election
Here are some notable Democratic Party candidates:
Here are some notable Republican candidates:
We're curious about how the Web community will use this tool to surface social dynamics and editing patterns that might otherwise be difficult to find and analyze in Wikipedia. We are also interested in applying this tool to Enterprise Wikis. Please leave a comment on this blog post about patterns you find or questions you have for us. Alternatively, if you wish to contact us in private, email us at:
wikidashboard [at] parc [dot] com
Ed H. Chi
Palo Alto Research Center
(joint work with our ex-colleagues Bryan Pendleton, Niki Kittur, now both at CMU)
Thursday, August 30, 2007
Here are the slides entitled "Conflict and coordination in Wikipedia" (work done jointly with Niki Kittur, Bongwon Suh and Bryan Pendleton.)
Thursday, August 23, 2007
At the other end, we have "collaborative intelligence", in which we see content being produced in a kind of divide-and-conquer environment. Ross Mayfield said on his blog that the Wiki style of wisdom of the crowd was more “collaborative intelligence” than collective intelligence. For example, the group of people who are experts on World War II tanks will write that part of Wikipedia; the group of people who are experts on politics in Eastern Europe at the end of World War II will write those articles. So there is an implicit self-organization according to interest and intention. It’s not everybody voting on the same thing; it’s everybody collaborating on different areas to result in something, so that the sum of the parts is greater than the parts themselves. That seems to be the spirit of this kind of collaborative intelligence.
I don’t really like the term “collaborative intelligence”—it sounds too buzzy—so we tend to call it “collaborative co-creation” instead. It is a very interesting production method. There is a lot of research now on, for example, the open source movement—how it’s a collaborative co-creation mechanism, how successful it is, what’s wrong with it, etc.
Wikipedia is probably the most interesting collaborative co-creation system right now, and it is unique in the sense that it is all-encompassing; its net has been cast very wide and it has been able to succeed because of that. There is a little bit of a success-breeds-success phenomenon going on there with the feedback cycle.
This feedback cycle is the part we’re really interested in understanding, because coordination is at the heart of collaborative creation. We want to understand how people are coordinating with one another through either self-organizing mechanisms or through explicit organizing mechanisms; we want to understand the principles by which those things happen in these environments but not in other environments.
Thursday, August 9, 2007
In my view, what is different about this new Web 2.0 environment is that people are sharing information today in a fundamentally different way from what they are used to. One example is Wiki systems like Wikipedia, which is a fascinating collaborative editing environment for creating an encyclopedia. The collaboration that happens here is very different from passing documents back and forth using traditional email, because (1) you have automatic versioning, and (2) you can always go back and find out who contributed what (transparency). Developments like this have taken a lot of the burden off of users. These features reduce the time it takes to collaborate with each other, thus enabling users to collaborate much more effectively with other users.
We sensed that this style of enhanced collaboration was beginning to have an impact on people’s work, so that’s why we proposed and formed a new research area here at PARC, in April 2007, to go after some of these concepts in depth. The name of the group came from a discussion I was having with Mark Stefik and others in UIR, where I started to call this new research area “Augmented Social Cognition” (around March of 2006).
Why did I call it “Augmented Social Cognition”? For that, we should go back to the definition of "cognition".
Many years ago, the researchers in the User Interface Research group at PARC like Stu Card, Peter Pirolli, and myself, agreed that we needed scientists from the field of cognitive science and psychology together with people who are well versed in computer science, graphics, and information visualization. We believed that the fusion of these two areas was fundamental to advances in user interfaces.
During this time, I never bothered to look up the definition of "cognition." When I finally did, I was pleasantly surprised. The definition of cognition is “the faculty of knowing; the ability to think, remember, and reason.” That’s so succinct and so simple. But it can encompass so much.
By extension, we started becoming very interested in what I was calling “social cognition.” Now, as it turns out, the phrase “social cognition” has been used somewhat in psychology in the past, but with a different meaning. In social psychologists' usage, it means the individual cognitive processes that relate to social activities. To put it simply, it’s about scheming to insert yourself into social networks, social activities, or social processes. But I actually think that’s a terrible definition for the phrase.
If cognition is the ability to remember, think, and reason for an individual, then social cognition, by extension, should have the definition: the ability of a group of people, community, or culture to collectively remember, think, and reason. As an example, our ability to remember history by writing it down on paper or stone or computer and share that with other people is a form of social cognition. Wikipedia is an example of social cognition: a group of people getting together to create a written history of our knowledge on this planet.
So now the reader probably can guess what “augmented social cognition” means. It is the enhancement or the augmentation of a group of people’s ability to remember, think, and reason.
Saturday, August 4, 2007
paper on Social Information Foraging and Social Search (joint work with Peter Pirolli, Shyong (Tony) Lam.)
As a side note, we also presented an eyetracking paper showing the effect of highlighted text in reading tasks (joint work with Lichan Hong, Michelle Gumbrecht).
Thursday, July 12, 2007
Found an interesting paper presented at HICSS (Rodriguez, 2007) talking about a social network system for collective decision making. Basically (to avoid reading the paper), the authors developed a social network in which users could express different degrees of trust for each other. The system could be used to make a collective decision on a posed question, e.g., “What should be done in xxx situation?”
They tested three algorithms, each of which was aimed at a different dynamic:
1) Direct democracy (everyone gets a vote, if you don’t vote your vote is lost)
2) Dynamic distributed democracy (everyone gets a vote, if you don’t vote it passes to a person you trust; if they don’t vote it passes to a person they trust; onwards until it reaches someone who votes whose vote is then worth two)
3) Proxy (expert) network (everyone gets votes proportional to their in-degree trust links, otherwise same as 2)
The actual algorithm was based on particle swarms to make it more probabilistic and graded, but basically the same as described above. Turns out that in the test problems all of the three forms led to very similar answers, but that might be an issue with the problems not exposing the differences.
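To see how the first two schemes in the list above differ, here is a minimal sketch in Python. It is a deterministic simplification, not the authors' particle-swarm algorithm: each person is assumed to trust at most one other person, and the function names, the `votes` and `trusts` dictionaries, and the sample names are all made up for illustration.

```python
from collections import Counter


def direct_democracy(votes):
    """Count only the votes that were actually cast; abstentions are lost."""
    return dict(Counter(choice for choice in votes.values() if choice is not None))


def distributed_democracy(votes, trusts):
    """Pass each unused vote along the trust chain until it reaches a voter.

    votes maps person -> choice (None if the person did not vote).
    trusts maps person -> the single person they trust (assumption: one trustee).
    A vote is lost if the chain ends or loops before reaching someone who voted.
    """
    tally = Counter()
    for person in votes:
        current, seen = person, set()
        while current is not None and votes.get(current) is None and current not in seen:
            seen.add(current)
            current = trusts.get(current)
        if current is not None and votes.get(current) is not None:
            tally[votes[current]] += 1
    return dict(tally)


# Hypothetical example: bob and cat do not vote but (transitively) trust ann.
votes = {"ann": "yes", "bob": None, "cat": None, "dan": "no", "eve": None}
trusts = {"bob": "ann", "cat": "bob"}

if __name__ == "__main__":
    print(direct_democracy(votes))
    print(distributed_democracy(votes, trusts))
```

In this example ann's choice ends up carrying three votes under delegation, while under direct democracy the three abstentions simply disappear. The proxy network would additionally weight each person's starting votes by how many others trust them.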
It is an interesting paper with some potential design patterns for collaborative intelligence.
Thursday, June 21, 2007
Today, Lada Adamic came to PARC and gave a talk on the identification of expertise networks in discussion forums. Her talk provoked a lot of discussion and thoughts about future research in this area.
Her abstract and title information are below:
Expertise Networks in Online Communities: Structure and Algorithms
Web-based communities have become an important place for people to seek and share expertise. We find that networks in these communities typically differ in their topology from other online networks such as the World Wide Web. Systems targeted to augment web-based communities by automatically identifying users with expertise, for example, need to adapt to the underlying interaction dynamics. In this study, we analyze the Java Forum, a large online help-seeking community, using social network analysis methods. We test a set of network-based ranking algorithms, including PageRank and HITS, on this large size social network in order to identify users with high expertise. We then use simulations to identify a small number of simple rules governing the question-answer dynamic in the network. These simple rules not only replicate the structural characteristics and algorithm performance on the empirically observed Java Forum, but also allow us to evaluate how other algorithms may perform in communities with different characteristics. We believe this approach will be fruitful for practical algorithm design and implementation for online expertise-sharing communities.
This is joint work with Jun Zhang and Mark Ackerman at the School of Information at the University of Michigan.
In her talk, I found a quote that's worth keeping around. Referring to Yahoo! Answers, Eckart Walther said:
[it is] the next generation of search ... [it] is a kind of collective brain -- a searchable database of everything everyone knows. It's a culture of generosity. The fundamental belief is that everyone knows something.
- Eckart Walther (Yahoo research)
Of course, this has great connection with Wikipedia and the answers it provides too, so these kinds of ideas are at the center of several research projects here at PARC, including our characterization studies of Wikipedia (see previous blog entries).
Lada's work here, in a nutshell, uses some simple methods to identify the expertise level of users in discussion forums by looking at the social network formed by the question/answer pairs. It turns out that simple algorithms relying on simple measures, such as the number of answers a user provides, work nearly as well as sophisticated algorithms such as PageRank or HITS. She and her co-workers measured this using data from the Java Forum.
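To make the comparison above concrete, here is a minimal sketch of the two kinds of ranking on a question/answer network. It is not the code from the study: the input format (a list of asker/answerer pairs), the function names, and the damping value of 0.85 are all assumptions for illustration, and the PageRank here is a plain textbook power iteration.

```python
from collections import Counter

def answer_counts(pairs):
    """Simple measure: how many answers each user has given.

    pairs: (asker, answerer) tuples, one per answered question.
    """
    return Counter(answerer for _, answerer in pairs)

def pagerank(pairs, damping=0.85, iterations=50):
    """Textbook PageRank where each asker 'links' to whoever answered."""
    nodes = sorted({user for pair in pairs for user in pair})
    out_links = {node: [] for node in nodes}
    for asker, answerer in pairs:
        out_links[asker].append(answerer)
    rank = {node: 1 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Users who never asked anything have no out-links; spread their
        # rank evenly over everyone so the total stays at 1.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {node: base for node in nodes}
        for node in nodes:
            for target in out_links[node]:
                new_rank[target] += damping * rank[node] / len(out_links[node])
        rank = new_rank
    return rank
```

The answer count only looks at how often someone replies, while PageRank also rewards answering questions from people who are themselves frequent answerers.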
Some of the most interesting discussion revolved around the understanding of micro-economics of behavior. If it is known to users in the community that # of answers or replies will get them a high rank, they might game the system by replying with minimal irrelevant content. We have seen this kind of behavior in Wikipedia as well. If we were to align the incentives in one way, users are likely to game the system along those incentives. How do we design social systems, then, knowing the user behaviors that might follow certain micro-economic predictions?
On a side note, she recently won the vote on Wired.com for being a sexy geek!
Thursday, June 14, 2007
WSJ article on social networking sites as delivery channels
WSJ seems to be paying great attention to this new platform, perhaps because it can make a huge difference in business to have social networks and social computing built directly into the delivery channel for content. Their recent article on how social computing is making inroads into research universities is a good example of how the trend toward Augmented Social Cognition research appears to be unstoppable at this point.
WSJ article on social computing in research universities
The blog post above discusses how vandalism is affecting a particular tag at last.fm. It seems that the basic human desire to act in anti-social ways shows up in many social Web 2.0 systems. Of course, data mining experts and others have worked tirelessly to come up with algorithms that filter out this 'noise', but I can't help but wonder if the 'noise' is just as valuable as 'real data' in understanding human behavior. Moreover, these outliers seem to point to real data that we could extract and potentially use.
Tuesday, May 22, 2007
"todd450 pointed us to a nifty visualization of Wikipedia and controversial articles in it. The image started with a network of 650,000 articles color coded to indicate activity. The original image is apparently 5' square, but the sample image they have is still pretty neat."
The original blog post was here.
Saturday, May 19, 2007
Interestingly enough, it was already blogged by someone at AOL Search here.
Tuesday, May 15, 2007
As we were getting ready for the alt.CHI presentation last week at the CHI conference, I realized that the way we have been looking at the frequency of user edits in Wikipedia was not really getting at the root of the issue. What we really aspire to find out is "what processes are governing the users' participation in Wikipedia?"
In the alt.CHI paper, we discovered that around 2003-2004, administrators in Wikipedia were making around 50% of edits! It definitely seemed like the "power of the few" was at work in Wikipedia. Indeed, admins in Wikipedia have a great deal of power. They set policies, ban destructive users, help resolve disputes, and generally keep order within the system.
Moreover, when we analyzed the data using high-edit users (users with 10,000 edits or more), we got the same result. The algorithm was: (1) Across all Wikipedia edits for all time, find users with more than 10k edits; (2) compute the total number of edits in month "x"; (3) compute the total number of edits made in month "x" by the users identified in step 1; (4) divide the result from step 3 by the number from step 2. Here is the graph:
And when we computed the diff between all 58.5 million revisions of Wikipedia, we found that the number of words changed by admins (as a proportion of total words changed by everyone) was also waxing and waning from 10% to about 50% back down to near 10%.
We discovered, as outlined in the alt.CHI paper, that users with a low number of edits are becoming a bigger part of the total population. From the above analysis, it seemed that users with a low number of edits were becoming more powerful in Wikipedia.
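For readers who want to try the high-edit-user analysis described earlier (steps 1 to 4), here is a minimal sketch. It assumes the edit history is available as a list of (user, month) pairs, one per edit. That input format and the function name elite_share_by_month are illustrative assumptions, not the scripts we actually ran.

```python
from collections import Counter

def elite_share_by_month(edits, threshold=10_000):
    """Share of each month's edits made by lifetime high-edit users.

    edits: iterable of (user, month) pairs, one pair per edit.
    """
    edits = list(edits)
    # Step 1: over the whole history, find users with more than
    # `threshold` edits (the "more than 10k" wording of step 1).
    lifetime = Counter(user for user, _ in edits)
    elite = {user for user, count in lifetime.items() if count > threshold}
    # Step 2: total number of edits in each month.
    total = Counter(month for _, month in edits)
    # Step 3: edits in each month made by the users from step 1.
    by_elite = Counter(month for user, month in edits if user in elite)
    # Step 4: divide step 3 by step 2.
    return {month: by_elite[month] / total[month] for month in total}
```

Note that the elite set is fixed once for the whole history, so a user counts as "elite" even in months before they became active.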
When I presented these results to the Computing Science Laboratory here at PARC late last year, David Goldberg suggested to me "why don't you do the other analysis? Compute how much work the top 1% of users (at any given moment in time) were doing?" The difference between this analysis and the analysis we did was somewhat subtle. The analysis we did was equivalent to understanding the work of the top 1% of users over the entire existence of Wikipedia, instead of the top 1% for that month. The algorithm here would be: (1) First, for a given month, rank all users according to the number of edits they made; (2) From the ranking of users for that month, take the top 1% of those users; (3) For that month, compute the total number of edits made; (4) For that month, compute the total number of edits made by the users found in step 2; (5) Divide the result from step 4 by the result from step 3. Here is the result from that analysis:
This clearly showed a very different picture. So what's really going on? It was this past week I realized that we could have summarized the result in a different way. We could instead plot the long tail distribution of user contributions:
In fact, plotted on a log-log plot (also known as a power law plot), here is what it looks like:
This arises partially because of the user turnover rate on Wikipedia:
So what this appears to mean is that there is a rather simple explanation for what's going on here. We have a long tail architecture of participation in Wikipedia. At any given moment in time, a few users are a lot more active than the rest of the population, but there is a long tail of other users who are contributing to the effort.
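The per-month "top 1%" analysis that David Goldberg suggested above can be sketched as follows. As before, this assumes edits arrive as (user, month) pairs. The function name top_share_by_month and the choice to round the top 1% up to at least one user are assumptions made for this illustration.

```python
import math
from collections import Counter, defaultdict

def top_share_by_month(edits, fraction=0.01):
    """Share of each month's edits made by that month's most active users.

    edits: iterable of (user, month) pairs, one pair per edit.
    """
    per_month = defaultdict(Counter)
    for user, month in edits:
        per_month[month][user] += 1
    shares = {}
    for month, counts in per_month.items():
        # Steps 1-2: rank this month's users by edits and keep the top
        # fraction, rounded up so that at least one user is included.
        k = max(1, math.ceil(len(counts) * fraction))
        # Step 4: edits made by those top users in this month.
        top_edits = sum(count for _, count in counts.most_common(k))
        # Steps 3 and 5: divide by the month's total number of edits.
        shares[month] = top_edits / sum(counts.values())
    return shares
```

Because the top group is recomputed every month, this measures how concentrated the work is at each moment rather than how much a fixed set of long-time users contributes.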
Monday, May 14, 2007
We presented two papers at the CHI2007 conference. One paper was on the conflicts and coordination costs of Wikipedia. (Paper here.)
The other paper was an alt.chi paper on the power structure of Wikipedia. (Paper here and here.)
The room was absolutely packed (easily 200+ people there), and they were spilling out into the hallways! Picture above was found on flickr.
Thursday, May 10, 2007
Our intention is to conduct research in two main ways:
First, we are characterizing the various social web spaces, such as Wikipedia, del.icio.us, etc.
Second, we are building new social web applications based on the concepts of balancing interaction costs and participation levels. We are planning on extending information foraging theory to understand some of these economic models of behavior.
Tuesday, May 8, 2007
- Cognition: the ability to remember, think, and reason; the faculty of knowing.
- Social Cognition: the ability of a group to remember, think, and reason; the construction of knowledge structures by a group. (not quite the same as in the branch of psychology that studies the cognitive processes involved in social interaction, though included)
- Augmented Social Cognition: Supported by systems, the enhancement of the ability of a group to remember, think, and reason; the system-supported construction of knowledge structures by a group.