Knowledge Representation and
Emerging Technologies Outside the Classroom
Whether it's processing power expanding at exponential rates, or connectivity measured in mushrooming numbers of new users daily, however we choose to compass the Information Age, it is clear that the tools for distributing information have outstripped notions for putting it to meaningful use. Are more information specialists what we need? Can taxonomies be deployed as ready-made structures? Does it matter for whom, for what hope, for what era we make ourselves ready? Or is technology neutral?
The recent history of Knowledge Representation in information design may prove a useful example of neutrality as a cultural bias all its own, part and parcel of a culture that values ultra-efficiency as the preferred hallmark of social progress. Knowledge Representation makes a particularly relevant subject for study because it has returned significant successes in e-business settings like Amazon.com's automated recommendations feature. Knowledge Representation also implies a larger developmental arc that may not be realized in two years or in five and will require actual changes in human behavior and cultures of learning. For this reason, some of the most powerful information design strategies have been consigned to ten-year and fifty-year business plans. In the belief that the most sorely needed strategies are those that will require involvement of individuals in large case-based studies and sustained public education efforts, this article surveys the early efforts at Collaborative Filtering.
Historical precedents of cultural adaptation to unintended consequences of dynamic, time-based media such as film may provide useful context for recognizing present challenges and opportunities. Once social adaptations to technology are described, we can more clearly understand the shortcomings of narrowly-conceived models of "efficient" information design which refuse to engage the complexities of social engagement by preferring reductive models of "usability". Finally, through a survey of software designs that have focused on collective intelligence and active collaboration, we anticipate areas of future research, development, and design methodology rooted in social structures as distinct from data structures and artificial intelligence. More emphatically stated, distributed intelligence and active collaboration constitute a substantive return to the intellectual challenges of humanism.
Shifts in social and communicative practice occur slowly. We might have hoped that the inefficiencies of human learning and emotion did not lie on the critical path of information architecture, artificial intelligence, and frictionless existence. Many embark on epic detours to avoid the labyrinths of individual differences in problem solving, learning and belief systems, perception, and codes of trust and communication. Human adaptability gives rise to diverse problem-solving strategies and skill sets, and it characterizes our species as tool-using and also social organisms. Knowledge managers and information architects have become more willing to conceive of institutions as organisms than to locate their task in the shadowy differences of cultural and individual practice. It's the adage of searching for our keys under the street lamp rather than in the darkness where we lost them. Humanistic computing surely will require conceptual and cultural changes not of the management-theory kind, but of a broad political, economic, artistic, and ideological kind. We might regard such an unquantifiable claim in light of print technology. The advent of moveable type linked information inextricably with industrialization. That is not to suggest that knowledge and literacy became somehow instantly joined, however. Architecture, poetry, music, and voice remained far more effective media for communicating within and across cultural boundaries, slow media that today's knowledge management gurus would identify as inefficient and information poor. That art and architecture were well suited to the inefficient nature of human learning and social development in no way suggests that print technology would get off without negotiating the landscape of cultural, artistic, and political development. We look back to Gutenberg's press as the technology that revolutionized knowledge distribution.
Distribution was the primary efficiency of print technology, but print became viable as a medium only when societies dedicated themselves to training a specialized class and then entire populations in the arcane skills required both to decode and produce it. Clergy, craftsmen, and housewives alike dedicated themselves to mastering, remastering, and overcoming the limitations of print technology. As we consider human ingenuity in discovering essentially non-technical strategies for overcoming technological limitations of a new medium, it is wise to consider that illuminated manuscripts, tiled mosques, and the nineteenth century novel were produced by societies far less specialized than our own and that we may contend with challenges of communication and innovation particular to our disciplinary insulation. One such challenge that might be overcome through cross-disciplinary practice is the careful scrutiny of the belief that information is a thing apart from action, that memory is storage, that meaning arises from structure, and that knowledge results from access to stored and structured information. The emergence of print technology did not accelerate knowledge acquisition; it merely increased the ratio of message to audience. One consequence was that printed artifacts had to adopt uniform standards for language and information that would be set to type. Such standardization was useful in a number of respects but constrained the development of language and ideas such that ingenious compensations had to evolve to overcome the adversity that came along with this increased ratio of distribution. Centuries later, that ratio has again dramatically shifted, but since the ingenious compensations that grew up around print were in no sense directed toward efficiency or necessity, we begin again the learning curve, a curve that must bring us out of disciplinary isolation.
As the centuries-old fit of print standardization begins to feel snug, we may forget what capacities for invention, expression, and collaboration we have always possessed apart from our specific tools. We forget what was required of us to make sense of film, radio, and television. To recall life before literacy is in some sense to perceive the technological limitations that are erased by the emergence of computational media. Such recollection is not easily shared with one another because the first changes involve the loss of rhetorical norms, legal codes, labor skills, and organizational structure. Already networked computational media achieve wide-area distribution exceeding that of print. Such a statement is only true if narrowly construed in the interest of business and academia. The digital divide is one of those extra-disciplinary questions little discussed and never conceived of as central to the progress of information architecture. But if life before literacy affords us a sense of the human potential to adapt, invent, and grow up around new tools, then it is precisely those nations now restricted by technologies of print and broadcast media who may leapfrog the established nations and cultures that depend on installed infrastructure and its culture of permanence, origin, authenticity, property, and privacy. These five tenets will not disappear, but will to some extent become displaced by the effects of two-way, participatory, decentralized media and computational technologies. This is not to speak of some inevitability somehow inherent to digital tools themselves any more than it is sensible to declare computers superior to books, radio, television, marble, or paint. Permanence, privacy, authenticity, and property will not vanish, but their meaning and their function may be transported as relevance, anonymity, validity, and reputation, respectively. Permanence and relevance? Privacy and anonymity? Property and reputation?
These are not such different notions but axes along which we might usefully regard the role of knowledge, information, and collaboration in social contracts both public and private, future and past.
Clashes between American popular culture and the broadcast entertainment industry provide an interesting present-day example of how emergent technologies have turned a well-established social contract quite deftly on its head. Napster is less significant perhaps than the skirmishes regarding intellectual property rights. As has been the case for centuries, individuals within their local communities are adopting and adapting stories, songs, and fashions from the most widely-known sources. Today, fans are enjoying distribution nearly on par with major institutions. The fans are using digital tools to express interest in and identification with everything from brand names to fictional characters by elaborating, combining, modifying, and satirizing cultural artifacts. In response, corporations are testing their legal rights to constrain fan use of intellectual property. However, since satire is protected speech whereas praise and imitation are not, cease-and-desist letters are backfiring on the corporations who employ them by converting loyal fans into legally protected and vocal critics. In this way the relationship of property to propriety or goodwill remakes the entertainment industry's drawing board while individuals rediscover their access to one another, connecting with messages that have not been censored or subjected to marketing focus groups. Similarly, the permanence of print as the journal of historical record is being challenged by online publications that offer more current, less politically constrained, and often more relevant messages. An Indian politician accepts a bribe from journalists who quickly distribute the video online. The forgotten phenomenon of social buzz carries the message without benefit of established distribution or advertising. In the Philippines, President Joseph Estrada is deposed as a result of cell phone text messages calling citizens into the streets while the giant newspaper presses sit idle.
Authority and permanence are shifting in importance, evidence that the notion of information itself has been predicated on scarcity: whatever was not commonly known was informative and relevant. Individuals are now able, if as yet unskilled, to probe for relevant information. Individuals are acquiring a new sensitivity for what makes information trustworthy and relevant. These are standards personally derived rather than received as the judgments of a producer. But as access to information-rich resources grows, the complexity of these resources can leave the individual chicken-brained, scratching and pecking at menus and lists. I contend that the skills to structure and sort will not be separate from the skills of rhetoric, composition, and performance that developed slowly over centuries. Information architecture and interface design have relied on too narrow a notion of what human communication, expression, and the role of genres mean for the development of computational media. The print-based menus and dynamic lists generated today are assembled according to the attributes of the data itself, a sort of self-structuring object offered as a short-circuit solution that essentially recreates the conventions of print, where standardization, efficiency, and quantity are familiar defining boundaries. These are not the most relevant bounds by which to locate information. What might substitute? Perhaps these boundaries were contingencies of moveable type technology itself. Perhaps these boundaries are now tangled in an overgrowth of cultural practice. Perhaps our forms and conventions have been too few and it is conceptual boundaries themselves that are in scarcity. In the mansion of memory and reason, perhaps we are living in single-family tract homes. What faculties for exploring, envisioning, and communicating about the world do humans possess outside of traditional literacy skills?
Understandably, efforts to remedy data glut tend to approach the task as one of information organization. More interesting perhaps, are efforts to employ knowledge representation. Knowledge representation seeks to model the purposes, preferences, intentions, and even beliefs of individual users. Knowledge representation realized as computational algorithms could provide a basis for dynamic boundaries for structuring and displaying data. Efforts to re-examine teaching and learning as human faculties and as fundamentally social activities have recently emerged in response to the challenges of Knowledge Representation. Such approaches bring a much-needed dynamic to machine-centered definitions of information. It is anything but a simple proposition however. Not surprisingly, the beliefs, intentions, and expectations of individuals resist representation. While information can be cropped, spliced, and structured, knowledge resists such quantification. At the risk of disappearing into romanticized humanism, we might take heed of interdisciplinary approaches that attend to emergent forms in print, radio, film, and television for representing subjectivity. Representation and subject-object ontologies are anything but simple fare but to steer clear of the notion is to willfully avoid the scope of our enquiry. We must remain sensible to the role of technologies and social contracts in human development, and not shy away from works in literary theory, philosophy of film, theater, and plastic arts. How we work, imagine, remember, and communicate will change as surely and as radically with computational technology as it has with the innovation of single point perspective in painting, the advent of photography, the rise of telephony, or the four-color T-shirt. 
Having indicated territory beyond my central focus, I return to a perspective on learning and teaching as an open and interdisciplinary basis for surveying several promising strategies and emerging technologies: Active Collaborative Filtering, Peer-to-Peer models of computing, and Knowledge Representation.
It is understandable that initial responses to information overload should focus on the design and function of databases themselves. A great range of strategies has emerged for indexing data and equipping indexes with filters that deliver results relevant to the intentions of individual users. Standard Web search engines deploy automated software agents to gather information beyond simple plain-text indexing. The number of third-party sites linking to a page is considered a prime indicator of the popularity or general relevance of that page. The problem of natural language input and sensitivity to syntax, homonyms, and synonyms has prompted a creative computing solution from the University of Colorado at Boulder. Latent Semantic Analysis (LSA) applies statistical analysis to large bodies of text to render multidimensional "semantic spaces". This statistical space is generated without reference to meta-textual rules, instead relying upon numerous weak constraints present within and intrinsic to the entire set of data. These "constraints" are derived from nothing more than the co-occurrence of words. A user can then input a natural language search term that will be compared and situated relative to the existing semantic map and its similarity to other terms measured. That LSA works at all is frankly amazing, given that it detects semantic similarity without reference to spelling, grammar, or dictionary comparisons. However, it functions well only with declarative knowledge, and beyond that the conditions under which LSA is error prone are not well understood even by its creators. While the technology has been patched with other tools to create some commercially viable prototypes, unfortunately one of those uses is the grading of student short essays according to their ability to "cover" or summarize course materials.
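The statistical machinery behind LSA can be sketched in miniature. The following is a toy illustration, not the published LSA implementation: the five terms and four documents are invented, and real LSA corpora contain thousands of texts and hundreds of dimensions. It shows how a truncated singular value decomposition of nothing more than word co-occurrence counts yields a "semantic space" in which related terms land near one another:

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents.
# The counts are hypothetical, chosen only to form two topic clusters.
terms = ["doctor", "nurse", "hospital", "engine", "wheel"]
docs = np.array([
    [2, 1, 0, 0],   # doctor
    [1, 2, 0, 0],   # nurse
    [1, 1, 1, 0],   # hospital
    [0, 0, 2, 1],   # engine
    [0, 0, 1, 2],   # wheel
], dtype=float)

# Truncated SVD projects terms into a low-dimensional semantic space
# derived only from co-occurrence, with no meta-textual rules.
U, s, Vt = np.linalg.svd(docs, full_matrices=False)
k = 2
term_vectors = U[:, :k] * s[:k]

def similarity(a, b):
    """Cosine similarity between two terms in the reduced space."""
    va = term_vectors[terms.index(a)]
    vb = term_vectors[terms.index(b)]
    return float(np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb)))

# Terms that co-occur in similar documents score as semantically close,
# without reference to spelling, grammar, or a dictionary.
print(similarity("doctor", "nurse") > similarity("doctor", "engine"))
```

A natural language query is handled the same way: its words are folded into the space and its position compared against the existing map.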
The challenge of database design and filtering, like the task of interpreting natural language queries, quickly becomes the fine art of guessing. Advances in natural language processing and decision-making systems will undoubtedly improve with time, but in the meantime the challenge of filtering information threatens to reductively limit the considered intentions of users according to buying decisions, or to reify terms like "interestingness" according to what computational models are able to process. Software agents that monitor user decisions and patterns of action do not escape this risk, but take an important step in the right direction. Perhaps the best known example of monitoring agents which predict individual interests is Amazon.com's automated recommendation system. (See Appendix A). Shoppers who liked the book you just looked at also liked these five books. The affinity group to which you are temporarily assigned is intriguing in part because of its anonymity. You cannot judge or ally yourself with its members since they exist only as a hypothetical version of your interest. After all, how much do those shoppers know about the domain of interest? What unsavory purposes might they have for reading books that I read? It is a mild suggestion at best except where the recommended book titles themselves do the work of drawing my interest. The dimensions along which we understand relevance can clearly be expanded to draw inferences more useful than national buying patterns. Amazon has coupled these soft suggestions with the more forceful technique of locating written reviews about a specific work alongside its book jacket. These are nonprofessional reviewers whose reliability is established by ratings that indicate their popularity among other users. Peer ratings raise or lower the visibility of individual reviews with respect to a specific title and with respect to the community as a whole.
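The co-purchase logic behind such a recommendation feature can be sketched in a few lines. This is a hypothetical illustration of the general technique, not Amazon's actual algorithm; the baskets and item names are invented:

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories: each set is one shopper's basket.
baskets = [
    {"book_a", "book_b", "book_c"},
    {"book_a", "book_b"},
    {"book_a", "book_c", "book_d"},
    {"book_b", "book_c"},
]

# Count how often each pair of items appears in the same basket.
co_purchase = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        co_purchase[pair] += 1

def recommend(item, n=3):
    """Items most often bought alongside `item`, best first."""
    scores = Counter()
    for (a, b), count in co_purchase.items():
        if a == item:
            scores[b] += count
        elif b == item:
            scores[a] += count
    return [other for other, _ in scores.most_common(n)]

print(recommend("book_a"))
```

The anonymity discussed above is visible here: the recommendation carries no trace of who the co-purchasers were, only the aggregate pattern of their baskets.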
Automated agents step in the right direction by attending to knowledge as it is situated in social activities. To move beyond the social activity of peering sideways in the supermarket aisles, however, subjective human judgment must be solicited. This reveals a current limitation of fully automated bots that seek to collect data about the wants and needs of individual users, and it reveals something about the human process of decision making: namely, that we frequently revise our intentions according to a broad range of sometimes very weak factors. Our thoughts and preferences are continually shifting. One promise for future computing is that software agents will be able to represent diverse interests and interact with one another to model the complexity of human decision processes. To represent any but the simplest decision-making process is likely to generate systemic complexity that obscures our ability to design practical applications. For this reason knowledge representation depends more on improving our understanding of human discourse, collaboration, and cognitive science. When quantum leaps in Artificial Intelligence are realized, such an improved understanding will serve as a basis for cultivating a greater fluency and aptitude for dynamic representations, symbol systems, taxonomies, and skills for mastering complexity and systems thinking. As evident with Latent Semantic Analysis, researchers are faced with the challenge of determining how a neural network may or may not be sensitive to initial conditions. The tools, procedures, and vocabulary for conducting longitudinal, case-based studies of complex systems are notably lacking. If the potential of artificial intelligence is to be realized, it is likely to involve conceptual and behavioral changes in how people gather, process, and act upon information such that knowledge can be situated with respect to more boundaries than its presence, file size, and time stamp in the database.
This behavioral change will certainly require more than the Amazon model of humans voting on which data they like the most.
In The Social Life of Information, Brown and Duguid find it curious that "reengineering tends to focus most heavily on the input and output of the stages in a process. It is relatively indifferent to the internal workings of these stages--to practices that make up a process and the meaning they have for those involved" (p. 95). Rather than defining those internal workings narrowly by scripting the repeated actions of users, Brown and Duguid suggest that intentions and beliefs are themselves the activity of practice. They go on to assert that beliefs and intentions are socially constructed and even resistant to logic. Artificial intelligence and knowledge representation aside, social situations where meaning, interpretation, and understanding are paramount rely very little on linear process and prove resistant to strategies of process reengineering. "Communities of Practice" has become a commonplace term that underscores a critical difference between vertical or hierarchical organization of information vs. knowledge that emerges from "lateral ties among people doing similar tasks." These lateral ties often support knowledge that is tacit. From this perspective the primary focus of information systems should have less to do with automated and predictive systems and should focus more on supporting dynamic cognition that is always situated in social contexts.
After all, "the same stream of information directed at different people doesn't produce the same knowledge in each. […] practice shapes assimilation." (Brown and Duguid, p. 129)
Studies of computer-supported collaborative learning in science and math education provide us with measurable examples of how practice shapes assimilation. Guzdial and Turns have observed separate but related practices among students working together to formulate a question, focus its terms, and recount experiential evidence that seems to be relevant to the focused question. In another study, Jacobson and Archodidou studied science students working with multiple representations of chemical reactions reaching relative states of equilibrium. They observed that students working alone did not synthesize information from multiple representations of the same reaction. Instead, they fell back on least common denominators between the representations as a strategy to preserve their misconceptions. In a group working together with the multiple representations, however, a dialectic process emerged by which one student articulated an explanation and another elaborated upon it. When a third posed a logical conflict by drawing attention to parts of one representation, the first student synthesized a new and accurate explanation to which all agreed. Apart from learning that audio narration needed to be removed from the online learning materials because it impeded collaborative learning, a specific structure of group interactions surrounding the study of chemical equilibrium was observed. These observations provided the basis for restructuring the learning environment to support a practice the students were capable of, but had not been asked to do previously.
Looking for intelligence in individuals is a little like hunting for nature at the zoo, according to Roy Pea, who is well known for his theories of distributed intelligence. Pea emphasizes that intelligence cannot be removed from context and argues that intelligence even resides in inanimate objects, since the physical characteristics of our tools facilitate and therefore "select" certain ends. Efficiency may be gained, but only by adopting alternate ends, which runs the risk of stagnating innovation and collective participation in actively developing knowledge. As with our example of Amazon's ratings and recommendations, the effectiveness of Collaborative Filtering of information is limited by the extent to which it can represent and foster relationships between recommender and recipient. The notion of distributed intelligence has been succeeded by more detailed research on situated cognition. From an educator's perspective on domain specificity in cognition and culture, Lauren Resnick identifies among cognitive researchers a greater interest in "mapping details of how people coordinate cognitive activity in particular social and tool situations [than in] accounting for personal structures of knowledge." Instead, we might think of ways that individuals "tune adaptively to the kinds of natural situations they encounter." Examples of recent efforts to situate knowledge through Active Collaborative Filtering include the Knowledge Pump, developed at the Xerox Research Centre Europe, Grenoble Laboratory. The Pump presents a small palette of functions which run alongside, but independent of, a Web browser. The goal of the Knowledge Pump is not to serve a wide area of users but to support small groups of collaborators who are likely, by virtue of common practice, to make use of similar information. When bookmarking sites, a user is prompted to rate the site and to classify its relevance to communities, which are named in list form.
By identifying himself or herself as a participant in a particular community, the user can select other individuals from a community as trusted "advisors". The user also tries to characterize his or her interests by selecting areas of interest that have been organized by a third party. Subsequent searches for information will return results that are sorted according to the recommendations of the community, with particular weight given to those individuals identified as advisors. The total activity of each community, referred to as a Knowledge Pump, is represented with pump gauges that show how many recommendations are being added to the system and how many are being drawn from the system. This makes the vitality or stagnation of the group transparent. Active pumps may draw more users, inactive pumps may dry up entirely, or users may learn to interpret gauge readings as favorable or unfavorable to certain kinds of searches. The clear disadvantage to this strategy is that much of the user's effort, and much of the interface itself, are given up to the filter itself. Lists of communities, categories of interest, and ratings systems are all extraneous to the subject matter and practice around which the community has formed. The advantages of the Knowledge Pump are that it invites users to identify individuals whose reputation in the community validates their recommendations. These are rather coarse-grained affinities, however, where an individual's recommendation might be valid for some areas and not others within a category of interest. When representing preferences and affinities becomes the obligation of the user, there is also a risk of insulation or stagnation. If I fail to revise my preferences or notice the value of a newly added or newly inspired advisor, I stagnate in a community of insulated like-interests.
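Advisor-weighted sorting of this kind might be sketched roughly as follows. This is a speculative illustration in the spirit of the Knowledge Pump, not its published design: the weights, ratings, and names are all assumptions made for the example:

```python
# Hypothetical community ratings for bookmarked pages (1-5 scale).
ratings = {
    "intro-to-lsa.html": {"alice": 5, "bob": 3, "carol": 4},
    "misc-links.html":   {"bob": 2, "carol": 3},
}
advisors = {"alice"}       # users this member has marked as trusted
ADVISOR_WEIGHT = 3.0       # assumed weighting; the real system's
COMMUNITY_WEIGHT = 1.0     # weights are not documented here

def score(url):
    """Weighted mean rating, with advisors' opinions counting more."""
    total, weight = 0.0, 0.0
    for user, rating in ratings[url].items():
        w = ADVISOR_WEIGHT if user in advisors else COMMUNITY_WEIGHT
        total += w * rating
        weight += w
    return total / weight

# Search results are sorted by community-weighted score, best first.
results = sorted(ratings, key=score, reverse=True)
print(results[0])
```

The coarse-grained affinity problem noted above is visible in the sketch: `alice` is weighted up for every topic, even where her judgment is only valid for some.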
The notion that individuals are producers as well as filters of information is also lacking from the Knowledge Pump. If a member of my Knowledge Pump community developed new attachments or affinities, a more specialized representation would be necessary to accurately model the knowledge community. The Referral Web was developed at AT&T Labs to represent social relationships among individual producers, using documents published online as evidence of social relationships. To explore whether your friends might put you in touch with their friends, you would locate your friend in the Referral Web and designate them as an anchor. The Referral Web would then sprout branches indicating other individuals who appear to have collaborated with the anchor. You might set another anchor, someone you'd like to reach, and focus on branches sprouting from the second anchor as a means to illuminate a social chain by which you might gain access from your first anchor. When lines extend to common collaborators from both trees, you are, as they say, in business. The Referral Web is prone to error as it extrapolates from published documents connections that may or may not be valid. But this kind of information has clear value beyond address books, whether or not the community is comprised of published experts. The Referral Web could serve as a basis for pursuing knowledge as situated among people, for questioning disconnections between individuals, and for considering fruitful connections that you might facilitate between people. The Referral Web is striking because it does not filter out irrelevant information but exploits the existing structure of databases and applies a new skin to it.
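The underlying mechanics of finding a social chain between two anchors amount to a shortest-path search over a graph of apparent collaborations. The sketch below uses a breadth-first search over an invented graph; it illustrates the general technique rather than the Referral Web's actual implementation:

```python
from collections import deque

# Hypothetical co-authorship graph extracted from published documents:
# an edge means two people appear together on some document.
collaborations = {
    "you":       {"friend"},
    "friend":    {"you", "colleague"},
    "colleague": {"friend", "expert"},
    "expert":    {"colleague"},
}

def referral_chain(start, goal):
    """Shortest social chain from start to goal (breadth-first search)."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for person in collaborations.get(path[-1], ()):
            if person not in seen:
                seen.add(person)
                queue.append(path + [person])
    return None  # no chain: a "disconnection" worth questioning

print(referral_chain("you", "expert"))
```

Because the edges are extrapolated from documents, any chain the search returns is a hypothesis about social access, not a guarantee of it, which is exactly the error-proneness noted above.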
Another interesting form of Active Collaborative Filtering suggests how users might be understood as producers rather than as consumers of knowledge by rewarding their contributions inasmuch as they are relevant to others. Unlike the Knowledge Pump or the Referral Web, reward mechanisms for knowledge distribution function well with large communities. SlashDot is one example of a community of interest where knowledge distribution is rewarded. Readers and contributors of SlashDot are one and the same. (See Appendix B). By locating, reviewing, and posting information to the SlashDot community, contributors can be awarded points by those contributors who have previously achieved "moderator" status by virtue of having made valuable contributions. Presumably, the creators were the first moderators, who then ceded control of their creation to the community. Moderators are themselves a collective, and a cap is placed on any one moderator to encourage them to spend their points by "moderating up" or "moderating down" the contributions of others. A total cap on the number of points effectively prevents power mongering. The SlashDot community is notoriously nerdy and intrinsically motivated, but the structure of a rewards-based knowledge community has tremendous potential for attracting motivated individuals who grow into a wide range of functions. Such a system could conceivably embrace the idea of compensating contributors. Automated text analysis could direct incoming messages to qualified raters who not only rate contributions but also indicate general reasons for their preferences. Another moderator might create summaries of contributions and the threaded discussions they generate. Another might review links created by contributors and discussion respondents, or create new links to attribute quotations or refer newer readers to archived discussions. These possible functions are discussed in greater detail in Caron, 1997.
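The capped-points mechanism can be made concrete with a short sketch. The cap value, names, and class structure here are assumptions for illustration; SlashDot's actual rules and limits differ:

```python
POINT_CAP = 5  # assumed cap; the real system's limits are not given here

class Moderator:
    """A community member who has earned a limited moderation budget."""

    def __init__(self, name):
        self.name = name
        self.points = POINT_CAP

    def moderate(self, comment, delta):
        """Spend one point moving a comment's score up (+1) or down (-1)."""
        if self.points <= 0:
            raise RuntimeError(f"{self.name} has no moderation points left")
        comment["score"] += delta
        self.points -= 1

comment = {"author": "anon", "score": 1}
mod = Moderator("grumpy_mod")
mod.moderate(comment, +1)   # "moderating up"
mod.moderate(comment, -1)   # "moderating down"
print(comment["score"], mod.points)
```

The cap is the whole point of the design: because a moderator's budget is finite and expires if unspent, hoarding influence is impossible and every point spent is a genuine judgment of value.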
What is striking about wide-area collaborative systems is the centrality of individual reputations as both a motivating and structural factor of the knowledge base. Reputations and prestige are live wires in human social interactions and can be dangerous, as any number of nineteenth-century novels demonstrate. The SlashDot community allows for anonymous participation, and the use of pseudonyms is an established norm in the community. This may serve as an insulator from politics in related or more general communities, but it also appears to amplify the importance of reputation as earned solely on the value of one's contributions. It is also noteworthy that the community does not insulate itself against abuse or misinformation, relying on its internal mechanisms to reward valuable contributors and devalue disreputable contributors. By remaining open to abuse, the system maintains the need for active participation rather than processes for automation. Chislenko (1997) describes advertising-based knowledge communities today as lacking the "intelligent message targeting" that Active Collaborative Filtering represents. As a result, it is the loudest signals that are heard. "The social environment becomes filled with shocking people and shocking issues, which do not necessarily have anything to do with real values or problems." Perception manipulators themselves come to dominate political and economic decision-making. Despite the potential effectiveness of Active Collaborative Filtering in large communities, Chislenko recognizes that it will not solve all problems.
Problem solving, however, provides an interesting example of collaboration among participants who may have well-formed concepts on a topic that nonetheless proves resistant to group consensus and resolution. Where "perception manipulators" dominate decision-making processes, tools like QuestMap perform a radical function by making the processes of decision making transparent to the community. Where the community is prepared to work in earnest towards resolution, QuestMap is intended to enhance face-to-face collaborations as a persistent and amendable record of the group's collective thought processes. QuestMap establishes a three-part temporal structure in which questions, ideas, and arguments are set forth and linked by labels that identify them as challenges, elaborations, references to prior decisions, and so on. QuestMap does resort to an overarching structure where "divergence" or brainstorming leads to "convergence" or clarification and culminates in a "decision". That decisions are not actually made in this fashion is perhaps obvious, but insofar as voting allows for irrational and gut-level feelings, and presuming that QuestMap's three-part structure has driven participants to communicate offline, it is an interesting example of a tool for consensus-resistant problems. For large groups, MIT's Open Meeting is somewhat less structured than QuestMap. Open Meeting requires participants to identify their contributions as "Agreement, Disagreement, Question, Answer, proposed Alternative, Qualification ("yes, but''), or (report a) Promising Practice." By cueing the discussants to utilize an argument structure, Open Meeting does not drive towards resolution but tries to localize participants in particular areas of the discussion. Locality in this system refers to specialization of "interest, role or function" rather than geographic locality, and grouping discussants in this way is a primary design objective of Open Meeting.
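Open Meeting's typed contributions suggest a simple data shape. The link types below are quoted from the description above; the functions and data structures are illustrative assumptions of mine, not Open Meeting's implementation.

```python
# Illustrative sketch of typed contributions in an Open Meeting-style
# discussion; the link types come from the article, the rest is assumed.
LINK_TYPES = {"Agreement", "Disagreement", "Question", "Answer",
              "Alternative", "Qualification", "Promising Practice"}

def post(discussion, author, link_type, text, reply_to=None):
    """Append a typed contribution; its index serves as a simple ID."""
    if link_type not in LINK_TYPES:
        raise ValueError("unknown link type: " + link_type)
    discussion.append({"author": author, "type": link_type,
                       "text": text, "reply_to": reply_to})
    return len(discussion) - 1

def locality(discussion, link_type):
    """Group discussants by the kind of contribution they tend to make."""
    return [entry["author"] for entry in discussion
            if entry["type"] == link_type]
```

Grouping by contribution type, as `locality` does, is one way a system could localize participants by role or function rather than drive the thread toward a decision.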
As a general-purpose design, the idea of tagging participants' contributions would be suited to more flexible frameworks and customizable interfaces, where tags function to attach a message to multiple displays while other tags remain hidden. User-customizable tags could allow for content-area specificity and creative customization. In this way the logical-argument tools serve as suggestions or scaffolding, which individual users could dismiss or creatively enhance. Without this kind of flexibility, the ideal of focusing on internal processes and beliefs is again subsumed to a process-centered ideal of imposed structures that too easily stifle serendipitous connections and conflicts among discussants.
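Tag-based routing of one message onto several displays might be sketched as follows; the tag names and display definitions are invented for illustration.

```python
# A sketch of user-customizable tags attaching one message to several
# displays at once; tag and display names here are hypothetical.
def route(messages, displays):
    """Show each message on every display whose tag set it intersects."""
    views = {name: [] for name in displays}
    for msg in messages:
        for name, wanted in displays.items():
            if wanted & msg["tags"]:  # any shared tag attaches the message
                views[name].append(msg["text"])
    return views
```

Because a message can carry several tags, it can surface in several views simultaneously, while tags with no matching display simply remain hidden, which is the flexibility argued for above.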
If liberation from the constraints of distribution increasingly subjects knowledge to redefinition, the tools examined thus far push against limits that in some sense reveal artificial limits to that redefinition. The definition of people as either producers or readers of information, but not both, is clearly a limitation of automated collaborative filtering, but also of collaborative tools of all kinds. Tools such as the ReferralWeb show how databases may be conceived as artifacts in and of themselves, in which exchanges between people reveal patterns within practice. This raises questions about how cross-purposes and multiple frameworks for information can constitute knowledge. Do experts operating outside their content specialties cease to be knowledgeable? Do the perspectives and the meanings constructed by non-experts, children, for example, fail to qualify as information unless measured or somehow published in a predictable form? These questions are not intended as an intellectual exercise in erasure, but if we are reconsidering the boundaries by which knowledge is situated, some degree of erasure is warranted. Suffice it to say that the division between information producers and information receivers is shifting if not vanishing entirely. It is a deep chasm in current social practice, and above it print culture and broadcast media have built their bridges. As fans take popular culture into their own hands, authors, artists, and composers may start to see the services of middlemen in a new light. As media mergers shrink the number of studios and independent news desks, something akin to the railroads scoffing at the automobile industry may be in our post-industrial future. Peer-to-peer technologies promise an avenue by which localized communities or subnets may form along lines of relevance, anonymity, validity, and reputation. The absence of centralized servers is often cited as the defining feature of peer-to-peer technology.
However, in "Remaking the Peer-to-peer meme," Tim O'Reilly names the more essential attribute of peer-to-peer as two-way communication and the sharing of resources. Peer-to-peer technologies offer models for collaboration that do not measure existing practice so much as invite innovative uses where common interest, common purpose, and mutual affinity serve as opportunities for crossing those vanished gaps which lately confound us.
Yenta is an example of peer-to-peer technology that seeks to establish communities of interest. Real names may be withheld, and you can direct Yenta to scan those folders and files that will best represent your interests. As you find and store new items, Yenta continuously notes the growth or the change in direction of your interests. This information is used to scan the network for individuals with related interests. If Yenta suspects that you may have strong related interests with another individual, it makes introductions. A personal profile can be presented wherein other individuals have signed to vouch for your good… pseudonym. While Yenta is still predicated on the "great minds think alike" fallacy, it promises a much finer grain of knowledge representation for purposes of collaboration. To really infect another group, without formally inviting them to a dance, file transport histories and semantic frameworks or categories would need to be embedded within files such that the uses, domains, and environments where information has been situated are maintained as knowledge traces. The purposes of tracking user histories and knowledge frameworks within files may not immediately seem to serve clear ends. To be sure, such an undertaking requires broad adherence to standards for embedding and supporting metadata. Fortunately, such a standard is already available.
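The flavor of Yenta's matchmaking can be approximated by comparing word-frequency profiles built from the files each user chooses to expose. This is a guess at the kind of mechanism involved, not Yenta's actual algorithm; the profile construction, similarity measure, and threshold are all my assumptions.

```python
# A rough sketch, in the spirit of Yenta, of matching pseudonymous peers
# by interest profile; not Yenta's actual algorithm.
from collections import Counter
from math import sqrt

def profile(texts):
    """Build a word-frequency interest profile from chosen files."""
    words = Counter()
    for t in texts:
        words.update(t.lower().split())
    return words

def similarity(a, b):
    """Cosine similarity between two interest profiles (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (sqrt(sum(v * v for v in a.values()))
            * sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def introduce(agents, threshold=0.5):
    """Suggest introductions between peers with similar profiles."""
    names = list(agents)
    return [(x, y) for i, x in enumerate(names) for y in names[i + 1:]
            if similarity(agents[x], agents[y]) >= threshold]
```

Because profiles are rebuilt as files accumulate, introductions can track the drift of a user's interests over time, which is the continuous quality described above.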
Metadata is the stuff that was added to text files in order to make them Web pages. In the great browser wars at the close of the 20th century, the imaginative uses for metadata were severely curtailed and applied almost entirely to surface features of Web pages. But metadata is capable of telling much more about a particular file than what font its author preferred. And metadata exists in all file types, not only HTML. But the catch-all HTML tag <meta> is widely known and serves as a representative case of the disregard that has been shown thus far to informational contexts. The catch-all <meta> tag is so poorly used that spider bots are programmed to routinely disregard the indexing information that is supposed to be contained within it, and rightly so, as marketers fill these tags with pathological babble in hopes of rising above the din of the "information age". Metadata cannot substitute for meaning, however, or stand in for the notion of "situated knowledge." Today, I saw a woman wearing a T-shirt that read <body>. That T-shirts "read" is itself a received cultural form which relies upon the shared awareness of social spaces as the acknowledged basis for how meaning can be made. Those spaces and their possible meanings are sensitive to a variety of dimensions, as irreducibly various as the T-shirt and its contents ironically suggest. Signs and systems of signs are frequently conceived of as spaces. As much as this allows us to refer metaphorically to dimensions, perspectives, and motion through signs, these spaces are more fundamentally social and ideological systems where cultural and economic friction do not magically disappear. Steven Johnson credits Graphic User Interfaces with reifying our view of ideological environments as naturally occurring spaces: "Not since the Renaissance artisans hit upon the mathematics of painted perspective has technology so dramatically transformed the spatial imagination."
But single-screen illusionist interfaces have begun to give way to a range of conspicuous mobile devices. Minicomputers tucked away in designer purses, haptic devices that bring people with disabilities online, global satellite positioning systems, temperature, motion, and visual recognition sensors that make school kids viable data gatherers for scientific research, and digital video cameras that capture indigenous ethnographies and documentary evidence of political scandals all promise to bring more of real space and real bodies into distributed cultural spaces.
Our inattention to context in the early days of the Web has predictably resulted in a premature informational jumble. Peer-to-peer technologies promise to turn that jumble into a fiery mass. Web browsers at least unified disparate file types into a single interface. But as we arm ourselves with multiple mobile devices, each device will do its best to bind us in servitude to its proprietary information protocol. Dornfest and Brickley alert us to the fact that peer-to-peer communities currently rely on implicit conventions but that they "have the opportunity, before heterogeneity and ubiquity muddy the waters, to describe and codify their semantics." The Dublin Core Metadata Initiative offers a baseline from which metadata conventions could be infinitely expanded, and XML makes such a baseline endlessly expandable with its convention of Namespaces. A Namespace tag points to a URL where conventions are maintained for the metadata tags used within that file. A book publisher might include ISBN numbers as metadata. A video file might include transcription data indexed to a timestamp. But it is also conceivable that a particular copy of the video used by an eighth-grade student could include metadata about the content and how it fits within a historical timeline. This file now has value added for other students at any level, and a generic metadata reader might direct those users to locate the timeline application used by the eighth grader and consider it for their own purposes. This further suggests that how we read, combine, modify, and exchange data are measurable practices. Since practice shapes assimilation, knowledge representation could be founded upon actions undertaken with digital tools in networked collaborative environments. It would be a much simpler matter to concatenate metadata that had been inserted by an application, whether an email tool, a geographic map, or a contacts list, than to infer context from the free-form data itself.
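Such namespaced metadata can be written with ordinary XML tooling. In the sketch below, the Dublin Core namespace URI is the real one published by the initiative; the timeline namespace, its `position` element, and the record shape are hypothetical stand-ins for a student's own convention.

```python
# Mixing Dublin Core elements with a custom namespace in one record.
# The DC URI is Dublin Core's real namespace; TL is a hypothetical
# namespace standing in for an eighth grader's timeline convention.
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"    # Dublin Core element set
TL = "http://example.org/2001/timeline#"   # hypothetical student namespace

def describe(title, creator, timeline_position):
    """Build a metadata record readable at two levels of specificity."""
    record = ET.Element("record")
    ET.SubElement(record, "{%s}title" % DC).text = title
    ET.SubElement(record, "{%s}creator" % DC).text = creator
    # The timeline application reads only its own namespace; a generic
    # metadata reader still understands the Dublin Core elements.
    ET.SubElement(record, "{%s}position" % TL).text = timeline_position
    return record
```

The point of the two namespaces is graceful degradation: a generic reader extracts the Dublin Core baseline, while the community that maintains the timeline convention gets the added value.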
Adherence to the metadata standards Dornfest and Brickley urge must begin in the community of programmers and interface designers with the goal of representing knowledge and social relationships among an increasingly competent and increasingly diverse body of people. Competence with such tools will require a training, marketing, and education campaign.
If this notion belongs to a fifty-year rather than a five-year plan, the current condition of our "information age" can only make us hope it can be brought closer to the five-year mark. But disposable data is a familiar concept for us, just one more non-renewable resource. It is publicly cheered for being frictionless, superior for leaving no traces when it is reproduced; it is, after all, just information. What nonsense. Data has the potential to appear in multiple contexts, it is compatible with multiple modes of representation, and it can bear traces of diverse viewers, purposes, and semantic connections, all of which help others locate and make use of knowledge in ways they both expected and did not expect. Actions mature into skills through iteration with reflection, feedback, and revision. Similarly, iteration is important to knowledge synthesis, and the permanence of print is particularly supportive of concentrated study. Much individual study and rehearsal is the practice of repetition, though, as we have noted, refinement and correction tend to come from an interlocutor, a collaborator, or an apple applied suddenly to the parietal cranial region. The necessity of collaborative practice, multiple representations, and lateral constraints as the basis for knowledge representation should inform the planning, design, and implementation of metadata and metadata readers. If metadata being automatically embedded by the actions of any number of users seems too large a task, too unreliably performed, it is useful to consider your current "information providers" and understand that they would not vanish, nor would their data be jumbled together with information sent to you by amateurs or friends, as it is today. To pay attention to our information relationships with commercial service providers as contrasted with the local school, planetarium, chamber of commerce, Sierra Club chapter, and so on, seems more conservative and sensible than risky or futuristic.
In its essence, the idea faces few technical obstacles. Take, for example, an image. It may bear a unique ID that corresponds to its presence in a variety of Namespaces created and maintained by communities of interest, both professional and amateur. In a particular Namespace, metadata corresponding to that image may specify its position in a timeline, within a survey of artistic styles, on a geographic map, in a conceptual diagram, or in a list of files related by subject matter. It is almost as if the image were a single record in a database whose boundaries were permeable. And of course, it is. The number of skins or surface representations that can be laid on top of records gathered from that database is unlimited. The people who will be interested, sensitive, or surprised to encounter a particular skin will form a community.
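A minimal sketch of such a permeable record and its skins follows; the image ID, facet names, and values are invented for illustration.

```python
# Sketch of the permeable record: one unique ID, facets maintained by
# different communities of interest. All data here is invented.
image = {
    "id": "img-0042",                                 # hypothetical ID
    "timeline": {"year": 1889},                       # one community's facet
    "style_survey": {"movement": "Post-Impressionism"},
    "map": {"lat": 43.3, "lon": 5.4},
}

def skin(record, *facets):
    """A skin surfaces only the facets one community cares about."""
    return {"id": record["id"],
            **{f: record[f] for f in facets if f in record}}
```

Each call to `skin` yields a different surface representation of the same record, and the shared ID is what lets those representations be traced back to a single permeable source.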
Bowers, C. (2000) Let Them Eat Data. Athens: University of Georgia Press
Brown, J., Duguid () The Social Life of Information. Boston: Harvard Business School Press
Bruner, J. (1986) Actual Minds, Possible Worlds. Boston: Harvard University Press
Caron, J. (1997) "Wide Area Collaboration: a Proposed Application" http://acd.ucar.edu/~caron/wa_collab.html, accessed 4/27/01
Chislenko, A. (1997) "Automated Collaborative Filtering" www.lucifer.com/~sasha/ version 0.72, accessed 4/27/01
Dornfest, R., Brickley, D. (2001) "Metadata" in Peer to Peer: Harnessing the Power of Disruptive Technologies. Oram, A. (Ed.) O'Reilly Press: Sebastopol, CA
Glance, N., Arregui, D., Dardenne, M. (1997) "Knowledge Pump: Community-Centered Collaborative Filtering"
Guzdial, M., Turns, T. (2000) "Computer-supported collaborative learning in engineering: the challenge of scaling-up assessment" in Innovations in science and mathematics education Jacobson, M., Kozma, R. (Eds.) Lawrence Erlbaum Associates, Mahwah, New Jersey
Jacobson, M., Archodidou, A. (2000) "The Knowledge Mediator Framework: Toward the Design of Hypermedia Tools for Learning" in Innovations in science and mathematics education Jacobson, M., Kozma, R. (Eds.) Lawrence Erlbaum Associates, Mahwah, New Jersey
O'Reilly, T. (2001) "Remaking the Peer-to-Peer Meme" in Peer to Peer: Harnessing the Power of Disruptive Technologies Oram, A. (Ed.) O'Reilly Press: Sebastopol, CA
Jenkins, H. (2000) "Digital Land Grab" MIT Technology Review March/April 2000. http://www.technologyreview.com/magazine/mar00/viewpoint.asp, accessed 4/20/01
Johnson, S. (1997) Interface Culture Harper Collins: San Francisco
Kautz, H., Selman, B., Shah, M. (1997) "The Hidden Web" in AI Magazine (Summer 1997)
Manovich, L. interview with Razumova, I. in Switch (5)3 http://switch.sjsu.edu/web/v5n3/J-1.html, accessed 4/12/01
Resnick, L. (1994) "Situated rationalism: Biological and social preparation for learning" in Mapping the Mind: Domain Specificity in Cognition and Culture Hirschfeld, L., Gelman, S. (Eds.) Cambridge University Press
Slayton, J. "Social Software" in Switch 6(2) http://switch.sjsu.edu/v6n2/articles/slayton.html, accessed 4/12/01
The problems of transparency in bot interactions are discussed in Brown and Duguid's "The Social Life of Information" p. 61
 Dornfest and Brickley "Metadata" in Peer-to-Peer