Part I: The concept of information is central to science, technology, and other human endeavors. Shannon's theory of information, although eminently successful for the development of modern computer and telecommunication technologies, does not capture subjective and semantic aspects of information that are not related to its transmission but rather to the expectations of the observer. Here we propose a subjective definition of information we call surprise to quantify how data affects an observer by measuring the difference between the prior and posterior distributions of the observer. Surprise requires averaging over the space of models in contrast with Shannon entropy which requires averaging over the space of data. Surprise can be estimated efficiently and, during learning processes, decreases with the number of examples as 1/N. Scoring items by surprise provides a general principle for the detection of unusual events and the construction of saliency maps that can guide the deployment of attention and other rapid filtering mechanisms in natural or synthetic information processing systems. Applications to the analysis of video data and human eye movements will be presented.
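As a rough illustration of the idea in Part I, the sketch below scores each observation by the Kullback-Leibler divergence between an observer's prior and posterior over a small, finite set of models. The choice of KL divergence, its direction (posterior relative to prior), the three-coin model space, and all function names are assumptions made for this example, not details taken from the talk.

```python
import math

def posterior(prior, likelihoods):
    """Bayes' rule over a finite set of models."""
    joint = [p * l for p, l in zip(prior, likelihoods)]
    evidence = sum(joint)
    return [j / evidence for j in joint]

def surprise(prior, post):
    """KL(post || prior) in bits; the sum runs over models, not over data."""
    return sum(q * math.log2(q / p) for q, p in zip(post, prior) if q > 0)

# Hypothetical model space: three coins with different heads probabilities.
biases = [0.2, 0.5, 0.8]
prior = [1 / 3, 1 / 3, 1 / 3]

def observe(belief, heads):
    """Update the belief over coins after a single coin flip."""
    likelihoods = [b if heads else 1 - b for b in biases]
    return posterior(belief, likelihoods)

# Score a short run of identical observations, one flip at a time.
belief = prior
for n in range(1, 6):
    updated = observe(belief, heads=True)
    print(n, round(surprise(belief, updated), 4))
    belief = updated
```

An observation that leaves the belief unchanged scores zero, whatever its Shannon information content, which is the sense in which this measure depends on the observer rather than on the data alone.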
Part II: Large repositories of source code available over the Internet, or within large organizations, create new challenges and opportunities for data mining and statistical machine learning. Here we first develop Sourcerer, an infrastructure for the automated crawling, parsing, fingerprinting, and database storage of open source software on an Internet scale. In one experiment, we gather 4,632 Java projects from SourceForge and Apache totaling over 38 million lines of code from 9,250 developers. Simple statistical analyses of the data first reveal robust power-law behavior for package, method call, and lexical containment distributions. We then develop and apply unsupervised, probabilistic, topic and author-topic models to automatically discover the topics embedded in the code and extract topic-word, document-topic, and author-topic distributions. In addition to serving as a convenient summary for program function and developer activities, these and other related distributions provide a statistical and information-theoretic basis for quantifying and analyzing source file similarity, developer similarity and competence, topic scattering, and document tangling, with direct applications to software engineering and software development staffing. Finally, by combining software textual content with structural information captured by our CodeRank approach, we are able to significantly improve software retrieval performance, increasing the AUC (Area Under the Curve) retrieval metric to 0.92, roughly 10-30% better than previous approaches based on text alone.
Short Bio: Pierre Baldi is Chancellor's Professor in the School of Information and Computer Sciences and the Department of Biological Chemistry at the University of California, Irvine and the Director of the UCI Institute for Genomics and Bioinformatics. Born and raised in Europe, he received his PhD from the California Institute of Technology in 1986. From 1986 to 1988 he was a postdoctoral fellow at the University of California, San Diego. From 1988 to 1995 he held faculty and member of the technical staff positions at the California Institute of Technology and at the Jet Propulsion Laboratory. He was CEO of a startup company from 1995 to 1999 and joined UCI in 1999. He is the recipient of a 1993 Lew Allen Award, a 1999 Laurel Wilkening Faculty Innovation Award, a 2006 Microsoft Research Award, and was elected AAAI Fellow in 2007.
Dr. Baldi has published over 200 peer-reviewed research articles and four books:
Modeling the Internet and the Web--Probabilistic Methods and Algorithms, Wiley (2003);
DNA Microarrays and Gene Regulation--From Experiments to Data Analysis and Modeling, Cambridge University Press (2002);
The Shattered Self--The End of Evolution, MIT Press (2001);
Bioinformatics: the Machine Learning Approach, MIT Press, Second Edition (2001).
His research focuses on various areas at the intersection of the computational and life sciences, in particular the application of AI/statistical/machine learning methods to problems in bio- and chemical informatics.
Abstract: Energy assessment of MAC protocols for wireless sensor networks is generally based on the time spent in transmit, receive, and sleep modes. The energy spent switching between two consecutive states is generally considered negligible by comparison. Although this assumption is valid for traditional wireless ad hoc networks, does it also hold for low duty cycle wireless sensor networks? The primary objective of this experimentation is to shed some light on how node switching energy and node duty cycle relate to total energy consumption. To this end, we first revisit the energy spent in each state and transition of three widespread hardware platforms for wireless sensor networks by direct measurements on the EYES node. We then apply the values obtained to the SMAC protocol using the OMNeT++ simulator. The main reason for using SMAC is that it is the protocol normally used as a benchmark for other proposed architectures.
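The question posed above can be made concrete with a back-of-the-envelope model. The sketch below is a deliberate simplification: it lumps transmit and receive into a single active state, charges two transitions per period (wake-up and return to sleep), and uses placeholder power and energy figures that are hypothetical, not the measurements from the EYES node or values from the SMAC simulations.

```python
def cycle_energy(period_s, duty_cycle, p_active_w, p_sleep_w, e_switch_j):
    """Return (state energy, switching energy) in joules for one period.

    Assumes one wake-up and one go-to-sleep transition per period.
    """
    active_s = duty_cycle * period_s
    sleep_s = period_s - active_s
    state_energy = p_active_w * active_s + p_sleep_w * sleep_s
    switching_energy = 2 * e_switch_j
    return state_energy, switching_energy

def switching_share(period_s, duty_cycle, p_active_w, p_sleep_w, e_switch_j):
    """Fraction of the per-period energy that is spent on transitions."""
    state, switching = cycle_energy(
        period_s, duty_cycle, p_active_w, p_sleep_w, e_switch_j
    )
    return switching / (state + switching)

# Hypothetical node parameters, chosen only to illustrate the trend.
PARAMS = dict(period_s=1.0, p_active_w=0.06, p_sleep_w=3e-5, e_switch_j=1e-4)

for duty in (0.5, 0.1, 0.01):
    share = switching_share(duty_cycle=duty, **PARAMS)
    print(f"duty cycle {duty:>5.0%}: switching share {share:.1%}")
```

Under these assumptions the switching cost per period is fixed while the active-state energy shrinks with the duty cycle, so the share attributable to transitions grows as the duty cycle falls.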
Abstract: This paper details a framework for mixed reality agents, i.e. agents that exist in both the real and virtual space. These agents combine the physical presence of a robot with the adaptability and expressivity of a virtual character. The objective is to blur the traditional boundaries between the real and the virtual and provide a standardised methodology for intelligent agent control specifically designed for social interaction. We show how this architecture can be employed in the context of a mobile collaborative mixed reality environment that is cohabited by both robots and humans. As an example application we describe how the framework can be applied to a museum guide that takes advantage of the physical and virtual presence of the mixed reality agent to convey an individual and personalised learning experience. A mobile robot with associated virtual persona is the gateway to this mixed reality experience. The physical robot navigates the museum, while its virtual persona, which is unique and can be personalised for each observer, explains the exhibits and adapts its appearance to match the current context.
Abstract: One commonly employed method to calculate whether a wireless sensor network can adequately sense the entirety of a region of interest is to define the area that each sensor can monitor and ensure that the union of all these areas leaves no part of the region uncovered. This talk shows the results of a series of experiments in simulation designed to show how various degrees of deviation from intended node placement locations in a wireless sensor network affect the achievable coverage using this model. The main result is that irregular deployments are only slightly less efficient than highly regular ones, but because they have less redundancy they fail sooner than regular ones.
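A minimal sketch of the union-of-areas coverage model described above, assuming circular sensing areas, a square region, a square grid as the intended placement, and uniform random displacement as the deviation. These choices, the function names, and the numeric parameters are illustrative assumptions and are not taken from the experiments reported in the talk.

```python
import math
import random

def place_nodes(side, n_per_side, jitter, rng):
    """Square grid of nodes in a side x side region, each node displaced
    by up to +/- jitter along each axis."""
    spacing = side / n_per_side
    nodes = []
    for i in range(n_per_side):
        for j in range(n_per_side):
            x = (i + 0.5) * spacing + rng.uniform(-jitter, jitter)
            y = (j + 0.5) * spacing + rng.uniform(-jitter, jitter)
            nodes.append((x, y))
    return nodes

def coverage(nodes, side, radius, samples_per_side=50):
    """Fraction of evenly spaced sample points lying inside at least one
    sensing disk, i.e. inside the union of the sensing areas."""
    step = side / samples_per_side
    covered = 0
    for a in range(samples_per_side):
        for b in range(samples_per_side):
            px, py = (a + 0.5) * step, (b + 0.5) * step
            if any(math.hypot(px - x, py - y) <= radius for x, y in nodes):
                covered += 1
    return covered / samples_per_side ** 2

rng = random.Random(0)  # fixed seed so runs are repeatable
for jitter in (0.0, 5.0, 10.0):
    nodes = place_nodes(side=100.0, n_per_side=5, jitter=jitter, rng=rng)
    print(f"jitter {jitter:4.1f}: coverage {coverage(nodes, 100.0, 15.0):.3f}")
```

Removing nodes from the list before calling `coverage` gives a simple way to probe how quickly a given deployment loses coverage as nodes fail.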
Abstract: This paper details a framework for explicit deliberative control of socially and physically situated agents in virtual, real and mixed reality environments. The objective is to blur the traditional boundaries between the real and the virtual and provide a standardized methodology for intelligent agent control specifically designed for social interaction. The architecture presented in this paper embraces the fusion between deliberative social reasoning mechanisms and explicit tangible behavioural mechanisms for human-agent social interaction.
Abstract: The presentation explores the challenge of delivering Behavioural Realism to embedded avatars. An agent-based approach is adopted and demonstrated within a Mixed Reality (MR) environment. The realism of an avatar is driven by the state of the intentional agent that underpins its behaviour. The traditional disconnect often found with avatars that exhibit shallow levels of behavioural realism is no longer evident.
Abstract: Taking the initiative is significant in spoken language dialogue systems and in human-computer interaction. A system that takes no initiative may fail to seize important opportunities, but a system that always takes the initiative may not allow the user to take the actions he favours. We have implemented a mixed-initiative planning system that adapts its strategy to a nested belief model. In simulation, the planner's performance was compared to two fixed strategies, always taking the initiative and always declining it, and it performed significantly better than both.
© 2006 PRISM Laboratory