


In the past twenty years, computers and networks have gained a prominent role in supporting human communications. This book presents recent research in multimodal information processing, which demonstrates that computers can achieve more than telephone calls or videoconferencing can. The book offers a snapshot of current capabilities for the analysis of human communications in several modalities – audio, speech, language, images, video, and documents – and for accessing this information interactively. The book has a clear application goal: the capture, automatic analysis, storage, and retrieval of multimodal signals from human interaction in meetings. This goal provides a controlled experimental framework and helps generate shared data, which is required for methods based on machine learning. It has shaped the vision of the contributors to the book and of many other researchers cited in it. It has also received significant long-term support through a series of projects, including the Swiss National Center of Competence in Research (NCCR) in Interactive Multimodal Information Management (IM2), to which the contributors to the book have been connected.


  • INTERACTIVE MULTIMODAL INFORMATION MANAGEMENT: SHAPING THE VISION – Meeting capture, analysis and access – The IM2 Swiss National Center of Competence in Research – Related international projects and consortia
  • Human-Computer Interaction and Human Factors
  • HUMAN FACTORS IN MULTIMODAL INFORMATION MANAGEMENT – Role of human factors – Prominent research topics in human factors – Methodological approach in human factors – Empirical studies – Discussion and implications
  • USER ATTENTION DURING MOBILE VIDEO CONSUMPTION – Modeling user behavior – Data acquisition experiment – Data processing and results – Conclusions
  • WIZARD OF OZ EVALUATIONS OF THE ARCHIVUS MEETING BROWSER – The Archivus meeting browser – Multimodal Wizard of Oz evaluation – Conclusions
  • DOCUMENT-CENTRIC AND MULTIMODAL MEETING ASSISTANTS – The Smart Meeting Minutes application – Document centric meeting browsing – Cross-meeting and ego-centric browsing – Multimodal user interfaces prototyping for online meeting assistants – The Communication Board application – Conclusion
  • SEMANTIC MEETING BROWSERS AND ASSISTANTS – The JFerret framework and browser – TQB: a transcript-based query and browsing interface – Evaluation of meeting browsers – Automatic meeting browsers and assistants – Conclusions and perspectives
  • MULTIMEDIA INFORMATION RETRIEVAL – Introduction – Multimedia information retrieval: from information to user satisfaction – Interaction log mining: from user satisfaction to improved information retrieval – Multimedia information retrieval in a wider context
  • Visual and Multimodal Analysis of Human Appearance and Behavior
  • FACE RECOGNITION FOR BIOMETRICS – Introduction – Face processing in a nutshell – From face detection to face recognition – Statistical generative models for face recognition – Cross-pollination to other problems – Open data and software – Conclusion and future work
  • FACIAL EXPRESSION ANALYSIS – Introduction and state-of-the-art – Recognizing action units – Modeling human perception of static facial expressions – Conclusion
  • SOFTWARE FOR AUTOMATIC GAZE AND FACE/OBJECT TRACKING – Gaze tracking – Face tracking in real environments – Application to autism spectrum disorder – Conclusion
  • LEARNING TO LEARN NEW MODELS OF HUMAN ACTIVITIES IN INDOOR SETTINGS – Introduction – Related work – Proposed approach – Activity tracking for unusual event detection – Knowledge transfer for unusual event learning – Experiments – Conclusion
  • NONVERBAL BEHAVIOR ANALYSIS – Introduction: a brief history of nonverbal behavior research in IM2 – VFOA recognition for communication analysis in meeting rooms and beyond – Social signal processing – Behavioral analysis of video blogging – Final remarks
  • MULTIMODAL BIOMETRIC PERSON RECOGNITION – Introduction – Biometric classification with quality measures – Modeling reliability with Bayesian networks – A-stack: biometric recognition in the score-age-quality classification space – Conclusions
  • MEDICAL IMAGE ANNOTATION – Introduction – Multiple cues for image annotation – Exploiting the hierarchical structure of data: confidence-based opinion fusion – Facing the class imbalance problem: virtual examples – Experiments – Conclusions
  • Speech, Language, and Document Processing
  • SPEECH PROCESSING – Methods for automatic speech recognition – Front-end processing of speech – Posterior-based automatic speech recognition – The Juicer decoder – Conclusions
  • RESEARCH TRENDS IN SPEAKER DIARIZATION – Goals and applications of speaker diarization – A state-of-the-art speaker diarization system – Research problems in speaker diarization – Conclusions and perspectives
  • SPEAKER DIARIZATION OF LARGE CORPORA – Two-stage cross-meeting diarization – Speaker linking – Experimental results – Conclusions
  • LANGUAGE PROCESSING IN DIALOGUES – Objectives of language analysis in meetings – Dialogue acts – Discourse particles – Thematic episodes and hot spots – Semantic cross-modal alignment – Conclusion and perspectives
  • ONLINE HANDWRITING RECOGNITION – Introduction – Online word recognition – From word to text recognition – From text to documents – Conclusions
  • ONLINE HANDWRITING ANALYSIS AND RECOGNITION – Introduction – Database acquisition – Online mode detection – Online handwriting recognition – Writer identification – Conclusion
  • ANALYSIS OF PRINTED DOCUMENTS – Extracting and reorganizing digital content from printable documents – Tagging the information extracted from digital documents – Video recording alignment and the temporal dimension of printable documents – Aligning digital documents with audio recordings – From printable documents to cross-media alignment and indexing – Conclusion
  • IT WAS WORTH IT! ASSESSMENT OF THE IMPACT OF IM2 – Motivation and procedure – The assessment questions – Synthesis of the interviews – Conclusion
  • TECHNOLOGY TRANSFER: TURNING SCIENCE INTO PRODUCTS – Visual recognition on mobile devices: kooaba AG – Joining capture and webcast: Klewel SA – Business experience meets technology: KeyLemon SA – XED and Dolores for ebooks: sugarcube Information Technology Sarl – From speech to text: Koemei SA – Look me in the eye: Pomelo SARL – The Association for Interactive Multimodal Information Management – The International Create Challenge
  • CONCLUSION AND PERSPECTIVES – Looking back on the initial motivations of IM2 – Scientific achievements of IM2 – Structural achievements – Technology transfer achievements – Perspectives


Publisher: EPFL Press English Imprint

Collection: Communication Sciences

Published: 3 October 2013

Edition: 1st edition

Media: Book

Page count Book: 372

Format (in mm) Book: 160 x 240

Weight (in grammes): 830

Language(s): English

EAN13 Book: 9782940222711
