Malcolm's IBM Almaden Home Page

I am now working at IBM's Almaden Research Center, and I'm still getting into the swing of things. See below for some of my IBM work.

Much of my professional and personal life is currently documented elsewhere.


Sensemaking

I'm now working on sensemaking: the science of understanding how people understand the world around them.
  • Our first study in this area measured how hard it was for people to make sense of a large collection of documents using paper and two different computer-based visualizations.
  • Our second study looked at the specific costs involved in understanding collections.
  • Our third study built a tool for understanding streaming collections of data.
Understanding Large Document Collections (HICSS 2005)

Cost Analysis of Sensemaking (IA2005)

Streaming Analytical Worksheets (Interact 2005)

Being literate with large document collections (HICSS 2006)



Timbre

I'm working with a talented Stanford student, Hiroko Terasawa, to understand and characterize how people perceive timbre. We hope to build a model of timbre that explains human timbre perception (and, by extension, speech) as well as the three-color model explains color vision. (A sketch of one such timbre representation appears after these links.)

First experiment (ICAD 2005)

Applied to speech (Interspeech 2005)

Best audio summary (Workshop on Applications of Signal Processing to Audio and Acoustics - Mohonk 2005)
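
To make the idea concrete, here is a minimal sketch of one common low-dimensional timbre representation, the mel-cepstrum, which is the kind of spectral-envelope description this work builds on. All the parameters here (16 kHz sampling, 512-point FFT, 40 mel filters, 13 coefficients) are illustrative choices of mine, not necessarily the values used in the experiments.

    # A cepstral "timbre vector" for one frame of audio; two sounds can
    # then be compared by their distance in this low-dimensional space.
    import numpy as np
    from scipy.fftpack import dct

    def mel_filterbank(n_filters=40, n_fft=512, sr=16000):
        """Triangular filters spaced evenly on the mel scale."""
        mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
        imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
        edges = imel(np.linspace(mel(0.0), mel(sr / 2.0), n_filters + 2))
        bins = np.floor((n_fft + 1) * edges / sr).astype(int)
        fb = np.zeros((n_filters, n_fft // 2 + 1))
        for i in range(n_filters):
            left, center, right = bins[i], bins[i + 1], bins[i + 2]
            if center > left:   # rising edge of the triangle
                fb[i, left:center] = (np.arange(left, center) - left) / (center - left)
            if right > center:  # falling edge
                fb[i, center:right] = (right - np.arange(center, right)) / (right - center)
        return fb

    def timbre_vector(frame, n_coeffs=13, n_fft=512, sr=16000):
        """Map one frame of samples to a low-dimensional timbre description."""
        spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n_fft))
        energies = mel_filterbank(n_fft=n_fft, sr=sr) @ (spectrum ** 2)
        return dct(np.log(energies + 1e-10), norm='ortho')[:n_coeffs]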



Speech-Music Discrimination

I work with a bunch of very smart people at the Telluride Neuromorphic Engineering Workshop. In 2003, two of the students in the audio group did some very nice work on audio classification using novel approaches. Nima Mesgarani used cortical spectro-temporal response fields and a tensor SVD to get the best performance to date on a speech-music discrimination task. Sourabh Ravindran took a more conventional approach, but optimized it to run at low power levels in a sensor network. (A toy sketch of the tensor-SVD idea appears after these links.)
Tensor SVD Approach

Low-Power Sensor Approach
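
For the curious, the core of the tensor-SVD idea can be sketched in a few lines: take a multi-way array of features (say, frequency x rate x examples), unfold it along each mode, and compute an ordinary SVD of each unfolding. This is the higher-order SVD (HOSVD); it is only a toy illustration, not Nima's classifier or his cortical feature extraction.

    import numpy as np

    def hosvd(tensor):
        """Higher-order SVD: one ordinary SVD per mode unfolding."""
        Us = []
        for mode in range(tensor.ndim):
            # Bring this mode to the front and flatten the rest.
            unfolded = np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)
            U, _, _ = np.linalg.svd(unfolded, full_matrices=False)
            Us.append(U)
        # Project the tensor onto the mode bases to get the core tensor.
        core = tensor
        for mode, U in enumerate(Us):
            core = np.moveaxis(
                np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
        return Us, core

    # Example: factor a random 20 x 16 x 100 "feature" tensor.
    Us, core = hosvd(np.random.randn(20, 16, 100))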


User Modeling

I've recently started working on user modeling.  How do users make sense of their world? This paper, presented at the User Modeling 2003 meeting, describes how to identify the different tasks that users perform as they go about their work.  It applies the expectation-maximization (EM) algorithm to segment and cluster time-oriented text data. (A sketch of the clustering step appears after the link below.)
Modeling Multitasking Users Paper
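
Here is a minimal sketch of the clustering half of such an algorithm: EM for a mixture of multinomials over document word counts. The smoothing constants and the omission of the time-segmentation step are my simplifications, not the paper's.

    import numpy as np

    def em_multinomial(X, k, n_iter=50, seed=0):
        """X: (n_docs, vocab) word-count matrix; k: number of task clusters."""
        rng = np.random.default_rng(seed)
        n, v = X.shape
        log_theta = np.log(rng.dirichlet(np.ones(v), size=k))  # per-cluster word probs
        log_pi = np.log(np.full(k, 1.0 / k))                   # cluster priors
        for _ in range(n_iter):
            # E step: posterior responsibility of each cluster for each document.
            log_post = log_pi + X @ log_theta.T                # shape (n, k)
            log_post -= log_post.max(axis=1, keepdims=True)    # numerical safety
            post = np.exp(log_post)
            post /= post.sum(axis=1, keepdims=True)
            # M step: re-estimate priors and (smoothed) word distributions.
            log_pi = np.log(post.mean(axis=0) + 1e-12)
            counts = post.T @ X + 1e-3
            log_theta = np.log(counts / counts.sum(axis=1, keepdims=True))
        return post  # soft cluster assignment for each document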


Video Mining

I wrote a book chapter describing our work on text tools for video mining. The chapter summarizes the work on semantic-audio retrieval and multimedia segmentation, both described below.
Understanding the Semantics of Media book chapter


Semantic-Audio Retrieval

In the summer of 2001, there was a lot of interest in the TREC video retrieval task. I did this work to connect non-speech audio with textual descriptions of that audio.


My first attempt, using a winner-take-all approach, was published at ICASSP. A better approach, based on something I called a "mixture of probability experts" (MPESAR), was published at ICME. (A toy sketch of the idea appears after these links.)

ICASSP SAR Paper

ICME (MPESAR) paper
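
A toy sketch of the general idea, closest in spirit to the winner-take-all formulation: quantize the acoustic space, learn a word distribution for each acoustic region, and score a text query against a sound through its region. The clustering method, smoothing constants, and function names here are illustrative only; MPESAR replaces the hard winner-take-all choice with a mixture over regions.

    import numpy as np
    from collections import Counter

    def fit(acoustic_vecs, captions, k=8, n_iter=20, seed=0):
        """acoustic_vecs: (n_sounds, dim) float array; captions: list of strings."""
        rng = np.random.default_rng(seed)
        centers = acoustic_vecs[rng.choice(len(acoustic_vecs), k, replace=False)]
        for _ in range(n_iter):  # plain k-means over the acoustic space
            labels = np.argmin(((acoustic_vecs[:, None] - centers) ** 2).sum(-1), axis=1)
            for j in range(k):
                if np.any(labels == j):
                    centers[j] = acoustic_vecs[labels == j].mean(axis=0)
        # Collect the caption words that land in each acoustic region.
        word_counts = [Counter() for _ in range(k)]
        for label, caption in zip(labels, captions):
            word_counts[label].update(caption.lower().split())
        return centers, word_counts

    def score(query, vec, centers, word_counts):
        """P(query words | sound), through the sound's nearest acoustic region."""
        j = int(np.argmin(((centers - vec) ** 2).sum(-1)))
        total = sum(word_counts[j].values()) + 1.0
        return float(np.prod([(word_counts[j][w] + 0.1) / total
                              for w in query.lower().split()]))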


Segmentation Work

When I started at Almaden, a friend said that she didn't know the best way to create a video table of contents. I said I did, which was mostly true, and this is the result.


Our first attempts were aimed at pure text segmentation. A shorter ICASSP paper and a longer paper for the SIAM Text Mining Workshop were accepted for publication.

SIAM Text Mining Workshop Paper

ICASSP paper

We later extended these ideas to audio and video signals. We presented the first of these ideas, and some information on the temporal correlations present in these different dimensions, at the ICCV Event 2001 workshop. The most complete version of our work was presented at the ACM Multimedia conference. (A generic sketch of similarity-based segmentation appears after these links.)

ICCV Event 2001 Paper

ACM Multimedia Paper (and example movies)
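
As a generic illustration of similarity-based segmentation (not the exact algorithms in these papers): build a frame-to-frame cosine-similarity matrix over feature vectors, slide a checkerboard kernel down its diagonal, and treat peaks in the resulting novelty score as candidate segment boundaries. The kernel width and threshold below are arbitrary choices of mine.

    import numpy as np

    def novelty_boundaries(features, width=8, threshold=0.5):
        """features: (n_frames, dim) feature vectors; returns boundary indices."""
        F = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-10)
        S = F @ F.T                                 # cosine-similarity matrix
        c = np.arange(width) - (width - 1) / 2.0    # coordinates around the center
        # Checkerboard kernel: + within-segment quadrants, - cross-segment ones.
        kernel = np.sign(np.multiply.outer(c, c)) * np.outer(np.hanning(width),
                                                             np.hanning(width))
        half, n = width // 2, len(F)
        novelty = np.zeros(n)
        for i in range(half, n - half):
            patch = S[i - half:i + half, i - half:i + half]
            novelty[i] = (patch * kernel).sum()
        peaks = [i for i in range(1, n - 1)
                 if novelty[i] > threshold
                 and novelty[i] >= novelty[i - 1]
                 and novelty[i] >= novelty[i + 1]]
        return peaks, novelty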


Prior Work

Here are several papers I published while at IBM, based on work I started before I arrived.
The BabyEars project is a research effort to understand how people communicate emotional messages with speech.  Computers, to date, are good at recognizing words but they ignore the emotional content in a speech signal.  Babies, on the other hand, learn the emotional content in the speech they hear before they understand the words.  We wanted to bridge these two worlds by building statistical classifiers that understand emotional messages in speech.  We want machines to recognize the emotional messages in a speech signal as well as dogs do.

This project studied the emotional content in speech signals using speech spoken by parents to their infants. We looked at approval, attention, and prohibition messages, and compared the properties of this infant-directed speech to "normal" speech between adults.  Our work ignored the words and concentrated on the prosodic features of the speech: the acoustic pitch, timing, and loudness cues that we vary as we speak. (A sketch of such prosodic features follows the findings below.)

The key findings in this study are:
  • A small number of pitch, loudness, and frequency-domain measures allow us to accurately classify the emotional aspects of a speech signal.
  • Men and women use different acoustic cues to communicate emotional messages.  Our classifiers were able to understand the emotional message in mothers' speech more accurately than in fathers' speech.
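
As a sketch of what prosody-only classification looks like: summarize each utterance's pitch and energy tracks with a handful of global statistics and hand them to a standard classifier. The particular statistics below are illustrative; the actual feature set and classifiers are described in the Speech Communication article.

    import numpy as np

    def prosodic_features(pitch_hz, energy):
        """pitch_hz: per-frame F0 (0 where unvoiced); energy: per-frame RMS.
        Assumes the utterance has at least a few voiced frames."""
        voiced = pitch_hz[pitch_hz > 0]
        return np.array([
            voiced.mean(),                   # average pitch
            voiced.std(),                    # pitch variability
            voiced.max() - voiced.min(),     # pitch range
            np.abs(np.diff(voiced)).mean(),  # local pitch movement
            energy.mean(),                   # overall loudness
            energy.std(),                    # loudness variability
            (pitch_hz > 0).mean(),           # fraction of frames voiced
        ])

    # With one row of such features per utterance, any standard classifier
    # (e.g., scikit-learn's LinearDiscriminantAnalysis) can be trained to
    # separate approval, attention, and prohibition utterances.
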
For more information about this project, see the following links.
Living on Earth Radio Show

New Scientist Article


Speech Communication Journal
Send email to malcolm@ieee.org requesting an electronic reprint of the BabyEars journal article.

This is joint work with Gerald McRoberts of Haskins Laboratories.


Michele Covell and I did some neat work, called FastMPEG, on an algorithm to time-compress audio files that have been bit-compressed (à la MPEG) without first decompressing them. As far as we know, nobody else has done this much processing on compressed MPEG audio signals.

ICASSP Paper and audio examples
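
For readers unfamiliar with time compression itself, here is what the operation does, sketched on uncompressed samples: overlap-add windows read from the input at a faster hop than they are written to the output. FastMPEG performs the equivalent directly on MPEG subband data without decoding; this PCM-domain sketch, with my own parameter choices and without the cross-correlation alignment a production system would add, only shows the basic operation.

    import numpy as np

    def time_compress(x, rate=1.5, win=1024):
        """Overlap-add time compression: play x back 'rate' times faster
        while (roughly) preserving pitch.  x: 1-D array of samples."""
        hop_out = win // 2                 # write hop: half-window Hann OLA
        hop_in = int(hop_out * rate)       # read hop: faster than we write
        window = np.hanning(win)
        n_frames = (len(x) - win) // hop_in
        y = np.zeros(n_frames * hop_out + win)
        for i in range(n_frames):
            y[i * hop_out : i * hop_out + win] += window * x[i * hop_in : i * hop_in + win]
        return y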



I created a system that measures the synchronization between a speech signal and a talking face. This work, FaceSync, was published at NIPS 2000.

NIPS Paper
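
The generic mathematical tool behind this kind of synchrony measure is canonical correlation analysis: project the audio features and the face features onto maximally correlated directions, then read off the correlation. The sketch below is my generic CCA formulation, not necessarily FaceSync's exact operator; see the NIPS paper for that.

    import numpy as np

    def cca_correlation(A, V, eps=1e-6):
        """A: (frames, audio_dim) audio features; V: (frames, video_dim)
        face features.  Returns the largest canonical correlation."""
        A = A - A.mean(axis=0)
        V = V - V.mean(axis=0)
        Caa = A.T @ A + eps * np.eye(A.shape[1])
        Cvv = V.T @ V + eps * np.eye(V.shape[1])
        Cav = A.T @ V
        # Whiten each side with a Cholesky factor; the singular values of
        # the whitened cross-covariance are the canonical correlations.
        ia = np.linalg.inv(np.linalg.cholesky(Caa))
        iv = np.linalg.inv(np.linalg.cholesky(Cvv))
        return np.linalg.svd(ia @ Cav @ iv.T, compute_uv=False)[0]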



IEEE published a book I wrote with my thesis advisor, Principles of Computerized Tomographic Imaging. We sold several thousand copies through IEEE, and it was in print for 14 years. IEEE eventually decided not to reprint it. Even better, SIAM decided to include it in their book series Classics in Applied Mathematics, so it is now back in print.

Online copy of the book

Order your copy now!



Steve Greenberg (ICSI at Berkeley) and I organized a NATO Advanced Study Institute on Computational Models of Hearing. We are currently putting the finishing touches on a book that summarizes the work of our "students."

Order your copy now


Last update: June 22, 2005. Send email to me at malcolm@ieee.org

My address is

Malcolm Slaney
IBM Almaden Research Center
650 Harry Road
San Jose, CA 95120