Looking at Meetings

There are now three major corpora of meetings.

The earliest is the ICSI meeting corpus, which comprises 40 hours of transcribed speech recorded in meetings. It is available from the Linguistic Data Consortium (LDC).

The AMI meeting corpus is a multimodal meeting corpus. Participant interactions were recorded using microphones, video cameras and electronic pens. There are about 100 hours of interactions in this corpus, which was funded by the European Union and is available from ELRA. Annotation tools are also available.

Meeting data has also been collected at Stanford and at CMU as part of the CALO project. This data is not yet publicly available.