Coding schemes
Language resources
- Bavarian archive for speech signal corpora - Archive of spoken corpora, mainly in German.
- CSLU Corpora - Approximately 20 corpora, including the OGI Stories corpus and Kids' Speech.
- Danish dialogue project corpus - Transcriptions of human-machine dialogues (in Danish) on air ticket reservation.
- English Language Corpora and Corpus resources - This page lists centres and projects from which language corpora (chiefly English language) are readily available.
- European Language Resources Association (ELRA) - speech, written, and terminological resources
- Linguistic Data Consortium - speech and text databases, lexicons, and other resources
- MICASE: Michigan Corpus of Academic Spoken English - Transcripts of academic speech events
- The AMI Meeting Corpus - 100 hours of multi-modal meeting data, annotated for a wide range of behaviours from simple gaze and gesture to named entities, dialogue acts, and summaries.
- The Christine corpus of spoken English - Cross-section of 1990s spontaneous speech from all British regions, social classes, etc. 40% of the eventual complete CHRISTINE Corpus is now ready and available for use.
- The COCONUT corpus - A collection of human-human computer-mediated dialogues. Includes both the raw corpus and the annotated corpus.
- The University of Helsinki Language Corpus Server - Computer corpora of more than 50 languages, including samples of minority languages and extensive corpora representing different text types. For use in research work and teaching.
- UCREL corpora - UCREL has several machine-readable corpora. Some corpora are held only as plain orthographic text, whilst others are held with several kinds of annotation.
Methods
- DISC - The DISC Best Practice Guide extends and specialises software engineering best practice to the particular purposes of dialogue engineering, that is, to the development and evaluation of spoken language dialogue systems.
- EAGLES - guidelines covering aspects of text corpora, computational lexicons, evaluation of natural language processing systems, computational linguistic formalisms, and spoken language systems.
- TEI - Text Encoding Initiative. The project has developed guidelines for the encoding and transcription of linguistic corpora.
Projects and initiatives
- ATLAS - Architecture and Tools for Linguistic Analysis Systems. The project aims to create flexible and extensible formats and tools for linguistic annotation.
- ISLE - International Standards for Language Engineering. The project aims to develop, disseminate and promote de facto HLT standards and guidelines for language resources, tools and products.
- Linguistic Annotation
- Responsive Virtual Human Technology - NSF-funded project exploring spoken dialog with synthetic characters for training applications. Research Triangle Institute, Duke University, University of Tennessee, University of Colorado
- TalkBank - The project aims to provide standards and tools for creating, searching, and publishing primary materials via networked computers.
References
Regular events
Spoken Language Dialogue Systems
Systems and interfaces
- CommandTalk Leathernet - SRI's spoken language interface to the Marine Corps Leathernet System
- Eucalyptus - A natural language (NL) interface to the KOALAS E2 air combat computer simulation developed at the Naval Air Systems Command.
- NL-Soar - "NL-Soar is a system built using the Soar general cognitive architecture and aims to provide incremental real-time NL capabilities. These NL Dialogue, comprehension and generation capabilities can be integrated into task agents to provide them with the abi
Tools
- Apple Pie Parser - A bottom-up probabilistic chart parser that finds the highest-scoring parse tree using a best-first search algorithm.
- CODIAL - a tool in support of cooperative dialogue design
- Corpus analysis tools - BNC listing of corpus analysis tools.
- CSLU Toolkit - A suite of tools to enable exploration, learning, and research into speech and human-computer interaction.
- DialogueView - A tool for annotating speech repairs, utterance boundaries, utterance tags, and hierarchical discourse structure
- DRI Shared Tools and Resources
- Natural language generation technology - SIGGEN's list of natural language generation systems and software available on the web.
- The NITE XML Toolkit - software for annotating textual, spoken, or multimodal corpora, including both configurable end user tools and libraries for use in custom applications.
- Transcriber - a tool for assisting the creation of speech corpora. It supports manually segmenting, labeling, and transcribing speech signals for later use in automatic speech processing.
- TrindiKit - a toolkit for building and experimenting with dialogue move engines and information states