Much of the existing software for supporting annotation was designed ad hoc, using special-purpose data representations. A number of tools try to overcome this problem by employing XML as a representational format for linguistic annotation.
One notable toolkit is NITE XML, software that arose out of the European HLT project NITE (Natural Interactivity Tools Engineering). Although the NITE project itself finished in 2003, the software is still maintained and further developed via SourceForge. It is in use on a number of large distributed projects (for example, the AMI and ICSI Meeting Corpora, the Switchboard Corpus, the OASIS Dialogue Database, the Monitor Corpus and the Diagrams Corpus) as well as by individual researchers.
The NITE XML Toolkit provides open-source libraries to support heavily annotated corpora, whether they are multimodal or textual, monologue or dialogue. Its built-in tools help with common tasks and can be extended using its Java API. The toolkit also integrates a powerful query language and command-line tools for data analysis.
Video lecture by Jean Carletta:
The NITE XML Toolkit meets the ICSI Meeting Corpus: import, annotation, and browsing
More information:
Contact person: Jonathan Kilgour (jonathan AT inf DOT ed DOT ac DOT uk)
NITE XML Toolkit homepage: http://groups.inf.ed.ac.uk/nxt/index.shtml
Download:
The NITE XML Toolkit at SourceForge.net: http://sourceforge.net/projects/nite/files/
