Version 1.2

Introduction

The SIGNUM Database was created within the framework of a research project at the Institute of Man–Machine Interaction at RWTH Aachen University in Germany. The SIGNUM (Signer-Independent Continuous Sign Language Recognition for Large Vocabulary Using Subunit Models) project was funded by the Deutsche Forschungsgemeinschaft (German Research Foundation) and aimed to develop a video-based automatic sign language recognition system.

To ensure user-friendliness, the system uses a single color video camera for data acquisition. Since sign languages make use of both manual and facial means of expression, both channels are analyzed by means of image processing. The whole system, particularly the feature extraction and the subsequent classification stage, is designed for signer-independent operation and allows adaptation to an unknown signer. Readers interested in a more detailed description of this recognition system, or in an in-depth introduction to gesture and sign language recognition, are directed to the publication list.

At the beginning of the project, none of the sign language corpora found in the literature met the requirements for signer-independent continuous sign language recognition. In contrast to speech recognition, there was no standardized benchmark. For this reason we decided to create a new database, to be made available to other interested researchers after the end of the project. We hope that the release of this database will boost research efforts in the field of sign language recognition, and that it may become established as the first benchmark for signer-independent continuous sign language recognition.

Edition of the Bavarian Archive for Speech Signals (BAS)

For the BAS edition the corpus has been validated against BAS guidelines. The results are given in /doc/html/Revalidation_SIGNUM.html.

To ease automatic processing of the SIGNUM contents, the following ASCII tables have been added to the corpus in the subdirectory /doc/ascii (files are encoded in UTF-8 with CRLF as line terminators; all columns are TAB-separated):
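
As a minimal sketch of how a table in this format might be loaded, the following Python function reads a UTF-8, TAB-separated file with CRLF line terminators into a list of rows. The function name read_table is hypothetical, and nothing is assumed here about the actual table names or column layouts.

```python
import csv
from pathlib import Path


def read_table(path):
    """Read a TAB-separated UTF-8 table and return its rows as lists of strings."""
    # newline="" lets the csv module handle the CRLF line terminators itself.
    with Path(path).open(encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps any quote characters in the cells as literal text.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        # Skip completely empty lines, e.g. a trailing blank line.
        return [row for row in reader if row]
```

Each returned row is a list with one string per column, so a caller can index columns by position once the layout of a given table is known.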

To allow the SIGNUM corpus to be distributed online within CLARIN, the directories containing the Motion JPEG videos (data/sig*/per*/(dir-name)) have been replaced by ZIP files ((dir-name).zip) in the same location. These ZIP files contain all frame images, stored without compression to speed up unpacking. In addition, simple TXT files (UTF-8) containing the annotations for each recording have been added; each has the same file body name as the corresponding ZIP file.

To play a video, extract the *.jpg files from a ZIP file and either convert the Motion JPEG frames into a compressed format (such as MP4) or use a player that can play Motion JPEG directly, e.g. mplayer:

mplayer "mf://*.jpg" -mf fps=30
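
The extraction step can also be scripted. The following Python sketch unpacks one recording and reads the annotation file that shares its file body name. The function name unpack_recording is hypothetical. The sketch also assumes that the annotation file uses a lowercase .txt extension and that the frame file names sort in playback order.

```python
import zipfile
from pathlib import Path


def unpack_recording(zip_path, out_dir):
    """Extract the frames of one recording and read its annotation, if present."""
    zip_path = Path(zip_path)
    out_dir = Path(out_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
    # Collect the extracted frames in name order (assumed to be frame order).
    frames = sorted(p for p in out_dir.rglob("*") if p.suffix.lower() == ".jpg")
    # Assumption: the annotation sits next to the ZIP as <body name>.txt.
    annotation_path = zip_path.with_suffix(".txt")
    annotation = None
    if annotation_path.exists():
        annotation = annotation_path.read_text(encoding="utf-8").strip()
    return frames, annotation
```

After unpacking, the mplayer command above can be run from the directory that holds the extracted *.jpg files.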