Next: Safety / Verify /
Up: Distribution
Previous: Compression / Compatibility
Contents
Signal / Symbolic Data
As mentioned before, it is good practice to separate signal and symbolic
data in the corpus structure and hence also in the
distribution. With medium-sized or large speech corpora you
may not be able to keep all the data online at all times. However, you
will very likely
need access to the annotation data of the whole corpus. It therefore
makes sense either to store these data on a separate medium or to
copy them onto every volume of the speech corpus.
For example, in the SmartKom corpus every single DVD-R volume contains
the complete set of annotation files for the whole corpus. In the
WebCommand corpus these data are stored on a separate CD-ROM.
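The "copy onto every volume" strategy can be sketched as a small Python helper. This is only an illustration under stated assumptions: the function name copy_annotations_to_volumes, the staging directories for the volumes and the target folder name "annotation" are hypothetical and do not reflect the actual layout of the SmartKom or WebCommand corpora.

```python
import shutil
from pathlib import Path


def copy_annotations_to_volumes(annotation_dir, volume_dirs, target_name="annotation"):
    """Copy the complete annotation tree into every volume staging directory.

    annotation_dir: directory holding the symbolic data of the whole corpus.
    volume_dirs:    staging directories, one per volume (hypothetical layout).
    target_name:    folder name used on each volume (an assumption).
    Returns the list of destination paths that were written.
    """
    annotation_dir = Path(annotation_dir)
    written = []
    for volume in volume_dirs:
        destination = Path(volume) / target_name
        # dirs_exist_ok=True lets a re-run refresh an existing copy
        # (requires Python 3.8 or later).
        shutil.copytree(annotation_dir, destination, dirs_exist_ok=True)
        written.append(destination)
    return written
```

Calling the helper once per release, before the volumes are mastered, would leave each volume with the same full annotation set, so a user holding only one volume can still search the annotation of the whole corpus.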
Another advantage of keeping the symbolic information separate is
that these data are much more likely to be updated than the signal
files. Since the symbolic data usually occupy less than 1% of the
total corpus size, such updates can easily be distributed to users as a
download from an FTP server.
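Whether the symbolic data of a given corpus really stay below the 1% mark can be checked with a few lines of Python. The following is a rough sketch, not a prescribed tool: the function name symbolic_share and the set of signal file extensions are assumptions, and every file that does not carry one of these extensions is simply counted as symbolic data.

```python
from pathlib import Path

# Assumed signal-file extensions; adjust to the corpus at hand.
SIGNAL_SUFFIXES = {".wav", ".raw"}


def symbolic_share(corpus_root, signal_suffixes=SIGNAL_SUFFIXES):
    """Return the fraction of the corpus size occupied by non-signal files."""
    signal_bytes = 0
    symbolic_bytes = 0
    for path in Path(corpus_root).rglob("*"):
        if not path.is_file():
            continue
        size = path.stat().st_size
        if path.suffix.lower() in signal_suffixes:
            signal_bytes += size
        else:
            symbolic_bytes += size
    total = signal_bytes + symbolic_bytes
    # An empty corpus directory yields 0.0 instead of a division error.
    return symbolic_bytes / total if total else 0.0
```

A result of 0.01 corresponds to 1% of the total corpus size.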
BITS Projekt-Account
2004-06-01