

$\bigcirc$ File Names

Do the files found have the correct file names? Are there mismatches between signal files and annotation files?



For the following example script, assume that the signal files are WAV files stored in groups of 182 in subdirectories of the main directory data. Each subdirectory holds the data of one recording session (001-345); the session number is coded into the directory name (SESnum) as well as into the names of the signal files (SESnum_item.wav). The corresponding annotation files of type PAR and AGS are stored in the same structure under the main directory annot. In addition, there must be a recording protocol (SESnum.rpr) for each session in the directory meta/rpr.

#
# Check for completeness and superfluous files
# 

set sessioncnt = 345
set signalcnt = 182
set datamain = /cdrom/data
set annotmain = /cdrom/annot
set metarpr = /cdrom/meta/rpr

...

# collect data
cd $datamain
set sessions = 0
set totaldirs = `ls -a | wc -l`
@ totaldirs -= 2
foreach ses ( SES[0-9][0-9][0-9] )
  if ( ! -d $annotmain/$ses ) then 
    echo "ERROR: missing annotation dir $annotmain/$ses"
    set checkannot = 0
    set totalfilesannot = 0
  else
    set checkannot = 1
    set totalfilesannot = `ls -a $annotmain/$ses | wc -l`
    @ totalfilesannot -= 2
  endif
  if ( ! -e $metarpr/$ses.rpr ) then 
    echo "ERROR: missing meta data file $metarpr/$ses.rpr"
  endif
  set files = 0
  set totalfiles = `ls -a $ses | wc -l`
  @ totalfiles -= 2
  foreach file ( $ses/${ses}_[0-9][0-9][0-9].wav )
    set basename = ${file:t}
    set basename = ${basename:r}
    if ( $checkannot == 1 ) then 
      if ( ! -e $annotmain/$ses/$basename.par ) then 
        echo "ERROR: missing annotation file PAR for $file"
      else
        @ totalfilesannot --
      endif
      if ( ! -e $annotmain/$ses/$basename.ags ) then 
        echo "ERROR: missing annotation file AGS for $file"
      else
        @ totalfilesannot --
      endif
      #
      # Add here: Other checks on the annotation files
      #
    endif
    #
    # Add here: Other checks on the signal file
    #
    @ files ++
  end
  if ( $files != $signalcnt ) then 
    echo "ERROR: number of signal files in session \
        $ses ($files) not equal $signalcnt"
  else if ( $totalfiles > $files ) then 
    echo "ERROR: superfluous or wrongly named files \
      in $datamain/$ses"
  endif
  if ( $totalfilesannot > 0 ) then
    echo "ERROR: superfluous or wrongly named files \
      in $annotmain/$ses" 
  endif
  @ sessions ++
end
if ( $sessions != $sessioncnt ) then 
  echo "ERROR: number of recording sessions ($sessions) \
      not equal $sessioncnt"
else if ( $totaldirs > $sessions ) then 
  echo "ERROR: superfluous or wrongly named \
    directory in $datamain"
endif
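
The script above only counts files, so it reports that a session contains superfluous or wrongly named files without saying which ones. The following is a minimal sketch of how the offending names could be listed. It is written in POSIX sh rather than the csh used above, it assumes the directory layout described at the beginning of this section, and the function names list_misnamed and list_orphans are hypothetical.

```shell
#!/bin/sh
# Sketch only: name the offending files instead of counting them.
# Assumed layout: data/SESnnn/SESnnn_iii.wav and
#                 annot/SESnnn/SESnnn_iii.par, annot/SESnnn/SESnnn_iii.ags
# The function names below are illustrative, not part of the original script.

# list_misnamed DIR EXT...
# Print every entry of DIR whose name is not SESnnn_iii.EXT, where SESnnn
# is the name of DIR itself and EXT is one of the given extensions.
list_misnamed() {
    dir=$1; shift
    ses=$(basename "$dir")
    for path in "$dir"/* "$dir"/.[!.]*; do
        [ -e "$path" ] || continue          # glob pattern matched nothing
        name=$(basename "$path")
        ok=0
        for ext in "$@"; do
            case $name in
                "$ses"_[0-9][0-9][0-9]."$ext") ok=1 ;;
            esac
        done
        [ "$ok" -eq 1 ] || echo "$path"
    done
}

# list_orphans SIGNALDIR ANNOTDIR
# Print every PAR or AGS file in ANNOTDIR for which SIGNALDIR contains
# no WAV file with the same base name.
list_orphans() {
    sigdir=$1; anndir=$2
    for path in "$anndir"/*.par "$anndir"/*.ags; do
        [ -e "$path" ] || continue          # glob pattern matched nothing
        base=$(basename "$path")
        base=${base%.*}                     # strip the extension
        [ -e "$sigdir/$base.wav" ] || echo "$path"
    done
}

# Possible calls for one session (paths as in the script above):
#   list_misnamed /cdrom/data/SES001 wav
#   list_misnamed /cdrom/annot/SES001 par ags
#   list_orphans  /cdrom/data/SES001 /cdrom/annot/SES001
```

Both functions print one path per line and print nothing when a session is clean, so their output can be appended to the error report produced by the counting script.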



Angela Baumann 2004-06-03