You may use a truly random process (e.g. shuffled cards or dice) to produce random numbers. Use of a pseudo-random sequence, which can be generated by most programming languages, is easier.
Beware: We found that some programming languages generate the identical pseudo-random sequence every time the program or script is executed unless the random number generator is properly seeded. The gawk programming language, for instance, provides a good random number generator.
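The effect of seeding can be checked with a small experiment. The following is only a sketch under stated assumptions: the shell function name draw, the seeds 42 and 43, and the choice to pass an explicit seed to srand() are hypothetical and serve purely as illustration. Within one awk implementation, the same seed reproduces the same sequence, which is exactly what happens on every run when a generator is left at a fixed default seed.

```sh
# Hypothetical helper: print five pseudo-random integers in 0..99 for a given seed.
draw() {
    awk -v seed="$1" 'BEGIN {
        srand(seed)                     # explicit seed instead of the system time
        for (k = 1; k <= 5; k++)
            printf("%d ", int(rand() * 100))
        printf("\n")
    }'
}

first=$(draw 42)
second=$(draw 42)
echo "seed 42:       $first"
echo "seed 42 again: $second"          # same seed, so the same five numbers
echo "seed 43:       $(draw 43)"       # a different seed normally gives a different sequence
```

An explicit seed is convenient when a selection has to be reproduced later. Calling srand() without an argument, as in the script below, seeds from the system time instead.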
The following example gawk script selects a random sequence of 40 session numbers from a corpus session range between 150 and 350. Since the random number generator is seeded with the current system time, it generates a different sequence whenever it is started in a different second. It also keeps track of the numbers already selected and never produces the same session number twice:
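BEGIN {
    srand()                 # seed the random number generator with the system time
    i = 1
    while (i <= 40)
    {
        flag = 1
        while (flag == 1)   # redraw until the number has not been selected before
        {
            # rand() returns a value in [0, 1); assuming 350 belongs to the
            # range, the factor must be 201 (200 would stop at 349)
            random = int(rand() * 201) + 150
            flag = 0
            for (j in randarr)
                if (randarr[j] == random) flag = 1
        }
        randarr[i] = random
        printf("%03d ", randarr[i])
        i++
    }
    printf("\n")
}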
In most cases the selection process involves not only random sequences but also a number of other constraints, for instance an equal distribution between sexes or certain proportions of special features within the corpus. There are several ways to implement such constraints on a random selection. The brute force approach is to run the random sequencer repeatedly until the resulting sample meets the required constraints.
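The brute force approach can be sketched as follows. This is a minimal illustration, not a prescribed procedure, and several things in it are assumptions: the shell function name select_balanced, the seed passed as an argument, the constraint of exactly 20 female and 20 male sessions, and above all the rule that even session numbers are female sessions. That rule is a placeholder for the real speaker metadata, which would normally be read from the corpus documentation.

```sh
# Hypothetical helper: draw 40 distinct sessions from 150..350 and repeat the
# whole draw until exactly 20 of them are "female" sessions.
select_balanced() {
    awk -v seed="$1" 'BEGIN {
        srand(seed)
        do {
            split("", chosen)                   # empty the array for a fresh attempt
            n = 0
            female = 0
            while (n < 40) {
                s = int(rand() * 201) + 150     # session number in 150..350
                if (s in chosen) continue       # already selected, draw again
                chosen[s] = 1
                n++
                if (s % 2 == 0) female++        # placeholder metadata: even = female
            }
        } while (female != 20)                  # constraint not met: start over
        for (s = 150; s <= 350; s++)            # print the accepted sample in ascending order
            if (s in chosen) printf("%03d ", s)
        printf("\n")
    }'
}

seed=$(date +%s)            # current time in seconds; date +%s is widely available
echo "seed: $seed"
select_balanced "$seed"
```

With this particular constraint only a handful of attempts are needed on average. Stricter or combined constraints can make the loop run much longer, in which case drawing separately from each subgroup is the usual alternative. Printing the seed allows the same sample to be regenerated later.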
Document the resulting data sample and your method for creating it in the validation report.