You may produce random numbers with a truly random physical process (e.g. shuffled cards or dice), but it is easier to use a pseudo-random sequence, which most programming languages can generate.
Beware: we found that some programming languages generate the same pseudo-random sequence every time the program or script is executed if the random number generator is not properly seeded. A good random number generator is provided, for instance, by the gawk programming language.
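The effect of seeding can be checked in a few lines. The sketch below is written in Python purely for illustration, and the helper name `draw` and the seed value 42 are arbitrary choices. A generator created with a fixed seed returns the same sequence on every run. `random.Random(None)` is instead seeded from an operating-system randomness source, or from the current time if none is available. gawk behaves the same way: its manual notes that `rand()` starts from the same seed each time awk is run unless `srand()` is called.

```python
import random

def draw(seed=None, n=5):
    """Return n pseudo-random session numbers in 150..349.

    seed=None: Python seeds from an OS randomness source (or the system time),
    so the result differs from run to run.
    Fixed seed: the identical sequence is produced on every run.
    """
    rng = random.Random(seed)
    return [rng.randint(150, 349) for _ in range(n)]  # randint bounds are inclusive

# Same seed, same sequence: reproducible, but not a fresh sample.
assert draw(seed=42) == draw(seed=42)
print(draw(seed=42))
print(draw())  # unseeded: expected to differ on each run
```

A fixed seed is not always a mistake. If the seed used for a selection is recorded, the same sample can be regenerated later. It must simply be a deliberate choice rather than an accident of an unseeded generator.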
The following example gawk script selects a random sequence of 40 session numbers from the corpus session range 150 to 349 (inclusive). Because the random number generator is seeded with the current system time, which has a resolution of one second, the script produces a different sequence whenever it is started in a new second. It also keeps track of the numbers already selected, so it never outputs the same session number twice:
    BEGIN {
        srand()                    # seed the generator with the current time of day
        i = 1
        while (i <= 40) {          # select 40 session numbers
            flag = 1
            while (flag == 1) {    # draw again until the number is new
                random = int(rand() * 200) + 150   # integer in 150..349
                flag = 0
                for (j in randarr)               # already selected?
                    if (randarr[j] == random)
                        flag = 1
            }
            randarr[i] = random
            printf("%03d ", randarr[i])          # print zero-padded to 3 digits
            i++
        }
        printf("\n")
    }

In most cases the selection involves not only randomness but also a number of further constraints, for instance an equal distribution between the sexes or given proportions of particular features within the corpus. There are several ways to impose such constraints on a random selection. The brute-force approach is to rerun the random selection repeatedly until the resulting sample meets all required constraints.
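The brute-force approach can be sketched in Python as follows. This is a minimal illustration under stated assumptions. The speaker-sex table `SPEAKER_SEX` is invented for the example: even-numbered sessions are simply labelled female, standing in for real corpus metadata. The function names are also hypothetical. `random.sample` draws without replacement, so it takes the place of the explicit duplicate check in the gawk script.

```python
import random

SESSIONS = range(150, 350)  # session numbers 150..349, as in the gawk script

# HYPOTHETICAL metadata, for illustration only: real speaker data would come
# from the corpus documentation. Here even session numbers are labelled "f".
SPEAKER_SEX = {s: ("f" if s % 2 == 0 else "m") for s in SESSIONS}

def random_sample(rng, k=40):
    """Draw k distinct session numbers (sample() never repeats an element)."""
    return rng.sample(SESSIONS, k)

def is_balanced(sample):
    """Constraint: equal numbers of female and male speakers."""
    females = sum(1 for s in sample if SPEAKER_SEX[s] == "f")
    return 2 * females == len(sample)

def brute_force_select(rng, k=40, max_tries=100_000):
    """Repeat the random draw until the sample satisfies the constraint."""
    for attempt in range(1, max_tries + 1):
        sample = random_sample(rng, k)
        if is_balanced(sample):
            return sorted(sample), attempt
    raise RuntimeError(f"no sample met the constraint in {max_tries} tries")

if __name__ == "__main__":
    rng = random.Random()  # seeded from the OS randomness source or the system time
    sample, tries = brute_force_select(rng)
    print(f"accepted after {tries} draw(s):")
    print(" ".join(f"{s:03d}" for s in sample))
```

Each accepted sample is still drawn uniformly from all samples that satisfy the constraint, so the selection stays unbiased within it. The cost is the number of rejected draws, which rises quickly as constraints become stricter or more numerous. The `max_tries` limit keeps an unsatisfiable constraint from looping forever.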
Document the resulting data sample and your method for creating it in the validation report.