Random sample

The Random sample operation generates a data-set containing a random subset of rows from the input data. This is useful when you are working with very large data-sets, because you can prepare and test additional operations on a smaller sample before applying them to the full data.

Options 

  • Records. The Records field allows you to select the number of records in the sample. The records are chosen at random, so each time you execute the operation you will obtain a different sample.
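For readers who think in code, the behaviour of the Records option can be approximated with a short Python sketch. This is an illustration only, not the operation's actual implementation: the function name random_sample and its seed parameter are hypothetical, and capping the sample at the input size is an assumption, since the behaviour when Records exceeds the number of input rows is not described here.

```python
import random

def random_sample(rows, records, seed=None):
    """Return `records` rows chosen at random, without replacement.

    Hypothetical helper: `seed` exists only to make a run repeatable;
    with seed=None each call draws a different sample.
    """
    rng = random.Random(seed)
    # Assumption: a request larger than the input returns every row once.
    k = min(records, len(rows))
    return rng.sample(rows, k)

# A small stand-in for the input data-set.
rows = [{"id": i} for i in range(1_000)]
sample = random_sample(rows, 10)
print(len(sample))
```

Calling random_sample(rows, 10) twice without a seed will normally return different rows, which mirrors the note above that each execution produces a different sample.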

Example 

Some operations can take a long time to execute on very large data-sets. You can use the Random sample operation to generate a small sample of the data, and then create, configure and test the additional operations you want to apply much more quickly.

In this example we are working with a fairly large data-set containing approximately 1,000,000 records. We want to use a combination of the Random sample operation and the Input switch operation to switch the data between a small sample of 1,000 records and the full data-set without having to reconnect our workflow. A configuration that allows us to do this is shown below.

 
To change the data from the sample to the full data-set, we simply click the switch in the Input switch operation.
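The idea behind this configuration can be sketched in Python as a single flag that chooses between the sample and the full data. This is a rough analogy under stated assumptions, not how the Input switch operation is implemented: select_input and use_sample are hypothetical names, and a list of integers stands in for the 1,000,000 records.

```python
import random

def select_input(full_rows, use_sample, sample_size=1_000):
    """Return a random sample or the full data, depending on one flag.

    Hypothetical stand-in for the switch: downstream steps call this
    function and never need to be rewired when the flag changes.
    """
    if use_sample:
        # Draw without replacement; cap at the input size (an assumption).
        return random.sample(full_rows, min(sample_size, len(full_rows)))
    return full_rows

# Stand-in for the full data-set of approximately 1,000,000 records.
full = list(range(1_000_000))

working = select_input(full, use_sample=True)   # develop against the sample
print(len(working))
```

Setting use_sample=False returns the full data unchanged, so the steps that consume the result stay the same in both modes.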
 
 

 
