Understanding 'Head' and '2000' in Microsoft Machine Learning Studio's 'Partition and Sample' Module

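In Microsoft Machine Learning Studio, the 'Partition and Sample' module creates subsets of a dataset. It can draw a random sample, divide the rows into folds (for example, for cross-validation), or keep only the first rows. A common configuration shows two values in the module's 'Properties' pane: 'Head' and '2000'. Neither is a parameter name in its own right. Both are values of the module's settings.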

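'Head' is one of the options of the 'Partition or sample mode' property. The other options are 'Assign to Folds', 'Pick Fold', and 'Sampling'. In Head mode, the module returns the first N rows of the input dataset in their original order, and N is set by the 'Number of rows to select' property. This limits the size of the data and reduces processing time, which is useful when testing an experiment quickly. For example, if the original dataset contains 10,000 records and the row count is set to 100, the output dataset contains only the first 100 records.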
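
Head mode's behaviour is simple enough to mimic outside Studio when checking what a downstream module will receive. The sketch below is plain Python and is not the module's implementation. `head_rows` is a hypothetical helper, and the "records" are integers standing in for rows.

```python
# Minimal sketch of Head-style selection: keep the first n rows, in their
# original order, with no randomness involved. `head_rows` is a hypothetical
# helper for illustration, not part of Machine Learning Studio.
def head_rows(rows, n):
    """Return the first n rows (list slicing returns fewer if rows is shorter)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return rows[:n]


# Stand-in dataset: 10,000 "records", each represented by its row index.
dataset = list(range(10_000))

subset = head_rows(dataset, 100)
print(len(subset))            # 100
print(subset[0], subset[-1])  # 0 99
```

Because the selection depends only on row order, running it twice on the same input always gives the same subset.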

'2000' is not a parameter name. Shown next to 'Head', it is the value of the 'Number of rows to select' property that Head mode exposes, so the module outputs the first 2,000 rows of the input dataset. It is not a random seed. Head mode involves no randomness because it always returns the same leading rows for the same input. A seed applies only in 'Sampling' mode, where the 'Random seed for sampling' property makes the randomly chosen subset reproducible across runs.
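
The difference between taking the first rows and drawing a seeded random sample can be shown with a small standard-library Python sketch. This is an illustration under stated assumptions, not the module's algorithm. `sample_rows` and `head_rows` are hypothetical helpers. `sample_rows` keeps each row independently with probability `rate`, which may differ from how the 'Sampling' mode actually draws rows. The seed values 123 and 456 are arbitrary.

```python
import random


# Hypothetical helper: keeps each row independently with probability `rate`.
# This per-row (Bernoulli) scheme is an assumption for illustration; it is not
# necessarily how the module's Sampling mode selects rows.
def sample_rows(rows, rate, seed):
    rng = random.Random(seed)  # private generator, seeded once per call
    return [row for row in rows if rng.random() < rate]


# Hypothetical helper: Head-style selection, the first n rows, no randomness.
def head_rows(rows, n):
    return rows[:n]


dataset = list(range(10_000))  # stand-in records, one integer per row

run_1 = sample_rows(dataset, rate=0.2, seed=123)
run_2 = sample_rows(dataset, rate=0.2, seed=123)
run_3 = sample_rows(dataset, rate=0.2, seed=456)

print(run_1 == run_2)  # True: same seed and same input give the same sample
print(head_rows(dataset, 2000) == head_rows(dataset, 2000))  # True: no seed needed
```

Reusing a seed reproduces the same random subset, and changing it (as with `run_3`) generally yields a different one. Taking the leading rows never varies, so a seed has nothing to control there.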

In summary, 'Head' selects the mode that keeps only the first rows of the dataset, and '2000' is the number of rows to keep. Because Head mode is deterministic, it needs no seed. A random seed matters only when the module is set to 'Sampling'.