There are many different methods for drawing samples from data, and the ideal one depends on the data set and the situation. Sampling can be based on probability, an approach that uses random numbers corresponding to points in the data set. Because chance alone decides which points are chosen, this approach helps ensure that the selected points are not correlated with one another. Variations of probability sampling include simple, stratified, and systematic random sampling, as well as multi-stage cluster sampling. Once generated, a sample can be used for predictive analytics. For example, a retail business might use data sampling to uncover patterns in customer behavior and predictive modeling to create more effective sales strategies.
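The following is a minimal sketch of three of the probability sampling variations named above, using only the Python standard library. The function names, the `customers` data, and the `region` field are hypothetical and chosen only for illustration; a real project would more likely rely on a data analysis library.

```python
import random
from collections import defaultdict


def simple_random_sample(records, n, seed=0):
    """Draw n records uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(records, n)


def systematic_sample(records, n, seed=0):
    """Take every k-th record after a random start, with k = len(records) // n."""
    rng = random.Random(seed)
    step = len(records) // n
    start = rng.randrange(step)
    return [records[start + i * step] for i in range(n)]


def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from every stratum defined by key(record)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical data set: 1,000 customers spread over three regions.
regions = ("north", "south", "west")
customers = [{"id": i, "region": regions[i % 3]} for i in range(1000)]

srs = simple_random_sample(customers, 50)
sys_s = systematic_sample(customers, 50)
strat = stratified_sample(customers, key=lambda c: c["region"], fraction=0.05)
```

In this sketch the stratified sample keeps roughly the same regional mix as the full data set, whereas the simple random sample matches that mix only on average.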
Random sampling involves choosing respondents from the target population at random in order to minimize bias in the sample. While this method is more expensive and requires more upfront information, the information it yields is typically of higher quality. Purposive sampling is more widely used, and occurs when managers target individuals who match certain criteria for information extraction. Profiles are drawn up for the ideal interview candidates. Although this introduces a potential for bias in the sample, the information is easier to collect, and the sampler has more control over the composition of the sample.
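As a rough illustration of the trade-off between the two approaches, the sketch below compares a random sample with a purposive one on a made-up respondent pool. The fields `years_as_customer` and `satisfaction`, and the selection rule of five or more years, are assumptions invented for this example.

```python
import random

# Hypothetical respondent pool: satisfaction rises with customer tenure.
respondents = [
    {"id": i, "years_as_customer": i % 10, "satisfaction": 5 + 0.5 * (i % 10)}
    for i in range(1000)
]


def mean_satisfaction(group):
    return sum(r["satisfaction"] for r in group) / len(group)


# Random sampling: every respondent has the same chance of selection.
rng = random.Random(0)
random_sample = rng.sample(respondents, 100)

# Purposive sampling: take the first 100 respondents who match a profile
# (here, an assumed criterion of at least five years as a customer).
matching = [r for r in respondents if r["years_as_customer"] >= 5]
purposive_sample = matching[:100]

population_mean = mean_satisfaction(respondents)
random_mean = mean_satisfaction(random_sample)
purposive_mean = mean_satisfaction(purposive_sample)
```

Under these assumptions the purposive sample overstates average satisfaction, because the selection criterion is related to the quantity being measured, while the random sample stays close to the population mean.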
Choice-based sampling is one of the stratified sampling strategies. In choice-based sampling, the data are stratified on the target variable and a sample is taken from each stratum, so that the rare target class is better represented in the sample. The model is then built on this biased sample. The effects of the input variables on the target are often estimated with more precision from the choice-based sample than from a random sample, even when the overall sample size is smaller. The results usually must be adjusted to correct for the oversampling.
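A minimal sketch of choice-based sampling, together with one common way to correct for the oversampling, is shown below. It assumes a binary target and uses inverse sampling-fraction weights as the adjustment; the function name `choice_based_sample`, the `churned` field, and the population are hypothetical, and other corrections (such as adjusting a model's intercept) are also used in practice.

```python
import random


def choice_based_sample(records, target, n_per_class, seed=0):
    """Stratify on the target and draw up to n_per_class records per stratum.

    Each sampled record carries a weight equal to the inverse of its
    stratum's sampling fraction, so weighted statistics reflect the
    original population rather than the oversampled one.
    """
    rng = random.Random(seed)
    strata = {}
    for record in records:
        strata.setdefault(record[target], []).append(record)
    sample = []
    for members in strata.values():
        k = min(n_per_class, len(members))
        weight = len(members) / k
        for record in rng.sample(members, k):
            sample.append({**record, "weight": weight})
    return sample


# Hypothetical population: 10,000 customers, 2% of whom churned.
population = [{"id": i, "churned": 1 if i % 50 == 0 else 0} for i in range(10_000)]
sample = choice_based_sample(population, "churned", n_per_class=200)

# Unadjusted event rate in the biased sample.
raw_rate = sum(r["churned"] for r in sample) / len(sample)

# Weighted event rate, corrected for the oversampling of the rare class.
weighted_rate = (
    sum(r["churned"] * r["weight"] for r in sample)
    / sum(r["weight"] for r in sample)
)
```

Here the rare class makes up half of the sample, so the raw event rate is far above the population rate, while the weighted rate recovers the population's 2%.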