One way to draw a random sample from a data frame that is more likely to include values of a given variable within a particular range is stratified sampling: split the rows into those inside and those outside the range, then sample the in-range group at a higher rate.

First, create a new variable in the data frame that indicates whether the value of the variable of interest falls within the desired range. For example, if we want to favor values of a variable 'x' between 20 and 50 (inclusive), we can create a new variable 'x_range' as follows:

df$x_range <- ifelse(df$x >= 20 & df$x <= 50, "within_range", "outside_range") 
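To make the step above reproducible, here is a minimal sketch on made-up data. The data frame, its id column, and the uniform distribution of x are assumptions for illustration only; they are not part of the original question.

```r
# Hypothetical toy data: 1,000 rows with x drawn uniformly from 0 to 100
set.seed(123)
df <- data.frame(id = 1:1000, x = runif(1000, min = 0, max = 100))

# Flag rows whose x lies in the closed interval [20, 50]
df$x_range <- ifelse(df$x >= 20 & df$x <= 50, "within_range", "outside_range")

# Count the rows in each stratum to see how many are available to sample from
table(df$x_range)
```

Checking the counts first is useful because the number of rows requested from a stratum should not exceed the number of rows it contains when sampling without replacement.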

Next, use stratified sampling to select a random sample that includes a higher proportion of observations within the desired range. We can use the stratified function from the splitstackshape package to do this:

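library(splitstackshape)
set.seed(123)
# Rows to draw from each stratum (100 in total); the 80/20 split is only an illustration
stratum_sizes <- c(within_range = 80, outside_range = 20)
df_sample <- stratified(df, group = "x_range", size = stratum_sizes, replace = FALSE)

In this example, group names the 'x_range' variable we created, and size gives the number of rows to draw from each stratum as a named vector. Note that a single integer such as size = 100 would draw 100 rows from every stratum rather than 100 rows in total, and a value below 1 would be read as a proportion of each stratum. replace = FALSE (the default) specifies simple random sampling without replacement within each stratum. As long as the in-range stratum is sampled at a higher rate than the out-of-range stratum, the resulting sample over-represents the desired range for the variable of interest.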
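If installing a package is not an option, the same idea can be sketched in base R. This is not how splitstackshape implements it; it is only an illustration of per-stratum sampling without replacement. The toy data, the 80/20 allocation, and the names n_per_stratum, rows_by_stratum, and picked are all assumptions made for the example.

```r
# Hypothetical toy data, since the original question does not supply a data set
set.seed(123)
df <- data.frame(id = 1:1000, x = runif(1000, min = 0, max = 100))
df$x_range <- ifelse(df$x >= 20 & df$x <= 50, "within_range", "outside_range")

# Assumed allocation: 80 rows from inside the range, 20 from outside
n_per_stratum <- c(within_range = 80, outside_range = 20)

# Group row indices by stratum, then draw without replacement within each one
rows_by_stratum <- split(seq_len(nrow(df)), df$x_range)
picked <- unlist(lapply(names(n_per_stratum), function(s) {
  pool <- rows_by_stratum[[s]]
  # Index into the pool so a one-element pool is not misread by sample()
  pool[sample.int(length(pool), n_per_stratum[[s]])]
}))
df_sample <- df[picked, ]

# Compare the share of in-range rows in the full data and in the sample
prop.table(table(df$x_range))
prop.table(table(df_sample$x_range))
```

Comparing the two proportion tables is a quick way to confirm that the sample over-represents the range of interest relative to the full data.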