One way to apply a function that uses a subset of other rows to each row of an R data.table is to work group-wise with the special symbol .SD (Subset of Data). Inside j, .SD is itself a data.table holding the current group's rows for every column except the ones used in by (or only the columns listed in .SDcols). Because j is evaluated once per group, the function can see all of a group's rows when it computes the value for each row. Here's an example:
Suppose we have the following data.table:
library(data.table)
dt <- data.table(x = c(1,1,2,2,3,3), y = c(2,3,4,5,6,7))
dt
#>    x y
#> 1: 1 2
#> 2: 1 3
#> 3: 2 4
#> 4: 2 5
#> 5: 3 6
#> 6: 3 7
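To see what .SD actually holds, the following sketch re-creates the same dt and, for each x group, reports how many rows .SD has and which columns it contains. The result names n_rows and sd_cols are made up for illustration; the expected values follow from the example data (two rows per group, and only y, because x is the grouping column).

```r
library(data.table)

dt <- data.table(x = c(1, 1, 2, 2, 3, 3), y = c(2, 3, 4, 5, 6, 7))

# j runs once per x group; .SD is that group's sub-table of the
# non-grouping columns (here only y).
sd_rows <- dt[, .(n_rows  = nrow(.SD),
                  sd_cols = paste(names(.SD), collapse = ",")),
              by = x]
print(sd_rows)
```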
We want to subtract the minimum y value for each group in x from each y value in that group. We can use .SD to subset the data.table for each group in x and apply the function:
# by = x evaluates j once per x group; := adds y_diff to dt by reference
dt[, y_diff := y - min(.SD$y), by = x]
dt
#>    x y y_diff
#> 1: 1 2      0
#> 2: 1 3      1
#> 3: 2 4      0
#> 4: 2 5      1
#> 5: 3 6      0
#> 6: 3 7      1
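In this example, .SD$y is the y column restricted to the current x group, so min(.SD$y) is that group's minimum. Subtracting it from y (which, inside by, also refers to the current group's values) gives the desired result. Because := assigns by reference, the new y_diff column is added to dt in place rather than to a copy. For a single known column, min(y) gives the same result; .SD becomes more useful when the function needs several columns or the group's whole sub-table.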
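A minimal sketch of a few variations, reusing the example data. The extra column z and the result names y_dev, z_dev and y_loo_mean are hypothetical, added only for illustration. The last step is the closest to "a function of the other rows": for each row it averages y over the remaining rows of the same group (a leave-one-out mean).

```r
library(data.table)

dt <- data.table(x = c(1, 1, 2, 2, 3, 3), y = c(2, 3, 4, 5, 6, 7))

# Same result as the .SD version: inside by = x, y is already the
# current group's values, so min(y) is the group minimum.
dt[, y_diff := y - min(y), by = x]

# Several columns at once: .SDcols chooses which columns .SD contains,
# and lapply() applies the same function to each of them.
dt[, z := y * 10]                      # hypothetical second column
cols <- c("y", "z")
dt[, paste0(cols, "_dev") := lapply(.SD, function(v) v - min(v)),
   by = x, .SDcols = cols]

# Per-row function of the *other* rows in the group: for row i,
# the mean of y over the group's rows excluding row i.
dt[, y_loo_mean := sapply(seq_len(.N), function(i) mean(y[-i])), by = x]

print(dt)
```

The sapply() loop works for any function of the other rows, but it costs O(n²) per group. For a plain mean, the vectorised form (sum(y) - y) / (.N - 1) gives the same values in O(n). A group with a single row yields NaN in either form, because there are no other rows to average.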
Asked: 2023-05-14 08:18:31 +0000
Last updated: May 14 '23