How can complex data.table commands in R be made more efficient by potentially merging the "group by" function, lapply, and a column name vector?

asked 2023-06-26 15:29:12 +0000

pufferfish


1 Answer


answered 2023-06-26 15:44:01 +0000

devzero

One option to make complex data.table commands more compact is to combine the "by" argument with lapply over .SD and a vector of column names passed through .SDcols.

For example, let's say we have a data.table called "DT" with columns "A", "B", "C", and "D". We want to calculate the mean of columns "B", "C", and "D" for each unique value of "A".

Instead of doing:

DT[, .(mean_B = mean(B), mean_C = mean(C), mean_D = mean(D)), by = A]

We can create a vector of the column names we want to operate on, pass it through .SDcols, and use lapply over .SD inside the "j" argument of data.table to generate the desired columns, like this:

cols <- c("B", "C", "D")
DT[, lapply(.SD, mean), by = A, .SDcols = cols]

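This reduces code duplication (especially if we have more columns to operate on) and makes it easier to change the columns we operate on in the future, since only the "cols" vector needs editing. .SDcols also restricts .SD to the listed columns, so the other columns are not pulled into each group. Note that the result columns keep their original names ("B", "C", "D") rather than "mean_B" and so on. I would not count on a big speedup for simple functions like mean: as far as I know, data.table optimizes both forms internally, so the gain is mostly in readability and maintainability.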
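If you want to check that the two forms agree, here is a minimal self-contained sketch. The toy values in DT are made up for illustration, and the renaming step with setnames is just one way to get the "mean_" prefix back, not something the approach requires.

```r
library(data.table)

# Toy data, invented purely for illustration
DT <- data.table(
  A = c("x", "x", "y", "y"),
  B = c(1, 3, 5, 7),
  C = c(2, 4, 6, 8),
  D = c(10, 20, 30, 40)
)

# Explicit version: one mean() call written out per column
explicit <- DT[, .(mean_B = mean(B), mean_C = mean(C), mean_D = mean(D)), by = A]

# Compact version: .SD contains only the columns named in .SDcols
cols <- c("B", "C", "D")
compact <- DT[, lapply(.SD, mean), by = A, .SDcols = cols]

# The compact result keeps the names B, C, D; rename by reference
# so it lines up with the explicit version
setnames(compact, cols, paste0("mean_", cols))

# Compare the two results
print(all.equal(explicit, compact))
```

To operate on a different set of columns, only the cols vector has to change; the grouping call stays the same.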
