How can complex data.table commands in R be made more efficient by potentially merging the "group by" function, lapply, and a column name vector?

asked 2023-06-26 15:29:12 +0000

pufferfish

1 Answer

answered 2023-06-26 15:44:01 +0000

devzero

One way to make such data.table commands shorter and easier to maintain is to combine the grouping argument (by) with lapply over .SD and a vector of column names passed to .SDcols.

For example, let's say we have a data.table called "DT" with columns "A", "B", "C", and "D". We want to calculate the mean of columns "B", "C", and "D" for each unique value of "A".

Instead of doing:

DT[, .(mean_B = mean(B), mean_C = mean(C), mean_D = mean(D)), by = A]

Instead, we can store the names of the columns we want to operate on in a vector, pass that vector to .SDcols, and use lapply over .SD (the subset of columns for the current group) inside the "j" argument, like this:

library(data.table)
cols <- c("B", "C", "D")  # columns to summarise
DT[, lapply(.SD, mean), by = A, .SDcols = cols]
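A self-contained version of both commands, as a sketch assuming the data.table package is installed. The values in DT are made up, chosen only so the group means are easy to check by hand:

```r
library(data.table)

# Hypothetical example data: grouping column A and numeric columns B, C, D
DT <- data.table(
  A = c("x", "x", "y", "y", "y"),
  B = c(1, 3, 2, 4, 6),
  C = c(10, 20, 30, 40, 50),
  D = c(5, 5, 1, 2, 3)
)

# Explicit version: one mean() call written out per column
explicit <- DT[, .(mean_B = mean(B), mean_C = mean(C), mean_D = mean(D)), by = A]

# Vector-driven version: lapply over .SD, restricted to the columns in cols
cols <- c("B", "C", "D")
compact <- DT[, lapply(.SD, mean), by = A, .SDcols = cols]

print(explicit)
print(compact)
```

Both results hold the same group means (2 and 4 for B, 15 and 40 for C, 5 and 2 for D); only the column names differ, since the lapply version keeps the original names B, C and D.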

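This reduces code duplication (especially when there are many columns to operate on) and makes it easier to change the set of columns later, since only the cols vector needs editing. Don't expect a large speed-up from it, though: with the default optimization settings, data.table internally expands lapply(.SD, mean) into the equivalent list(mean(B), mean(C), mean(D)) and can use its optimized grouped mean (GForce) for either form, so the two versions usually run at similar speed. Note also that the output columns keep their original names (B, C, D) rather than mean_B, mean_C, mean_D.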
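If the mean_ prefixes from the explicit version are wanted, one option is to rename the result with setnames(), which leaves the grouping call itself in the simple lapply(.SD, mean) form. The sketch below wraps this in a helper; grouped_means and the sample data are hypothetical, not part of data.table:

```r
library(data.table)

# Hypothetical example data (same shape as above: A plus B, C, D)
DT <- data.table(
  A = c("x", "x", "y", "y", "y"),
  B = c(1, 3, 2, 4, 6),
  C = c(10, 20, 30, 40, 50),
  D = c(5, 5, 1, 2, 3)
)

# Hypothetical helper: grouped mean of any column subset, with the
# result columns renamed to mean_<col> to match the explicit version
grouped_means <- function(dt, cols, by_col) {
  out <- dt[, lapply(.SD, mean), by = by_col, .SDcols = cols]
  setnames(out, cols, paste0("mean_", cols))  # renames in place, by reference
  out
}

res_all <- grouped_means(DT, c("B", "C", "D"), "A")
res_two <- grouped_means(DT, c("B", "D"), "A")  # only the column vector changes
print(res_all)
print(res_two)
```

Changing which columns are summarised then means editing only the character vector passed in, as in res_two.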
