If a row contains both NaN values and arrays, you can average the non-NaN elements of each array with NumPy's nanmean() function. Here's an example:
import pandas as pd
import numpy as np
# create example dataframe with NaN and array elements
df = pd.DataFrame({
    'A': [1, 2, 3, np.nan],
    'B': [[1, 2], [3, 4], [5, np.nan], [6, 7]],
    'C': [np.nan, [1, 2, 3], [4, 5], [6, 7, 8, 9]]
})
# function to calculate average of non-NaN elements in array
def avg_without_nan(arr):
    return np.nanmean(np.array(arr))
# apply function to dataframe columns with arrays
df['B_avg'] = df['B'].apply(avg_without_nan)
df['C_avg'] = df['C'].apply(avg_without_nan)
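# calculate the row average from the numeric column and the per-array averages;
# mean() cannot average the list-valued columns 'B' and 'C' directly
df['row_avg'] = df[['A', 'B_avg', 'C_avg']].mean(axis=1, skipna=True)
print(df)
Output:
     A         B             C  B_avg  C_avg   row_avg
0  1.0    [1, 2]           NaN    1.5    NaN  1.250000
1  2.0    [3, 4]     [1, 2, 3]    3.5    2.0  2.500000
2  3.0  [5, nan]        [4, 5]    5.0    4.5  4.166667
3  NaN    [6, 7]  [6, 7, 8, 9]    6.5    7.5  7.000000
In this example, the avg_without_nan() function is applied to the 'B' and 'C' columns using the apply() method, and the resulting averages are stored in the new columns 'B_avg' and 'C_avg'. Where a cell holds a bare NaN instead of an array (row 0 of 'C'), np.nanmean() returns NaN, and NumPy may also emit a RuntimeWarning. The row average is then calculated by calling mean() with axis=1 on the numeric columns 'A', 'B_avg', and 'C_avg'; skipna=True ignores any remaining NaN values. Note that this is a mean of per-column averages, so each column counts equally no matter how many elements its array holds.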
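If the row average should instead weight every individual number equally, one option is to pool the scalar and all array elements of a row before averaging. The following is only a minimal sketch of that idea: the helper names flatten_cell and row_nanmean and the column name 'row_avg_all' are made up for illustration, and it assumes every cell is either a number, NaN, or a flat list of numbers.

```python
import numpy as np
import pandas as pd


def flatten_cell(cell):
    """Return the numbers in one cell as a 1-D float array (hypothetical helper)."""
    # np.atleast_1d turns a scalar such as 1.0 or NaN into a length-1 array
    # and leaves list-like cells as ordinary 1-D arrays.
    return np.atleast_1d(np.asarray(cell, dtype=float))


def row_nanmean(row):
    """Mean of every non-NaN number in a row, pooling scalars and list elements."""
    values = np.concatenate([flatten_cell(cell) for cell in row])
    valid = values[~np.isnan(values)]
    # Return NaN explicitly for an all-NaN row rather than averaging an empty array.
    return valid.mean() if valid.size else np.nan


df = pd.DataFrame({
    'A': [1, 2, 3, np.nan],
    'B': [[1, 2], [3, 4], [5, np.nan], [6, 7]],
    'C': [np.nan, [1, 2, 3], [4, 5], [6, 7, 8, 9]],
})

# Row values are 4/3, 2.5, 4.25 and 43/6 for this example data.
df['row_avg_all'] = df[['A', 'B', 'C']].apply(row_nanmean, axis=1)
print(df['row_avg_all'])
```

With this approach a long array contributes more to the row average than a short one, which may or may not be what you want.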
Asked: 2021-09-13 11:00:00 +0000
Last updated: Nov 13 '21