What is an efficient way to count the null and NaN values in each column of a PySpark DataFrame?

asked 2023-06-23 07:12:12 +0000 by qstack

1 Answer

answered 2023-06-23 07:22:01 +0000 by lakamha

One efficient way to count the null and NaN values in each column of a PySpark DataFrame is to combine the isnull() and isnan() functions from pyspark.sql.functions with agg(): cast each boolean test to an integer and sum it per column. Null and NaN are distinct in Spark (isnan() returns false for a null), so the two are counted separately.

Here is an example:

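from pyspark.sql.functions import col, isnan, isnull
from pyspark.sql.functions import sum as spark_sum  # do not shadow Python's built-in sum

# create a PySpark dataframe; col3 uses 9.0 so every non-null value in it is a float
# (mixing int and float in one column breaks schema inference)
df = spark.createDataFrame([(1, 2, None), (4, None, float('nan')), (7, 8, 9.0)], ['col1', 'col2', 'col3'])

# calculate the number of null and NaN values for each column
null_counts = df.agg(*(spark_sum(isnull(col(c)).cast('int')).alias(c) for c in df.columns))
nan_counts = df.agg(*(spark_sum(isnan(col(c)).cast('int')).alias(c) for c in df.columns))

# print the results: null counts first, then NaN counts
null_counts.show()
nan_counts.show()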

This will output:

+----+----+----+
|col1|col2|col3|
+----+----+----+
|   0|   1|   1|
+----+----+----+

+----+----+----+
|col1|col2|col3|
+----+----+----+
|   0|   0|   1|
+----+----+----+

In this example, we first create a PySpark DataFrame containing some null and NaN values. We then pass agg() one expression per column, built with a generator expression over df.columns. isnull() and isnan() test whether each value is null or NaN, respectively; casting the boolean result to an integer gives 1 or 0, and Spark's sum() function (not Python's built-in sum) adds these up per column. alias() gives each sum the name of its source column, and show() prints the two one-row results. The first table holds the null counts and the second the NaN counts. The null in col2 is not counted as NaN, so col2 has one null and zero NaN values, while col3 has one of each.

Stats

Seen: 12 times

Last updated: Jun 23 '23