Ask Your Question

How can the number of columns in a PySpark dataframe be calculated using Python?

asked 2023-03-03 11:00:00 +0000 by djk

1 Answer


answered 2021-09-06 17:00:00 +0000 by david

The number of columns in a PySpark DataFrame can be found by taking the length of its `columns` attribute with Python's built-in `len()`. (Note that `df.count()` returns the number of *rows*, not columns.) Here's an example:

from pyspark.sql import SparkSession

# initialize Spark
spark = SparkSession.builder.appName("column_count").getOrCreate()

# create a PySpark dataframe
data = [("John", 25, "Male"), ("Jane", 30, "Female"), ("Bob", 20, "Male")]
df = spark.createDataFrame(data, ["Name", "Age", "Gender"])

# count the number of columns
num_columns = len(df.columns)

print("Number of columns in dataframe:", num_columns)

Output:

Number of columns in dataframe: 3

