The number of columns in a PySpark DataFrame can be obtained with len(df.columns), since df.columns returns a plain Python list of column names. (Note that df.count() returns the number of rows, not columns.) Here's an example:
from pyspark.sql import SparkSession
# initialize Spark
spark = SparkSession.builder.appName("column_count").getOrCreate()
# create a PySpark dataframe
data = [("John", 25, "Male"), ("Jane", 30, "Female"), ("Bob", 20, "Male")]
df = spark.createDataFrame(data, ["Name", "Age", "Gender"])
# count the number of columns
num_columns = len(df.columns)
print("Number of columns in dataframe:", num_columns)
Output:
Number of columns in dataframe: 3
Asked: 2023-03-03 11:00:00 +0000