You can modify the year column in either PySpark or pandas, depending on which kind of dataframe holds your dataset:
Assuming your PySpark dataframe is called "df" and the year column is called "year", you can use the "when" and "otherwise" functions to create a new column with the modified year values:
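from pyspark.sql.functions import when

# Relabel 2022 and 2021; cast the remaining years to string so every
# branch of the expression returns the same type.
df = df.withColumn(
    "new_year",
    when(df.year == 2022, "2022-23")
    .when(df.year == 2021, "2021-2022")
    .otherwise(df.year.cast("string")),
)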
This creates a new column called "new_year" that has the modified year values for 2021 and 2022, and leaves the other year values unchanged.
Assuming you have a pandas dataframe called "df" and the year column is called "year", you can use the "apply" method with a lambda function to modify the year values:
df["new_year"] = df["year"].apply(
    lambda x: "2022-23" if x == 2022 else ("2021-2022" if x == 2021 else x)
)
This creates the same "new_year" column in pandas. Note that the relabeled values are strings while the untouched years stay as they were, so if "year" holds integers the new column ends up with mixed types (object dtype).
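
If you have more than a couple of years to relabel, a dictionary lookup may be easier to maintain than nested conditionals. The following is only a sketch under some assumptions: the sample data and the "year_labels" name are made up for illustration, and the years that are not in the dictionary are converted to strings so the column keeps a single type.

```python
import pandas as pd

# Hypothetical sample data; your real df comes from your own dataset.
df = pd.DataFrame({"year": [2020, 2021, 2022]})

# Lookup table for the years that need a new label (same labels as above).
year_labels = {2022: "2022-23", 2021: "2021-2022"}

# map() yields NaN for years missing from the dict, so fall back to the
# original year, cast to string so the whole column holds strings.
df["new_year"] = df["year"].map(year_labels).fillna(df["year"].astype(str))

print(df)
```

Adding another year is then just one more entry in the dictionary. If you would rather keep the untouched years as integers, drop the astype(str) call and accept a mixed-type column.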