You can modify the year column in your dataset with either PySpark or pandas, as follows:
Assuming your PySpark dataframe is called "df" and the year column is called "year", you can use the "when" and "otherwise" functions to create a new column with the modified year values:
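from pyspark.sql.functions import col, when

# Relabel 2022 and 2021; cast the remaining years to string so the column has a single type.
df = df.withColumn(
    "new_year",
    when(col("year") == 2022, "2022-23")
    .when(col("year") == 2021, "2021-2022")
    .otherwise(col("year").cast("string")),
)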
This creates a new column called "new_year" that holds the relabeled values for 2021 and 2022 and carries every other year over as-is. Because the labels are strings, the new column is a string column.
Assuming you have a pandas dataframe called "df" and the year column is called "year", you can use the "apply" method with a lambda function to modify the year values:
df["new_year"] = df["year"].apply(
    lambda x: "2022-23" if x == 2022 else ("2021-2022" if x == 2021 else x)
)
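This creates a new column called "new_year" that holds the relabeled values for 2021 and 2022 and leaves the other year values unchanged. Note that the unchanged values stay integers, so the new column mixes strings and integers (object dtype).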
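If more years need relabeling later, a dictionary lookup may be easier to maintain than nested conditionals. The following is a minimal sketch, not part of the original answer: the sample data and the name "year_labels" are assumptions made for illustration, and the unchanged years are converted to strings so the column has a single type.

```python
import pandas as pd

# Hypothetical sample data; the column name "year" follows the answer above.
df = pd.DataFrame({"year": [2020, 2021, 2022, 2023]})

# Lookup table for the years to relabel (labels taken from the answer above).
year_labels = {2022: "2022-23", 2021: "2021-2022"}

# map() yields NaN for years that are not in the dict; fillna() then fills
# those rows with the original year as a string.
df["new_year"] = df["year"].map(year_labels).fillna(df["year"].astype(str))

print(df["new_year"].tolist())
# → ['2020', '2021-2022', '2022-23', '2023']
```

Adding another year then only requires a new entry in the dictionary.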
Asked: 2023-07-18 02:19:09 +0000
Last updated: Jul 18 '23