One way to split a combined column in a dataframe, including one that contains some null values, into separate columns based on whitespace is to use the str.split() method in pandas.
Here's an example:
import pandas as pd
# create sample dataframe
df = pd.DataFrame({'combined_column': ['John Smith 25', 'Jane Doe 30', 'Joe Bloggs 40', 'Bill 45']})
# divide combined_column into separate columns based on whitespace
df[['First Name', 'Last Name', 'Age']] = df['combined_column'].str.split(expand=True)
# drop the original combined_column column
df.drop(columns=['combined_column'], inplace=True)
print(df)
Output:
  First Name Last Name  Age
0       John     Smith   25
1       Jane       Doe   30
2        Joe    Bloggs   40
3       Bill        45  NaN
In this example, we used str.split() to separate combined_column into three columns based on whitespace. The expand=True argument makes str.split() return a dataframe with one column per token instead of a single column of lists, and we assigned that result to three new columns by indexing with a list of names (the double square brackets, [[...]]). Finally, we dropped the original combined_column using the drop() method.
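Since the question mentions null values, here is a minimal sketch of how str.split() treats them. The sample series below is made up for illustration; it is not part of the dataframe above. A null entry is not split at all, so it comes back as a missing value in every new column:

```python
import pandas as pd

# Hypothetical sample: the second entry is null.
s = pd.Series(['John Smith 25', None, 'Joe Bloggs 40'], name='combined_column')

# Split on whitespace; the null row stays missing in every resulting column.
parts = s.str.split(expand=True)
parts.columns = ['First Name', 'Last Name', 'Age']

# Boolean mask marking rows where every split column is missing.
all_missing = parts.isna().all(axis=1)

print(parts)
print(all_missing.tolist())
```

The mask can then be used to filter or fill those rows, for example with parts[~all_missing].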
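Note that in the sample dataframe, the last row only contains two values. str.split() fills the columns from left to right, so 'Bill' goes into First Name, '45' ends up in Last Name, and Age for that row is left as a missing value (displayed as NaN or None, depending on the pandas version).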
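If the age should always end up in the Age column even when the last name is missing, one possible alternative is str.extract() with a regular expression that treats the last name as optional. This is only a sketch: the pattern and the extra null row are assumptions added for illustration, and the pattern expects one or two name tokens followed by a run of digits.

```python
import pandas as pd

# Same sample data as above, plus one null row (added here for illustration).
df = pd.DataFrame({'combined_column': ['John Smith 25', 'Jane Doe 30',
                                       'Joe Bloggs 40', 'Bill 45', None]})

# Assumed layout: first name, optional last name, then a numeric age.
pattern = r'^(?P<first>\S+)(?:\s+(?P<last>\S+))?\s+(?P<age>\d+)$'

# str.extract() returns one column per capture group; groups that do not
# take part in the match, and rows that are null, come back as missing.
parts = df['combined_column'].str.extract(pattern)
parts.columns = ['First Name', 'Last Name', 'Age']

# Convert the age strings to numbers; missing ages stay missing.
parts['Age'] = pd.to_numeric(parts['Age'])

print(parts)
```

With this pattern, 'Bill 45' yields First Name 'Bill', a missing Last Name, and Age 45. Rows that do not fit the assumed layout (for instance, names with three tokens) would come back entirely missing, so the pattern would need adjusting for such data.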
Asked: 2023-06-13 12:22:05 +0000
Last updated: Jun 13 '23