Assuming the file names are stored in a column of a DataFrame, you can use the pandas string methods (the .str accessor) to extract part of each file name and save it to a new column. For example, to extract the date from file names in the format "yyyymmdd_filename.ext":
import pandas as pd
# Create a dataframe with file names
df = pd.DataFrame({'file_name': ['20211010_file1.txt', '20211011_file2.csv', '20211012_file3.txt']})
# Use the str.extract method to extract the date from the file name
df['date'] = df['file_name'].str.extract(r'(\d{8})', expand=False)
# Output the result
print(df)
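This creates a new column called "date" in the DataFrame containing the date extracted from each file name. The regular expression "\d{8}" matches the first run of eight consecutive digits, which corresponds to the yyyymmdd prefix of the file name; the surrounding parentheses form the capture group that str.extract returns. The extracted values are strings, and any file name without a match gets NaN. You can adjust the regular expression to match the specific format of your file names.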
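If you need real dates rather than strings, the same idea can be taken one step further. The sketch below is only an illustration under a few assumptions: the date is always at the start of the name and followed by an underscore, the "extension" column and the non-matching name "notes.txt" are made up for the example, and unparseable values should become NaT instead of raising an error.

```python
import pandas as pd

# Sample data; "notes.txt" is a hypothetical name with no date prefix
df = pd.DataFrame({
    "file_name": ["20211010_file1.txt", "20211011_file2.csv", "notes.txt"]
})

# Anchor the pattern to the start of the name and require the underscore,
# so a run of digits elsewhere in the name is not picked up by mistake
df["date"] = df["file_name"].str.extract(r"^(\d{8})_", expand=False)

# Convert the yyyymmdd strings to datetimes; names that did not match
# (NaN) or that do not parse become NaT because of errors="coerce"
df["date"] = pd.to_datetime(df["date"], format="%Y%m%d", errors="coerce")

# Hypothetical extra column: the extension after the last dot
df["extension"] = df["file_name"].str.extract(r"\.([^.]+)$", expand=False)

print(df)
```

With a datetime column you can then sort, filter by date range, or use the .dt accessor (for example df["date"].dt.year) without further string handling.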
Asked: 2022-08-09 11:00:00 +0000
Last updated: Mar 06 '23