Programmatically creating a data asset from a datastore URI in Azure ML Studio with Python (Azure ML SDK v1) involves the following steps:
Import the necessary packages:
from azureml.core import Workspace, Datastore, Dataset
import pandas as pd
Initialize a Workspace object using the Azure ML SDK:
ws = Workspace.from_config()
Get a reference to the Datastore by its registered name:
datastore = Datastore.get(ws, datastore_name='<datastore_name>')
Create a TabularDataset from the files in the Datastore and load it into a pandas DataFrame:
dataset = Dataset.Tabular.from_delimited_files(path=(datastore, '<dataset_file_path>'))
df = dataset.to_pandas_dataframe()
Manipulate the data as needed using pandas DataFrame operations:
df = df.drop(columns=['column_to_drop'])
df['new_column'] = df['column_a'] + df['column_b']
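As a concrete, self-contained illustration of the two operations above (the column names here are placeholders, not names from any real dataset):

```python
import pandas as pd

# Small sample frame standing in for the data loaded from the datastore
df = pd.DataFrame({
    "column_to_drop": [0, 0, 0],
    "column_a": [1, 2, 3],
    "column_b": [10, 20, 30],
})

# Drop an unwanted column, then derive a new one from two existing columns
df = df.drop(columns=["column_to_drop"])
df["new_column"] = df["column_a"] + df["column_b"]

print(list(df.columns))            # ['column_a', 'column_b', 'new_column']
print(df["new_column"].tolist())   # [11, 22, 33]
```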
Write the updated data back to the Datastore and register it as a dataset:
updated_dataset = Dataset.Tabular.register_pandas_dataframe(
    df,
    target=(datastore, '<updated_dataset_file_path>'),
    name='<dataset_name>',
    description='<description>'
)
The registered dataset can then be consumed downstream, for example as an input to an Azure ML training pipeline or other workflows.
Asked: 2023-01-25 11:00:00 +0000