How can the delta-rs Python API be used to connect to and authenticate with Delta Lake on Azure Data Lake Storage Gen 2?

asked 2022-03-05 11:00:00 +0000

1 Answer

answered 2021-09-29 15:00:00 +0000

To connect to and authenticate with Delta Lake on Azure Data Lake Storage Gen 2 using the delta-rs Python API, you can follow these steps:

  1. Install the delta-rs Python bindings using pip. The package is published on PyPI as deltalake:

    pip install deltalake
  2. Import the DeltaTable class from the deltalake module:

    from deltalake import DeltaTable
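  3. Create a DeltaTable instance, passing the URI of the Delta Lake table on ADLS Gen2. Recent releases accept an abfss:// URI that names both the container (filesystem) and the storage account; the exact schemes supported depend on the installed version. The credentials from step 4 must already be in place when this line runs:

    table = DeltaTable("abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<path-to-delta-lake-table>")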
  4. Provide the ADLS Gen2 credentials before creating the table. For account-key authentication, set the following environment variables:

    export AZURE_STORAGE_ACCOUNT_NAME=<storage-account-name>
    export AZURE_STORAGE_ACCOUNT_KEY=<storage-account-key>

    You can also set these variables programmatically using the os module, as long as this happens before the DeltaTable is constructed:

    import os
    os.environ["AZURE_STORAGE_ACCOUNT_NAME"] = "<storage-account-name>"
    os.environ["AZURE_STORAGE_ACCOUNT_KEY"] = "<storage-account-key>"
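  5. Use the DeltaTable instance to query or maintain the Delta Lake table as needed:

    table.delete() # in recent releases, deletes all rows (or only those matching a predicate); the table itself remains
    table.vacuum() # lists files no longer referenced by the table and past the retention period; pass dry_run=False to actually remove them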
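
As an alternative to environment variables, the credentials can be passed directly through the storage_options argument of DeltaTable. The following is a minimal sketch of the steps above, not a definitive implementation. The helper names (adls_table_uri, azure_storage_options, open_table) are hypothetical, and it assumes a recent deltalake release that accepts abfss:// URIs and the account_name / account_key option keys, with the account key exported as AZURE_STORAGE_ACCOUNT_KEY.

```python
import os


def adls_table_uri(container: str, account: str, table_path: str) -> str:
    """Build an abfss:// URI for a Delta table on ADLS Gen2 (hypothetical helper)."""
    # Strip surrounding slashes so the path joins cleanly onto the host part.
    return f"abfss://{container}@{account}.dfs.core.windows.net/{table_path.strip('/')}"


def azure_storage_options(account: str) -> dict:
    """Collect account-key credentials for DeltaTable's storage_options.

    Assumption: the key is available in the AZURE_STORAGE_ACCOUNT_KEY
    environment variable; a KeyError is raised if it is missing.
    """
    return {
        "account_name": account,
        "account_key": os.environ["AZURE_STORAGE_ACCOUNT_KEY"],
    }


def open_table(container: str, account: str, table_path: str):
    """Open the Delta table; requires the deltalake package and network access."""
    # Imported lazily so the two helpers above work without deltalake installed.
    from deltalake import DeltaTable

    return DeltaTable(
        adls_table_uri(container, account, table_path),
        storage_options=azure_storage_options(account),
    )
```

With real values substituted for the placeholders, open_table("<container-name>", "<storage-account-name>", "<path-to-delta-lake-table>") returns a DeltaTable whose version() and to_pandas() methods can be used to check that authentication succeeded. Passing credentials explicitly keeps them scoped to one table rather than the whole process.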
