To import documents into a Cosmos DB collection from PySpark without losing any existing data:

  1. Create a Spark DataFrame with the data you want to import.

  2. Build a connection configuration for your Cosmos DB account (endpoint, key, database, and collection) for the Cosmos DB Spark Connector.

  3. Write the DataFrame to your Cosmos DB collection through the DataFrameWriter (df.write).

  4. Set the write mode to "append" so the write adds new documents without removing data already in the collection.

Here is an example code snippet:

# Requires the azure-cosmosdb-spark connector JAR on the cluster
# (e.g. the com.microsoft.azure:azure-cosmosdb-spark Maven package for Spark 2.x)
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cosmos-import").getOrCreate()

# Create a Spark DataFrame with the data you want to import
data = [("John", 25), ("Jane", 30)]
df = spark.createDataFrame(data, ["name", "age"])

# Connection settings for the Cosmos DB Spark Connector
# (replace the placeholder values with your own account details)
config = {
    "Endpoint": "https://your-account.documents.azure.com:443/",
    "Masterkey": "your-account-key",
    "Database": "your-database-name",
    "Collection": "your-collection-name",
}

# Write the DataFrame to your Cosmos DB collection;
# mode("append") adds the new documents without touching existing ones
df.write.format("com.microsoft.azure.cosmosdb.spark") \
    .mode("append") \
    .options(**config) \
    .save()

# Read the collection back to make sure your data was imported correctly
df_from_cosmos = spark.read.format("com.microsoft.azure.cosmosdb.spark") \
    .options(**config) \
    .load()
df_from_cosmos.show()
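Note that the example above uses the older Spark 2.x connector. On Spark 3 clusters, the maintained azure-cosmos-spark connector uses the "cosmos.oltp" format and spark.cosmos.* option keys instead. A minimal sketch of the equivalent write, where every account value is a placeholder you would substitute:

```python
# Assumes the Spark 3 Cosmos DB connector package is installed on the
# cluster (e.g. com.azure.cosmos.spark:azure-cosmos-spark_3-4_2-12).
# All values below are placeholders for your own account.
cosmos_cfg = {
    "spark.cosmos.accountEndpoint": "https://your-account.documents.azure.com:443/",
    "spark.cosmos.accountKey": "your-account-key",
    "spark.cosmos.database": "your-database-name",
    "spark.cosmos.container": "your-container-name",
}

def append_to_cosmos(df):
    # mode("append") again preserves existing documents; "cosmos.oltp"
    # is the format name used by the Spark 3 connector.
    (df.write.format("cosmos.oltp")
        .mode("append")
        .options(**cosmos_cfg)
        .save())
```

The write itself is otherwise the same DataFrameWriter pattern; only the format string and option key names change between the two connector generations.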