
Python and R can be used together for data manipulation in a Databricks Notebook: each cell can declare its language with a magic command, and the databricks-connect library additionally lets you run the same Spark code against your cluster from a local machine. Here are the steps to follow:

  1. First, install the databricks-connect library on your local machine using the command: pip install databricks-connect (pick the client release that matches your cluster's Databricks Runtime version).

  2. Next, set up databricks-connect by running the command: databricks-connect configure. This will prompt you for your Databricks workspace URL, personal access token, and cluster details.

  3. Once you have set up databricks-connect, verify that your local machine can reach the cluster by running the command: databricks-connect test (a quick check from Python is also sketched after this list).

  4. Now, you can use both Python and R in the same Databricks Notebook by specifying the language at the beginning of each cell using the %python or %r magic commands. For example:

    %python
    df = spark.read.csv("path/to/file")

    %r
    library(sparklyr)
    library(dplyr)
    sc <- spark_connect(method = "databricks")
    df_r <- spark_read_csv(sc, name = "df_r", path = "path/to/file")
    df_r <- df_r %>% select(col1, col2)

    Note that the R cell reads the .csv file itself with the spark_read_csv() function from the sparklyr package, because variables created in a %python cell are not directly visible in an %r cell (a Python-only CSV read with common reader options is also sketched after this list).

  5. You can also pass data between Python and R cells. All languages in a Databricks Notebook share the same Spark session, so the usual way to hand a DataFrame from one language to another is a temporary view. For example:

    %python
    df.createOrReplaceTempView("shared_df")

    %r
    library(SparkR)
    df_shared <- sql("SELECT * FROM shared_df")
    head(df_shared)

    Note that the py$ accessor from the reticulate R package only exposes Python objects when Python is run from inside the same R session (for example with reticulate::py_run_string()), not across separate %python and %r cells. Reading the shared view back from Python is sketched after this list.

  6. Finally, you can also install R packages for use in your notebook by running the command: install.packages("package_name") within an %r cell in the Databricks Notebook.
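
If you want to double-check step 3 from Python itself, the sketch below is one minimal way to do it. It assumes the classic databricks-connect client, where a SparkSession built with the ordinary PySpark API is routed to the configured cluster; newer Databricks Connect versions use a different entry point, so treat this as illustrative rather than definitive:

    # Minimal databricks-connect sanity check (classic client, run locally).
    # Assumes `databricks-connect configure` has already been completed.
    from pyspark.sql import SparkSession

    # With databricks-connect installed, getOrCreate() returns a session
    # backed by the remote Databricks cluster rather than a local Spark.
    spark = SparkSession.builder.getOrCreate()

    # A tiny job: if this prints five rows, the connection works.
    print(spark.range(5).collect())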
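
For the CSV read in step 4, you will usually want a couple of reader options. The following Python-only sketch uses standard Spark CSV options; path/to/file, col1 and col2 are just the placeholders from the example above, and it assumes it runs in a %python cell (or a databricks-connect session) where spark already exists:

    # Read a CSV with a header row and let Spark infer column types.
    df = (
        spark.read
        .option("header", True)
        .option("inferSchema", True)
        .csv("path/to/file")
    )

    # Inspect the inferred schema, then keep only the columns of interest.
    df.printSchema()
    df = df.select("col1", "col2")

Schema inference requires an extra pass over the file, so for large datasets you may prefer to declare the schema explicitly instead.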
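
To round out step 5: because every language in the notebook shares one Spark session, a temporary view registered in one cell can be read back from any other. A small Python sketch, reusing the shared_df view name from the example above:

    # Read back the temporary view registered in step 5.
    # This works because %python, %r, %sql and %scala cells share one Spark session.
    df_back = spark.table("shared_df")
    df_back.show(5)

The same view is also queryable from a %sql cell, which is often the simplest way to eyeball intermediate results.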