EDIT: I got a message from a Databricks employee that currently (DBR 15.4 LTS) the parameter marker syntax is not supported in this scenario. It might work in future versions. Original question:...
Building on @camo's answer: since you want to use the secret value outside Databricks, you can use the Databricks Python SDK to fetch the bytes representation of the secret value, then decode and print it locally (or on any compute resource outside of Databricks).
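A minimal sketch of that approach, assuming the SDK is already authenticated (e.g. via a configured profile or environment variables); the scope and key names are placeholders:

```python
import base64

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Fetch the secret; the API returns its value base64-encoded.
resp = w.secrets.get_secret(scope="my-scope", key="my-key")

# Decode the bytes to recover the original string value.
print(base64.b64decode(resp.value).decode("utf-8"))
```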
While Databricks manages the metadata for external tables, the actual data remains in the specified external location, providing flexibility and control over the data storage lifecycle. This setup allows users to leverage existing data storage infrastructure while utilizing Databricks' processing capabilities.
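For illustration, a sketch of creating and dropping an external table from PySpark; the catalog, schema, and storage path below are hypothetical, and a Unity Catalog external location granting access to that path is assumed to exist:

```python
# Create an external table: Databricks records the metadata, but the Delta
# files live at the LOCATION you specify (all names here are placeholders).
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.sales.orders_ext (
        order_id BIGINT,
        amount   DOUBLE
    )
    USING DELTA
    LOCATION 'abfss://data@mystorageacct.dfs.core.windows.net/orders'
""")

# Dropping an external table removes only the metastore entry; the data
# files stay untouched at the external location.
spark.sql("DROP TABLE main.sales.orders_ext")
```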
In Azure Databricks, if you want to create a cluster, you need to have the "Can Manage" permission. This permission lets you handle everything related to clusters, such as creating new ones and controlling existing ones.
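If you manage this programmatically, the Databricks Python SDK exposes a permissions API; a sketch under assumed placeholders (the cluster ID and user name are hypothetical):

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import iam

w = WorkspaceClient()

# Grant CAN_MANAGE on an existing cluster to a user; both the cluster ID
# and the user name below are placeholders.
w.permissions.set(
    request_object_type="clusters",
    request_object_id="0123-456789-abcdefgh",
    access_control_list=[
        iam.AccessControlRequest(
            user_name="someone@example.com",
            permission_level=iam.PermissionLevel.CAN_MANAGE,
        )
    ],
)
```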
You're correct about the listed limitations. But when you're using Unity Catalog, especially with shared clusters, you need to think a bit differently than before. UC with shared clusters provides very good user isolation, preventing access to data without the necessary access controls (DBFS has no access control at all, and ADLS provides access control only at the file level). You will need to ...
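As an illustration of the UC access-control model this refers to, table access is granted declaratively rather than at the file level; the three-level table name and the group below are hypothetical:

```python
# Grant read access on a UC table to an account-level group; run on a
# UC-enabled cluster. Names are placeholders.
spark.sql("GRANT SELECT ON TABLE main.analytics.events TO `data-readers`")

# Inspect the resulting grants on the table.
spark.sql("SHOW GRANTS ON TABLE main.analytics.events").show()
```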
Databricks allows you to make SQL queries via an API using the databricks-sql-python package. There are then two ways of creating a connection object that can be passed to pd.read_sql_query(sql, con=connection). I'm wondering which one is better in terms of performance and reliability when running SQL queries from pandas: Creating Python DB API 2. ...
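For reference, a sketch of the first variant the question mentions: a plain DB API 2.0 connection from the connector passed straight to pandas. The hostname, HTTP path, and token are placeholders:

```python
import pandas as pd
from databricks import sql

# Open a DB API 2.0 connection to a SQL warehouse; all connection details
# below are placeholders.
with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abc123",
    access_token="dapi...",
) as connection:
    # pandas can consume the DB API connection directly.
    df = pd.read_sql_query("SELECT 1 AS x", con=connection)

print(df)
```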
CREATE OR REPLACE VIEW myview AS SELECT last_day(add_months(current_date(), -1))

Can someone let me know what the equivalent of the above is in PySpark? Basically, I need to create a DataFrame from the above Databricks SQL code, but it needs to be written in PySpark.
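A sketch of the PySpark equivalent, assuming a running SparkSession named spark:

```python
from pyspark.sql import functions as F

# One-row DataFrame holding the last day of the previous month, mirroring
# last_day(add_months(current_date(), -1)) from the SQL view.
df = spark.range(1).select(
    F.last_day(F.add_months(F.current_date(), -1)).alias("prev_month_end")
)
df.show()
```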
I'm setting up a job in the Databricks Workflows UI and I want to pass a parameter value dynamically, such as the current date (run_date), each time the job runs. In Azure Data Factory, I can use express...
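A hedged sketch of the receiving side: a notebook task reading a job parameter named run_date via widgets (the parameter name comes from the question; in the Jobs UI its value can, on recent workspace versions, be set to a dynamic value reference such as {{job.start_time.iso_date}}):

```python
# Register and read the run_date job parameter inside a notebook task.
# The empty default is a placeholder for interactive runs; in a job run,
# the value configured on the job takes precedence.
dbutils.widgets.text("run_date", "")
run_date = dbutils.widgets.get("run_date")
print(f"Running for date: {run_date}")
```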
Is Databricks designed for such use cases, or is a better approach to copy this table (gold layer) into an operational database such as Azure SQL DB after the transformations are done in PySpark via Databricks? What are the cons of this approach? One would be that the Databricks cluster would have to be up and running at all times, i.e., use an interactive cluster.
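If you did go the copy-out route, a sketch of writing a gold-layer table to Azure SQL DB over JDBC; gold_df, the server, database, table, and credentials are all placeholders:

```python
# gold_df is assumed to be the already-transformed gold-layer DataFrame;
# every connection detail below is a placeholder.
(
    gold_df.write.format("jdbc")
    .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;databaseName=mydb")
    .option("dbtable", "dbo.gold_orders")
    .option("user", "sqladmin")
    .option("password", "...")
    .mode("overwrite")
    .save()
)
```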