The sparklyr package and friends have been getting some important updates in the past few months!
sparklyr is a package that allows you to interact with Spark using familiar R interfaces, such as dplyr, broom, and DBI. You can also gain access to Spark's distributed Machine Learning libraries, Structured Streaming, and ML Pipelines from R.
After months of work and $10 million, Databricks has unveiled DBRX - the world's most potent publicly available open-source large language model.
DBRX outperforms open models like Meta's Llama 2 across benchmarks, even nearing the abilities of OpenAI's closed GPT-4. Novel architectural tweaks like a "mixture of experts" boosted DBRX's training efficiency by 30-50%.
With this release, users can log in to a given Databricks workspace when they start an RStudio or #VSCode session and interact directly with the clusters in that Workspace from their preferred environment.
Pysparklyr is the new extension to sparklyr that allows you to interact with #Spark & Databricks Connect. The new version has big user-facing updates that make working with #Databricks and #RStats together even easier.
We are thrilled to announce that the latest version of sparklyr is on CRAN. sparklyr is the popular and powerful #RStats interface for #Apache #Spark, including Spark clusters hosted in #Databricks.
Thanks to the new Spark Connect protocol, you can access Spark’s powerful distributed computing features from RStudio Desktop, a Posit Workbench instance, or any running R terminal or process.
out of curiosity, has anyone seen data corruption issues with either Azure DevOps Repos (Git only) or #Databricks' Git provider? Specifically interested in cases during merge conflicts, where a file was changed and the change isn't reflected in Git history. Boosts appreciated #AzDO #azure #azuredevops #git #github
@kellogh it’s using an OpenAI model AFAIK. in my experience it’s produced more relevant output than vanilla gpt-3.5-turbo for PySpark and SQL generation. some use cases definitely work better than others. what type of tasks have you tried it for? curious what the issues were
it annoys me that it doesn’t know about previous messages in the chat so I have to repeat myself
explain error wouldn’t even rephrase info that was plainly visible in the error message, like column names. IIRC it was an AnalysisException in pyspark
To be fair it does do a great job with fixing code