The sparklyr package and friends have been getting some important updates in the past few months!
sparklyr is a package that lets you interact with Spark using familiar R interfaces such as dplyr, broom, and DBI. It also gives you access to Spark's distributed machine learning libraries, Structured Streaming, and ML Pipelines from R.
After months of work and $10 million, Databricks has unveiled DBRX: the world's most powerful publicly available open-source large language model.
DBRX outperforms open models like Meta's Llama 2 across benchmarks, even nearing the abilities of OpenAI's closed GPT-4. Architectural choices such as a "mixture of experts" design boosted DBRX's training efficiency by 30-50%.
With this release, users can log in to a given Databricks workspace when they start an RStudio or #VSCode session and interact directly with the clusters in that workspace from their preferred environment.
pysparklyr is the new extension to sparklyr that lets you interact with #Spark Connect and Databricks Connect. The new version brings major user-facing updates that make working with #Databricks and #RStats together even easier.
We are thrilled to announce that the latest version of sparklyr is on CRAN. sparklyr is the popular and powerful #RStats interface for #Apache #Spark, including Spark clusters hosted in #Databricks.
Thanks to the new Spark Connect protocol, you can access Spark’s powerful distributed computing features from RStudio Desktop, a Posit Workbench instance, or any running R terminal or process.
Out of curiosity, has anyone seen data corruption issues with either Azure DevOps Repos (Git only) or #Databricks' Git provider? I'm specifically interested in merge conflicts where a file was changed but the change isn't reflected in Git history. Boosts appreciated! #AzDO #azure #azuredevops #git #github