@nicksspirit giving a lightning talk ⚡ on "5 Things You Don't Know About #DuckDB" at #PyConUS. A great talk packed with cool features about this OLAP tool
The world of big data, databases, and R is rapidly evolving with an explosion of tools and packages. We're delighted to announce two workshops at posit::conf(2024) tailored for working with large datasets:
• Big Data in R with Arrow, led by Nic Crane and Steph Hazlitt
• Databases with R, led by @kirill
We’re thrilled to announce dplyr powered by DuckDB: duckplyr 🎉
A collaboration between the dplyr project team at Posit, cynkra, and DuckDB, duckplyr is a powerful new option that marries the user-friendly dplyr syntax with the execution capabilities of DuckDB.
For day 7 ("hazards") of the #30DayChartChallenge we use #DuckDB to read in and wrangle the U.S. mass shootings Google Sheet data curated by Mother Jones, looking at the distribution of the number of fatalities per incident from 1982 to 2023.
DuckDB released a new R package - duckplyr, which enables running dplyr functions using the DuckDB engine on the backend ❤️. The package translates dplyr code into DuckDB operations, which will enable dplyr users to work with large datasets at higher performance.
Motivated by the recent blog (https://duckdb.org/2024/04/02/duckplyr) I finally took {duckplyr} for a spin and 🤯. Staggeringly quick (though I only tried the example from the post). Definitely going to kick the tyres some more. {dplyr} on the front with #duckdb at the rear is the perfect example of the cool user experience you can create with #RStats (not forgetting to mention the years of experimenting / hard work from all involved). Big 😃 for me right now.
What I failed to mention in the Bonus Drop post last night is that I have an in-progress e-book on "Cooking With #DuckDB" up & posting new chapters regularly.
I'm messing around with #oauth2 on google. I want to do some of my own picture investigations.
I was finally able to retrieve 'mediaItems'. Now on to ingesting them into a little database... I think I'm going to go with #duckdb. Just because I need some experience with it.
Might as well use #sqlalchemy so I can be reminded how much I dislike it.
@jamesog thanks for sharing this! I’m going to have to play with it. I have some complex #JSON cases that might really benefit. I also see promise in cases where today I might translate #XLSX to #CSV and then import into #SQLite. Why do all that if I can query the original directly? Awesome!