gavin

@gavin@fosstodon.org

Data Scientist 📈 Ecologist 🦎🐍 Tennis 🎾 Endlessly curious 🇿🇦


stevensanderson, to Glue
@stevensanderson@mstdn.social

Someone just reached out looking to extract values from a cell that are produced by from the

For example, given a cell value like 251 (13%), they just want the 251, so I did something like this:

library(tidyverse)
tibble(
  value = glue::glue("{11:20} ({1:10}%)"),
  # First run of digits followed by a non-word character or end of string
  reged_val = str_extract(value, "\\d+(?=\\W|$)") |>
    as.numeric()
)

# A tibble: 10 × 2
   value    reged_val
   <glue>       <dbl>
 1 11 (1%)         11

Thoughts?

#R

gavin,

@stevensanderson

I would use the separate_wider_* functions in {tidyr}

For example:
library(tidyverse)
tibble(
  value = glue::glue("{11:20} ({1:10}%)"),
  reged_val = str_extract(value, "\\d+(?=\\W|$)") %>%
    as.numeric()
) %>%
  # Split on the space; NA in `names` drops the "(x%)" piece
  separate_wider_delim(
    value,
    names = c("value_separated", NA),
    delim = " ",
    cols_remove = FALSE
  )
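
As a further sketch of the same idea (not part of the original reply), separate_wider_regex() from {tidyr} can split the count and the percentage into their own columns in one step. The sample values and the column names count and pct are assumptions made for illustration.

```r
library(tidyverse)

# Hypothetical sample data in the "count (percent%)" shape discussed above
df <- tibble(value = c("251 (13%)", "11 (1%)"))

result <- df |>
  separate_wider_regex(
    value,
    # Named patterns become columns; unnamed patterns are matched and dropped
    patterns = c(count = "\\d+", " \\(", pct = "\\d+", "%\\)"),
    cols_remove = FALSE
  ) |>
  # The separated pieces are character, so convert them explicitly
  mutate(across(c(count, pct), as.numeric))

result
```

Unlike the delimiter approach, this fails loudly by default if a cell does not match the expected shape, which may or may not be what you want.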

rabaath, to random

On my blog: Why pandas feels clunky when coming from R

https://www.sumsar.net/blog/pandas-feels-clunky-when-coming-from-r/

gavin,

@brodriguesco @jimgar @rabaath @matthewbadger I have used pointblank for type checking dataframes in data pipelines. The API is fantastic for in-line and schema-driven testing ('agent-based'). You can use strict assertive testing or threshold-based failure when data quality cannot be guaranteed or is not mission critical.
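
A minimal sketch of the two styles mentioned above, assuming a made-up data frame df with one missing value in x. The thresholds are arbitrary, and this is not the pipeline from the post.

```r
library(pointblank)

# Hypothetical input: 10 rows, one missing value in `x`
df <- data.frame(id = 1:10, x = c(1:9, NA))

# Strict, in-line check: the data passes through unchanged if every row
# passes; a single failing row would stop the pipeline with an error.
strict_ok <- df |>
  col_vals_not_null(columns = vars(id))

# Threshold-based check: warn once 5% of rows fail, stop only at 50%.
# One failing row in ten should raise a warning but still return the data.
lenient <- action_levels(warn_at = 0.05, stop_at = 0.5)
lenient_ok <- df |>
  col_vals_not_null(columns = vars(x), actions = lenient)
```

The fractions passed to action_levels() are proportions of failing rows; whole numbers are treated as absolute counts.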

gavin,

@brodriguesco @jimgar @rabaath @matthewbadger

Hot off the press:
The {pointblank} package has once again protected my client's data pipeline from potentially problematic inputs to the database.

gavin,

@jimgar @brodriguesco @rabaath @matthewbadger Not these functions. In the Validation workflow, the idea is that the functions are running in an automated job. In this scenario, the failure of the job is the necessary outcome.

(I reran the function for the purpose of demonstration - my log message was not displayed alongside the code trace in a way that would have been clear.)

The Data Quality workflow with agents does allow you to view the details of all failing unit tests for all checks.

gavin,

@jimgar @brodriguesco @rabaath @matthewbadger
'Agents' and their 'reports' are incredible. All failing unit tests are accessible via the CSV link (when the report is viewed in HTML). In an interactive setting, the dataframes containing the failing unit tests can be extracted as a whole or individually from the agent object.
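
A rough illustration of pulling the failing rows out of an agent interactively. The data frame, label and checks are hypothetical and chosen only so that one step has failures.

```r
library(pointblank)

# Hypothetical input: two missing values in `x`
df <- data.frame(id = 1:10, x = c(1:8, NA, NA))

agent <- create_agent(tbl = df, label = "hypothetical example") |>
  col_vals_not_null(columns = vars(x)) |>     # step 1: two rows fail
  col_vals_gt(columns = vars(id), value = 0) |> # step 2: all rows pass
  interrogate()

# Every available extract of failing rows, as a list keyed by step number
all_extracts <- get_data_extracts(agent)

# Failing rows for a single step
step1_fail <- get_data_extracts(agent, i = 1)
step1_fail
```

Printing the agent itself in an interactive session renders the report that carries the CSV links.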
https://rstudio.github.io/pointblank/articles/VALID-I.html

gavin,

@danwwilson @jimgar @brodriguesco @rabaath @matthewbadger Currently in a script. I am interested in knowing more about how it works in {targets}, though. I assume it can be used to interrupt the sequence of calls if, say, an input to one of the steps has been changed incorrectly.

gavin, to random

@brodriguesco I had reason to revisit this old post of yours, and I noticed that the pivotal plot in this post seems to be missing. Is it a render issue?
https://www.brodrigues.co/blog/2022-12-21-longevity/

btp, to random
@btp@fosstodon.org

"Works on my machine™️"

gavin,

@btp Real Talk:

Sometimes it doesn't...

gvwilson, to random
@gvwilson@mastodon.social

Started outlining a lesson on system administration for data scientists, and it's going to be harder to build than the SQL tutorial (https://gvwilson.github.io/sql-tutorial/) because:

  1. It needs to be horizontal rather than vertical, i.e., needs to go wide across many tools instead of deep into one. That means each tool will only get shallow coverage, but some tools aren't useful or compelling until learners reach a certain depth. 1/

gavin,

@gvwilson Got a link?

Recently, I have been wondering if there is consensus... 🤔

brodriguesco, to random
@brodriguesco@fosstodon.org

New #RStats x #Nix blog post: Reproducible data science with Nix, part 9 -- rix is looking for testers!

https://www.brodrigues.co/blog/2024-02-02-nix_for_r_part_9/

Check out the latest feature implemented by @specphil , with_nix(), it'll blow your mind!

gavin,

@specphil @brodriguesco I have tested the with_nix() function on Windows 10 in WSL2.

Screenshots of successful execution are attached.


gavin,

@brodriguesco @specphil I have been learning nix and I had it pre-installed.

I have worked with nix-shell, nix-repl and other nix functions, but I have had limited success building derivations. I have read Nix Pills and other nix documentation.

I am going through your vignettes at the moment. I am hoping to mix my theoretical nix knowledge with the management of nix that {rix} provides.

Ultimately, I am looking to replace the renv-Docker workflow in all my projects. 😃

brodriguesco, to random
@brodriguesco@fosstodon.org

Have any of you Linux #rstats users had a Linux command segfault when launched via system() or system2(), even though it works fine by itself outside of R?

gavin,

rOpenSci, to rstats Spanish
@rOpenSci@hachyderm.io

📦 [A package a day - Taxonomy 6]

Today's Taxonomy package is taxadb

A High-Performance Local Taxonomic Database Interface
🙏 Maintained by @cboettig
📝 https://docs.ropensci.org/taxadb/

Look how @gavin uses {taxadb} to query taxonomic information in ecology projects
🗺️ https://discuss.ropensci.org/t/using-taxadb-to-query-taxonomic-information-in-ecology-projects/2046




@rstats

gavin,

@rOpenSci @cboettig @rstats I see that the URL to my (now outdated) post has not been updated since I migrated my site to my new domain.

I will have to see if I can resolve that when I am on leave. 🙂

gavin,

@steffilazerte @rOpenSci @cboettig @rstats I need to add the post.

gavin, to random

Hi

Does anyone have any demos of how to render 3-D bar plots in the polygons of a map plot?

I feel like I saw a plot like this once in , but it might have been on the other site.

gavin,

@nrennie Thank you, Nicola.
I will try those out.

njtierney, to random
@njtierney@aus.social

New post: "How to get good with R", where I ramble on some ideas on getting better with R - keen to hear what people think I've missed and discuss the topic!

https://www.njtierney.com/post/2023/11/10/how-to-get-good-with-r/

#rstats

gavin,

@njtierney This is a great post. It arrived at the perfect time for me, too.

Yesterday, after reading your post, I was inspired to use browser() for the first time. I am honestly not sure why I hadn't used it before... :blobcatgooglyshrug: It was great to use and helped me create several useful abstractions in some custom functions.

I feel I have improved significantly by:

  1. Writing functions
  2. Reading other people's source code
  3. Refactoring my functions with what I have learned.

brodriguesco, to random
@brodriguesco@fosstodon.org

I need a little help with something: If you're running inside WSL2 on Windows, what do Sys.info() and R.version$os return?
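
For anyone wanting to check their own setup, both values can be printed from any R session. This is only a sketch of how to look them up; the output depends on the machine and is not reproduced here.

```r
# Fields of Sys.info() that identify the system and kernel
info <- Sys.info()
print(info[c("sysname", "release", "version")])

# Operating system string that R was built for
print(R.version$os)
```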

gavin,

@brodriguesco Same as Luke's answer:

brodriguesco, to python
@brodriguesco@fosstodon.org

My 2 cents regarding #Python in #Excel #PythonInExcel

gavin,

@brodriguesco @eliocamp Talk about a new vendor lock-in approach 🤪

brodriguesco, to random
@brodriguesco@fosstodon.org

New #RStats X #Nix blog post: Reproducible data science with Nix, part 4 -- So long, {renv} and #Docker, and thanks for all the fish

https://www.brodrigues.co/blog/2023-08-12-nix_for_r_part4/

gavin,

@brodriguesco Finally, we learn that Bruno is learning Nix because he doesn't want to switch from Hugo to Quarto despite all the breaking changes that have beaten the rest of us. 😆

gavin,

@brodriguesco 😉 I'm glad you do. I started testing Nix in Docker on WSL today. I will let you know if I learn anything interesting.

stevensanderson, to random
@stevensanderson@mstdn.social

Imagine you have a bunch of data points and you want to know how many belong to different categories. This is where grouped counting comes in. We've got three fantastic methods for you to explore, each with its own flair: aggregate(), dplyr, and data.table.

Happy counting, fellow data explorer! 🎉🔍 #r

Post: https://www.spsanderson.com/steveondata/posts/2023-08-10/


gavin,

@stevensanderson Nice demo!

One additional option since dplyr 1.1 is to use the .by argument to summarise directly:
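
The snippet that accompanied this reply is not preserved here. A minimal sketch of the idea, assuming the built-in mtcars data and the cyl column as the grouping variable, might look like this:

```r
library(dplyr)

# Grouped count without a separate group_by() / ungroup() pair;
# requires dplyr >= 1.1.0
counts <- mtcars |>
  summarise(n = n(), .by = cyl)

counts
```

The result comes back ungrouped, and groups appear in order of first appearance in the data rather than sorted.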

b0rk, (edited ) to random
@b0rk@jvns.ca

if you just stopped being scared of the command line in the last year or three — what helped you?

(no need to reply if you don’t remember, or if you’ve been using the command line comfortably for 15 years — this question isn’t for you :) )

gavin,

@b0rk I keep myself motivated with this mental model:

  • a mouse click is just a GUI-specific way of executing some CLI function (or calling an API if web-based).
  • learning the call to the CLI removes the requirement that I must relearn my workflow if the GUI changes. (Yes, I'm looking at you, GitLab/Asana et al.)

Practically:

  • I always read the --help page on commands I use. Gives me a feel for flexibility.
  • This has taught me to expect that the CLI commands give me a lot of flexibility.

RossGayler, to random
@RossGayler@aus.social

@brodriguesco I just cast a quick (and very inexpert) eye over your https://github.com/b-rodrigues/rix repo. I see the claim that "With Nix, it is essentially possible to replace {renv} and Docker combined."

What level of dependency can't be addressed by Nix? Obviously not hardware. What about OS version?

#rstats #Nix #reproducible

gavin,

@brodriguesco @RossGayler When using Windows, doesn't Nix require WSL? Can nix-build be called from Powershell/Command? Or only from a Linux terminal?

gavin,

@brodriguesco @RossGayler If nix-build is exclusively a Linux CLI, then I am curious as to whether the systemd update to WSL (2022H2) makes the use of WSL2 simpler for nix environments in the same way it made WSL2 a much more user-friendly experience in general.

Will test this when I get a chance.
