yosh,
@yosh@toot.yosh.is

I know very little about data frames, but at a glance they remind me a lot of differential dataflow? How would you articulate the differences between the two systems?

daridrea,
@daridrea@graphics.social

@yosh data frames are a tabular data structure commonly used for organizing and manipulating structured data in programming languages like R and Python. they provide a high-level interface for performing operations on data, such as filtering and aggregating.

differential dataflow is a computational framework for incremental computation: when the input data changes, it updates the results efficiently instead of recomputing them from scratch. it can efficiently maintain the output of non-trivial algorithms, such as social-graph analysis, over changing data
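To make "incremental computation" a bit more concrete, here is a minimal sketch in plain Python. It is not the differential-dataflow API; the `IncrementalCount` class, its `update` method, and the follower data are all hypothetical, made up for illustration. The only idea borrowed is that changes arrive as `(key, diff)` pairs and the analysis touches only the keys that changed.

```python
from collections import Counter


class IncrementalCount:
    """Hypothetical helper: keeps a count per key as (key, diff) updates arrive."""

    def __init__(self):
        self.counts = Counter()

    def update(self, changes):
        """Apply a batch of (key, diff) changes.

        Returns the output changes as (key, old_count, new_count),
        covering only keys whose count actually changed.
        """
        # Consolidate the batch first so +1/-1 pairs on one key cancel out.
        deltas = Counter()
        for key, diff in changes:
            deltas[key] += diff

        out = []
        for key, delta in deltas.items():
            if delta == 0:
                continue
            old = self.counts[key]  # missing keys read as 0
            new = old + delta
            if new:
                self.counts[key] = new
            else:
                del self.counts[key]
            out.append((key, old, new))
        return out


# Made-up data: follower counts in a social graph.
followers = IncrementalCount()
followers.update([("alice", +1), ("alice", +1), ("bob", +1)])

# Later the data changes: alice loses a follower, carol gains one.
# Only those two keys are touched; bob's count is never recomputed.
changes = followers.update([("alice", -1), ("carol", +1)])
print(changes)
```

A real system would do this for joins, iteration and so on, which is where the hard part lies; the sketch only covers a single count.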

xgranade,
@xgranade@wandering.shop

@yosh As in Pandas or System.Data.Analysis.DataFrames?

yosh,
@yosh@toot.yosh.is

@xgranade I was looking at Pola.rs, which I believe is very similar to Pandas. I don’t know what System.Data.Analysis.DataFrames is?

xgranade,
@xgranade@wandering.shop

@yosh Oooh, I didn't realize there was a Rust implementation! Ah, System.Data.Analysis.DataFrames is a Pandas-like library for .NET, sorry.

yosh,
@yosh@toot.yosh.is

@xgranade oh hah, glad I got to share the good news! I fully expected you to already know about it :D

I’m looking at it, and it seems neat. But then I think back to what I’ve read about differential dataflow and I’m suddenly unsure how they differ?

From an API perspective they seem really similar too: https://github.com/TimelyDataflow/differential-dataflow

xgranade,
@xgranade@wandering.shop

@yosh I sadly haven't had many Rust projects recently, so I've been a bit out of the loop; definitely reading up on it now, though. Anyway, if I were to take a rough stab, differential dataflow appears to be useful when the data varies but the analysis is fixed, while data frames are useful when the data is fixed and the analysis varies. That is, each focuses on exploring a different stage of data processing?

xgranade,
@xgranade@wandering.shop

@yosh (With the full caveat, of course, that the above is an initial take, somewhat informed, and likely an oversimplification.)
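The "data is fixed and the analysis varies" side of that split can be sketched roughly as follows. This uses plain Python lists of dicts rather than Pandas or Polars, so none of it is either library's API; the `where` and `group_sum` helpers and the table contents are hypothetical. The point is only that the table stays put while different questions get asked of it one after another.

```python
from collections import defaultdict

# A small fixed table (made-up data): one dict per row, same keys in every row.
rows = [
    {"user": "alice", "lang": "rust", "stars": 12},
    {"user": "bob", "lang": "python", "stars": 7},
    {"user": "carol", "lang": "rust", "stars": 30},
    {"user": "dave", "lang": "python", "stars": 3},
]


def where(table, predicate):
    """Hypothetical filter helper: keep the rows for which predicate(row) is true."""
    return [row for row in table if predicate(row)]


def group_sum(table, key, value):
    """Hypothetical aggregate helper: sum the `value` column per distinct `key`."""
    totals = defaultdict(int)
    for row in table:
        totals[row[key]] += row[value]
    return dict(totals)


# Exploratory use: the same table, several different questions.
popular = where(rows, lambda r: r["stars"] >= 10)
stars_by_lang = group_sum(rows, "lang", "stars")
popular_by_lang = group_sum(popular, "lang", "stars")

print([r["user"] for r in popular])
print(stars_by_lang)
print(popular_by_lang)
```

Every query here rescans the whole table, which is fine when the data is fixed and the questions keep changing.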

yosh,
@yosh@toot.yosh.is

@xgranade oh interesting, thank you for explaining! — To clarify: by “data varies” do you mean just the data contained within, or potentially also even the schema?

By “stage of data processing”, would a good way to read this be that data frames are most useful for arriving at a useful analysis, and differential dataflow is useful when you later need that analysis to perform well?

xgranade,
@xgranade@wandering.shop

@yosh I was meaning when the schema is fixed, yeah. And yeah, at least in the Python world, Pandas is quite often used in an exploratory sense, such that allowing schemas to be dynamic (though still strongly typed) is really important.

yosh,
@yosh@toot.yosh.is

@xgranade I see! Ty!

hazelweakly,
@hazelweakly@hachyderm.io

@yosh @xgranade
it's also worth reading through this blog post

https://wesmckinney.com/blog/looking-back-15-years/

And eyeballing where various projects land on the "decomposed data landscape" (for lack of a better term). Often they're a different subset of the data landscape and overlap in 1-2 areas but not all of them (akin to what @xgranade was saying about one facet being fixed vs changing)
