I know very little about data frames, but at a glance they remind me a lot of... - Random

yosh, 2 months ago

I know very little about data frames, but at a glance they remind me a lot of differential dataflow? How would you articulate the differences between the two systems?

daridrea, 2 months ago

@yosh data frames are a tabular data structure commonly used for organizing and manipulating structured data in programming languages like R and Python. They provide a high-level interface for performing operations on data, such as filtering and aggregating.
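
To make "filtering, aggregating" concrete, here is a minimal sketch in plain Python. It is a toy, not the API of Pandas, Polars, or R: the `table` layout and the helper names `filter_rows` and `group_sum` are made up for illustration, and the data is invented.

```python
# Toy column-oriented table: a dict mapping column names to equal-length lists.
# Plain Python only; real data frame libraries are far more capable.
from collections import defaultdict

table = {
    "city":  ["Oslo", "Lima", "Oslo", "Lima", "Pune"],
    "sales": [10, 20, 30, 40, 50],
}

def filter_rows(tbl, predicate):
    """Keep only the rows for which predicate(row_dict) is true."""
    names = list(tbl)
    rows = [dict(zip(names, values)) for values in zip(*tbl.values())]
    kept = [row for row in rows if predicate(row)]
    return {name: [row[name] for row in kept] for name in names}

def group_sum(tbl, key, value):
    """Sum the `value` column within each distinct value of the `key` column."""
    totals = defaultdict(int)
    for k, v in zip(tbl[key], tbl[value]):
        totals[k] += v
    return dict(totals)

# Filtering: keep rows with at least 20 in sales.
big_sales = filter_rows(table, lambda row: row["sales"] >= 20)

# Aggregating: total sales per city among the remaining rows.
totals_by_city = group_sum(big_sales, "city", "sales")
print(totals_by_city)
```

The point is only the shape of the workflow: the whole table is in hand, and each operation returns a new table or summary that you can inspect and build on.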

Differential dataflow is a computational framework for incremental computation: when the input data changes, it updates the results efficiently instead of recomputing them from scratch. It can efficiently maintain non-trivial computations, such as social-graph analysis, over changing data.
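
A rough way to picture incremental computation is a result that is maintained under small changes rather than rebuilt. The sketch below is plain Python and is not differential dataflow's API; the class `DegreeCounts`, the `(edge, diff)` update format, and the example graph are assumptions chosen for illustration. It keeps per-node out-degrees of a small social graph up to date as edges are added (+1) or removed (-1).

```python
from collections import Counter

class DegreeCounts:
    """Maintains out-degree per node as edges are added (+1) or removed (-1)."""

    def __init__(self):
        self.degree = Counter()

    def apply(self, updates):
        """Apply a batch of ((src, dst), diff) updates.

        Returns {node: (old_degree, new_degree)} for nodes whose degree changed.
        The work done is proportional to the batch size, not the graph size.
        """
        before = {}
        for (src, _dst), diff in updates:
            before.setdefault(src, self.degree[src])
            self.degree[src] += diff
            if self.degree[src] == 0:
                del self.degree[src]
        return {
            node: (old, self.degree[node])
            for node, old in before.items()
            if old != self.degree[node]
        }

def degrees_from_scratch(edges):
    """Reference answer: recount every edge."""
    return Counter(src for src, _dst in edges)

edges = {("ann", "bob"), ("ann", "cat"), ("bob", "cat")}
live = DegreeCounts()
live.apply([(edge, +1) for edge in edges])

# The data changes: one edge disappears, another appears.
changes = live.apply([(("ann", "bob"), -1), (("cat", "ann"), +1)])
edges.remove(("ann", "bob"))
edges.add(("cat", "ann"))

print(changes)
print(live.degree == degrees_from_scratch(edges))
```

The real system goes much further than this toy (it handles general operators such as joins and iteration, not one hand-written counter), but the fixed-query, changing-input shape is the same.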

xgranade, 2 months ago

@yosh As in Pandas or System.Data.Analysis.DataFrames?

yosh, 2 months ago

@xgranade I was looking at Polars (pola.rs), which I believe is very similar to Pandas. I don’t know what System.Data.Analysis.DataFrames is?

xgranade, 2 months ago

@yosh Oooh, I didn't realize there was a Rust implementation! Ah, sorry: I meant the Pandas-like DataFrame library for .NET (I believe the package is actually named Microsoft.Data.Analysis).

yosh, 2 months ago

@xgranade oh hah, glad I got to share the good news! I fully expected you to already know about it :D

I’m looking at it, and it seems neat. But then I think back to what I’ve read about differential dataflow, and I’m suddenly unsure how they differ.

From an API perspective they seem really similar too: https://github.com/TimelyDataflow/differential-dataflow

xgranade, 2 months ago

@yosh I sadly haven't had many Rust projects recently, so I've been a bit out of the loop; definitely reading up on it now, though. Anyway, if I were to take a rough stab, differential dataflow appears to be useful when the data varies but the analysis is fixed, while data frames are useful when the data is fixed and the analysis varies. That is, each focuses on exploring a different stage of data processing?

xgranade, 2 months ago

@yosh (With the full caveat, of course, that the above is an initial take, somewhat informed, and likely an oversimplification.)

yosh, 2 months ago

@xgranade oh interesting, thank you for explaining! To clarify: by “data varies” do you mean just the data contained within, or potentially the schema as well?

By “stage of data processing”, would it be fair to say that data frames are most useful for arriving at an analysis in the first place, and differential dataflow is useful when you later need that analysis to perform well?

xgranade, 2 months ago

@yosh I meant the case where the schema is fixed, yeah. And yeah, at least in the Python world, Pandas is quite often used in an exploratory way, so allowing schemas to be dynamic (though still strongly typed) is really important.
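
As a rough illustration of "dynamic but still strongly typed", here is a plain-Python toy. `TypedTable` and its methods are hypothetical and are not Pandas' API. The schema can grow mid-session as new columns are derived, but every column keeps a single declared type that is checked when the column is added.

```python
class TypedTable:
    """Toy table whose set of columns can change, while each column stays typed."""

    def __init__(self):
        self.columns = {}  # column name -> list of values
        self.dtypes = {}   # column name -> Python type of that column

    @property
    def num_rows(self):
        return len(next(iter(self.columns.values()))) if self.columns else 0

    def add_column(self, name, dtype, values):
        """Add (or replace) a column, rejecting wrong lengths and wrong types."""
        values = list(values)
        if self.columns and len(values) != self.num_rows:
            raise ValueError(f"column {name!r} has {len(values)} rows, expected {self.num_rows}")
        bad = [v for v in values if not isinstance(v, dtype)]
        if bad:
            raise TypeError(f"column {name!r} expects {dtype.__name__}, got {bad!r}")
        self.columns[name] = values
        self.dtypes[name] = dtype

    def schema(self):
        return {name: dtype.__name__ for name, dtype in self.dtypes.items()}

t = TypedTable()
t.add_column("name", str, ["ann", "bob"])
t.add_column("age", int, [31, 45])
schema_before = t.schema()

# Exploratory step: derive a new column, which changes the schema on the fly.
t.add_column("age_in_months", int, [age * 12 for age in t.columns["age"]])
schema_after = t.schema()

# The schema is dynamic, but types are still enforced per column.
try:
    t.add_column("score", float, ["high", 1.0])
    rejected = False
except TypeError:
    rejected = True

print(schema_before, schema_after, rejected)
```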

yosh, 2 months ago

@xgranade I see! Ty!

hazelweakly, 2 months ago

@yosh @xgranade
it's also worth reading through this blog post

https://wesmckinney.com/blog/looking-back-15-years/

And eyeballing where various projects land on the "decomposed data landscape" (for lack of a better term). Often they cover different subsets of that landscape and overlap in one or two areas, but not all of them (akin to what @xgranade was saying about one facet being fixed vs changing).
