**Dr Mircea Zloteanu** @mzloteanu@mastodon.social · 20h

#statstab #394 Difference-in-Differences Estimation

Thoughts: A bit of love for the Python coders. DiD with lots of examples and estimators.

#did #python #guide #observational #TWFE #causalinference

https://py-econometrics.github.io/pyfixest/difference-in-differences.html#pointwise-vs-simultaneous-inference-in-event-studies

py-econometrics.github.io · Difference-in-Differences Estimation

**Joseph A di Paolantonio** @jadp@mastodon.social · 1d

It’s great to see causal inference methods being used for this determination. Are better algorithms (than the near-far matching used) available that might be used in a judicial process causal digital twins to ameliorate these and other injustices in the future? Of course, getting rid of the bail system would make it moot.

#causalInference #causation #legal #justice

From: @hrdag
https://mastodon.social/@hrdag/114902611019230490

Mastodon · HRDAG (@hrdag@mastodon.social) · How does wealth influence our court systems? @hrdag's findings reveal that those unable to post bail experience a 34% increased likelihood of being found guilty compared to those who secure pretrial freedom. https://hrdag.org/2025/02/17/bail/

**Dr Mircea Zloteanu** @mzloteanu@mastodon.social · 2d

#statstab #392 Statistically Efficient Ways to Quantify Added Predictive Value of New Measurements (forum thread)

Thoughts: Forums can be great for asking the author for exact answers to complex questions

#modelselection #causalinference #prediction #bias #information

https://discourse.datamethods.org/t/statistically-efficient-ways-to-quantify-added-predictive-value-of-new-measurements/2013/1

Datamethods Discussion Forum · Aug 22, 2019 · Statistically Efficient Ways to Quantify Added Predictive Value of New Measurements · This topic is for discussions about Statistically Efficient Ways to Quantify Added Predictive Value of New Measurements

Replied in thread

**Joe Roe** @joeroe@archaeo.social · 3d *

@PCI_Archaeology Very cool! If I'm not mistaken, this is the first published application of #CausalInference / causal DAGs to archaeology?

**Aneesh Sathe** @aneeshsathe.com@aneeshsathe.com · 3d

My Road to Bayesian Stats

By 2015, I had heard of Bayesian Stats but didn’t bother to go deeper into it. After all, significance stars and p-values worked fine. I started to explore Bayesian Statistics when considering small sample sizes in biological experiments. How much can you say when you are comparing means of 6 or even 60 observations? This is the nature of work at the edge of knowledge. Not knowing what to expect is normal. Multiple possible routes to an observed result are normal. Not knowing how to pick the route to the observed result is also normal. Yet, our statistics fail to capture this reality and the associated uncertainties. There must be a way, I thought.

Free Curve to the Point: Accompanying Sound of Geometric Curves (1925) print in high resolution by Wassily Kandinsky. Original from The MET Museum. Digitally enhanced by rawpixel.

I started by searching for ways to overcome small sample sizes. There are minimum sample sizes recommended for t-tests. Thirty is an often quoted number with qualifiers. Bayesian stats does not have a minimum sample size. This had me intrigued. Surely, this can’t be a thing. But it is. Bayesian stats creates a mathematical model using your observations and then samples from that model to make comparisons. If you have any exposure to AI, you can think of this a bit like training an AI model. Of course the more data you have the better the model can be. But even with a little data we can make progress.
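To make "build a model, then sample from it" concrete, here is a minimal sketch of comparing two groups of six observations. It assumes a normal likelihood with the standard noninformative prior, which is itself a modelling choice and only one of many possible ones. The measurements and the helper name `posterior_mean_samples` are made up for illustration.

```python
import numpy as np


def posterior_mean_samples(y, n_draws, rng):
    """Draw posterior samples of a group mean.

    Assumes a normal likelihood and the noninformative prior
    p(mu, sigma^2) proportional to 1 / sigma^2 (an illustrative choice).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    ybar = y.mean()
    s2 = y.var(ddof=1)
    # sigma^2 | y follows a scaled inverse chi-square(n - 1, s2) distribution
    sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=n_draws)
    # mu | sigma^2, y follows Normal(ybar, sigma^2 / n)
    return rng.normal(ybar, np.sqrt(sigma2 / n))


rng = np.random.default_rng(0)
control = [4.1, 5.0, 4.6, 5.3, 4.8, 4.4]  # made-up measurements
treated = [5.2, 6.1, 5.5, 6.4, 5.0, 5.9]  # made-up measurements

# Posterior of the difference in means, obtained by sampling
diff = (posterior_mean_samples(treated, 20_000, rng)
        - posterior_mean_samples(control, 20_000, rng))

prob_positive = float((diff > 0).mean())        # P(treated mean > control mean | data)
low, high = np.percentile(diff, [2.5, 97.5])    # 95% credible interval

print(f"P(difference > 0) = {prob_positive:.3f}")
print(f"95% credible interval: [{low:.2f}, {high:.2f}]")
```

With only six points per group the interval is wide, but the statement "we are x% sure the treated mean is higher" can be read straight off the samples.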

How do you say, “there is something happening and it’s interesting, but we are only x% sure”? Frequentist stats have no way through. All I knew was to apply the t-test and if there are “***” in the plot, I’m golden. That isn’t accurate though. Low p-values indicate the strength of evidence against the null hypothesis. Let’s take a minute to unpack that. The null hypothesis is that nothing is happening. If you have a control set and do a treatment on the other set, the null hypothesis says that there is no difference. So, a low p-value says that data like ours would be unlikely if the null hypothesis were true. But that does not imply that the alternative hypothesis is true. What’s worse is that there is no way for us to say that the control and experiment have no difference. We can’t accept the null hypothesis using p-values either.

Guess what? Bayes stats can do all those things. It can measure differences, accept and reject both null and alternative hypotheses, even communicate how uncertain we are (more on this later). All without making assumptions about our data.

It’s often overlooked, but frequentist analysis also requires the data to have certain properties like normality and equal variance. Biological processes have complex behavior and, unless observed, assuming normality and equal variance is perilous. The danger only goes up with small sample sizes. Again, Bayes requires you to make no assumptions about your data. Whatever shape the distribution is, so-called outliers and all, it all goes into the model. Small sample sets do produce weaker fits, but this is kept transparent.

Transparency is one of the key strengths of Bayesian stats. It requires you to work a little bit harder on two fronts though. First you have to think about your data generating process (DGP). This means asking how the data points you observe came to be. As we said, the process is often unknown. We have at best some guesses of how this could happen. Thankfully, we have a nice way to represent this. DAGs, directed acyclic graphs, are a fancy name for a simple diagram showing what affects what. Most of the time we are trying to discover the DAG, i.e. the pathway of a biological outcome. Even if you don’t do Bayesian stats, using DAGs to lay out your thoughts is great. In Bayesian stats the DAGs can be used to test if your model fits the data we observe. If the DAG captures the data generating process the fit is good, and if it doesn’t, the fit is poor.

The other hard bit is doing analysis and communicating the results. Bayesian stats forces you to be verbose about the assumptions in your model. This part is almost magicked away in t-tests. Frequentist stats also makes assumptions about the model that your data is assumed to follow. It all happens so quickly that there isn’t even a second to think about it. You put in your data, click t-test and whoosh! You see stars. In Bayesian stats, stating the assumptions you make in your model (using DAGs and hypotheses about DGPs) communicates to the world what you think is happening and why.

Discovering causality is the whole reason for doing science. Knowing the causality allows us to intervene in the form of treatments and drugs. But if my tools don’t allow me to be transparent and, worse, if they block people from correcting me, why bother?

Richard McElreath says it best:

There is no method for making causal models other than science. There is no method to science other than honest anarchy.

#AI #BayesianStatistics #BiologicalDataAnalysis

**Dr Mircea Zloteanu** @mzloteanu@mastodon.social · 3d

#statstab #391 {sensemakr} Sensitivity Analysis Tools for OLS

Thoughts: No unobserved variables is an untestable assumption, but you can quantify the robustness of your ATE.

#R #causalinference #observational #inference #confounding #bias #sensitivity

https://carloscinelli.com/sensemakr/

carloscinelli.com · Sensitivity Analysis Tools for Regression Models · Implements a suite of sensitivity analysis tools that extends the traditional omitted variable bias framework and makes it easier to understand the impact of omitted variables in regression models, as discussed in Cinelli, C. and Hazlett, C. (2020), "Making Sense of Sensitivity: Extending Omitted Variable Bias." Journal of the Royal Statistical Society, Series B (Statistical Methodology) <doi:10.1111/rssb.12348>.

**Dr Mircea Zloteanu** @mzloteanu@mastodon.social · Jul 15

#statstab #387 Give Your Hypotheses Space!

Thoughts: "It’s tempting to throw a bunch of variables...into a model
...but proceed at your own caution!"

#Mbias #causalinference #collider #moderator #confounder #regression #r #DAG

https://brian-lookabaugh.github.io/website-brianlookabaugh/blog/2025/mutual-adjustment/

**Aneesh Sathe** @aneeshsathe.com@aneeshsathe.com · Jul 11

Beyond the Dataset

On the recent season of the show Clarkson’s Farm, J.C. goes to great lengths to buy the right pub. As with any sensible buyer, the team does a thorough tear down followed by a big build up before the place is open for business. They survey how the place is built, located, and accessed. In their refresh they ensure that each part of the pub is built with purpose. Even the tractor on the ceiling. The art is in answering the question: How was this place put together?

A data scientist should be equally fussy. Until we trace how every number was collected, corrected and cleaned (who measured it, what tool warped it, what assumptions skewed it), we can’t trust the next step in our business to flourish.

Old sound (1925) painting in high resolution by Paul Klee. Original from the Kunstmuseum Basel Museum. Digitally enhanced by rawpixel.

Two load-bearing pillars

While there are many flavors of data science, I’m concerned with the analysis that is done in scientific spheres and startups. In this world, the structure is held up by two pillars:

How we measure — the trip from reality to raw numbers. Feature extraction.
How we compare — the rules that let those numbers answer a question. Statistics and causality.

Both of these relate to having a deep understanding of the data generation process, each from a different angle. A crack in either pillar and whatever sits on top crumbles. Plots, significance, AI predictions: all mean nothing.

How we measure

A misaligned microscope is the digital equivalent of crooked lumber. No amount of massage can birth a photon that never hit the sensor. In fluorescence imaging, the point-spread function tells you how a pin-point of light smears across neighboring pixels; noise reminds you that light itself arrives, and is recorded, with at least some randomness. Misjudge either and the cell you call “twice as bright” may be a mirage.

In this data generation process the instrument nuances control what you see. Understanding this enables us to make judgements about what kind of post processing is right and which one may destroy or invent data. For simpler analysis the post processing can stop at cleaner raw data. For developing AI models, this process extends to labeling and analyzing data distributions. Andrew Ng’s approach, in data-centric AI, insists that tightening labels, fixing sensor drift, and writing clear provenance notes often beat fancier models.

How we compare

Now suppose Clarkson were to test a new fertilizer, fresh goat pellets, only on sunny plots. Any bumper harvest that follows says more about sunshine than about the pellets. Sound comparisons begin long before data arrive. A deep understanding of the science behind the experiment is critical before conducting any statistics. Wrong randomization, poor controls, and lurking confounders eat away at the foundation of statistics.
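The goat pellet example can be sketched as a small simulation. Everything here is assumed for illustration: the effect sizes, the noise level, and the helper name `harvest`. It compares the flawed design (pellets only on sunny plots) with a randomized one.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
true_effect = 2.0   # assumed yield gain from pellets
sun_effect = 5.0    # assumed yield gain from sunshine

sunny = rng.random(n) < 0.5  # half the plots are sunny


def harvest(pellets):
    """Simulated yield: baseline + pellet effect + sunshine effect + noise."""
    return 10.0 + true_effect * pellets + sun_effect * sunny + rng.normal(0, 1, n)


# Flawed design: pellets are applied only on sunny plots
pellets_biased = sunny.copy()
y_biased = harvest(pellets_biased)
naive = y_biased[pellets_biased].mean() - y_biased[~pellets_biased].mean()

# Sound design: pellets are assigned by coin flip, independent of sunshine
pellets_random = rng.random(n) < 0.5
y_random = harvest(pellets_random)
randomized = y_random[pellets_random].mean() - y_random[~pellets_random].mean()

print(f"true effect:          {true_effect:.2f}")
print(f"naive estimate:       {naive:.2f}")       # absorbs the sunshine effect
print(f"randomized estimate:  {randomized:.2f}")  # close to the true effect
```

In the flawed design the naive difference lumps sunshine and pellets together, and nothing in the resulting data table reveals that. Only knowing how the plots were chosen does.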

This information is not in the data. Only understanding how the experiment was designed and which events preclude others enables us to build a model of the world of the experiment. Taking this lightly has large risks for startups with limited budgets and smaller experiments. A false positive result leads to wasted resources while a false negative presents opportunity costs.

The stakes climb quickly. Early in the COVID-19 pandemic, some regions bragged of lower death rates. Age, testing access, and hospital load varied wildly, yet headlines crowned local policies as miracle cures. When later studies re-leveled the footing, the miracles vanished.

Why the pillars get skipped

Speed, habit, and misplaced trust. Leo Breiman warned in 2001 that many analysts chase algorithmic accuracy and skip the question of how the data were generated, a divide he called the “two cultures.” Today’s tooling tempts us even more: auto-charts, one-click models, pretrained everything. They save time, until they cost us the answer.

The other issue is the lack of a culture that communicates and shares a common language. Only in academic training is it possible to train a single person to understand the science, the instrumentation, and the statistics sufficiently that their research may be taken seriously. Even then we prefer peer review. There is no such scope in startups. Tasks and expertise must be split. It falls to the data scientist to ensure clarity and to collect information horizontally. It is the job of the leadership to enable this or accept dumb risks.

Opening day

Clarkson’s pub opening was a monumental task with a thousand details tracked and tackled by an army of experts. Follow the journey from phenomenon to file, guard the twin pillars of measure and compare, and reinforce them with careful curation and open culture. Do that, and your analysis leaves room for the most important thing: inquiry.

#AI #causalInference #cleanData

**Carl Gold, PhD** @carl24k@sigmoid.social · Jul 10

My PR to the #EconML #PyWhy #opensource #causalai project was merged! I made a small contribution by allowing a flexible choice of evaluation metric for scoring both the first stage and final stage models in Double Machine Learning (#DML). Before, only the mean square error (MSE) was implemented. But as an ML practitioner "in the trenches" I have found that MSE is hard to interpret and compare across models. My new functions allow that #CausalInference #machinelearning #datascience

**Dr Mircea Zloteanu** @mzloteanu@mastodon.social · Jul 9

#statstab #383 Berkson's paradox

Thoughts: aka Berkson's bias, collider bias, or Berkson's fallacy. Important for interpreting conditional probabilities. Can produce counterintuitive patterns.

#paradox #collider #bias #inference #causalinference

https://en.m.wikipedia.org/wiki/Berkson's_paradox
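
A quick way to see the counterintuitive pattern is a simulation. This is a minimal sketch with made-up variables: two independent traits `x` and `y`, and a selection rule (the collider) that admits a unit only when their sum exceeds an arbitrary threshold.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Two traits that are independent in the full population
x = rng.normal(size=n)
y = rng.normal(size=n)

# Selection depends on both traits (a collider); the threshold is arbitrary
selected = (x + y) > 1.0

corr_all = np.corrcoef(x, y)[0, 1]
corr_selected = np.corrcoef(x[selected], y[selected])[0, 1]

print(f"correlation in full population: {corr_all:+.3f}")
print(f"correlation among selected:     {corr_selected:+.3f}")
```

Conditioning on the selected subgroup induces a clearly negative correlation between traits that are unrelated in the population as a whole.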

**Tom Stafford** @tomstafford@mastodon.online · Jun 18

So far at this conference I have seen reports of true experiments, natural experiments, difference in difference analysis and regression discontinuity design - but no instrumental variable analysis

I wonder why?

I was hoping for the full set of causal inference methods

#ICSSI2025 #CausalInference

**Dr Mircea Zloteanu** @mzloteanu@mastodon.social · Jun 17

#statstab #367 Matching in R: Propensity Scores, Weighting (IPTW) and the Double Robust Estimator

Thoughts: A guide on common adjustments for observational studies.

#r #observational #iptw #matching #weights #doublerobust #guide #causalinference

https://www.franciscoyira.com/post/matching-in-r-3-propensity-score-iptw/

francisco yirá's blog · May 1, 2022 · Matching in R (III): Propensity Scores, Weighting (IPTW) and the Double Robust Estimator · In the last part of this series about Matching estimators in R, we'll look at Propensity Scores as a way to solve covariate imbalance while handling the curse of dimensionality, and how to implement a Propensity Score estimator using the `twang` package in R. We'll also explore the importance of common support, the inverse probability weighting estimator (IPTW) and the double robust estimator, which combines a regression specification with a matching-based model in order to obtain a good estimate even when there is something wrong with one of the two underlying models.

**MinmiTheDino** @minmi@sfba.social · Jun 15

What are people’s fave methods for this situation:

At t0, all units are untreated.

As time goes on, individual units are one by one selected for treatment, on an expert’s assessment of their potential improvement under treatment.

How to measure the treatment effect, either over all units or ideally the treatment effect on each unit?

Oh, for extra fun, they’re probably not independent

#Statistics #CausalInference #Econometrics

**Christian Röver** @croever@mastodon.social · Jun 11

Registration is open for the GMDS ACADEMY 2025 (Hannover, October 20-23).
There will be three parallel workshops on meta analysis, causal inference and time-to-event analysis involving Wolfgang Viechtbauer (@wviechtb), Christian Röver, Sebastian Weber, Vanessa Didelez, Arthur Allignol, Oliver Kuß, Alexandra Strobel, Hannes Buchner, Xiaofei Liu and Ann-Kathrin Ozga.
See here for more details:
https://www.gmds.de/fileadmin/user_upload/GMDS-Academy-2025.pdf

#MetaAnalysis #CausalInference #SurvivalAnalysis

**मेंथी** @trigonella@social.seattle.wa.us · Jun 4

Causal inference feels like pretty much the most important topic one can think of in statistics or even for humanity in general. So why is the entire field dominated by just one or two people (obviously I'm referring to Judea Pearl and/or Donald Rubin)? It feels rather... cultish.

Can any folks in the field opine why it is so dominated by one or two individuals, compared to any other important area of research today?

#CausalInference #Statistics

**Dr Mircea Zloteanu** @mzloteanu@mastodon.social · May 21

#statstab #348 The Effect {book} - Causal Diagrams

Thoughts: At some point you'll need to learn about DAGs. Maybe this is the chapter you need.

#DAGs #causalinference #guide #book #education #ebook

https://theeffectbook.net/ch-CausalDiagrams.html

**jobRxiv** @jobRxiv@mas.to · Apr 26

Postdoc in Single-Cell Multi-Omic Gene Regulatory Networks

University of Massachusetts Chan Medical School

Join us to decode #GeneRegulatoryNetwork from #SingleCell multiomics with #CausalInference as a #postdoc! Quantitative bg needed.

See the full job description on jobRxiv: https://jobrxiv.org/job/university-of-massachusetts-chan-medica...
https://jobrxiv.org/job/university-of-massachusetts-chan-medical-school-27778-postdoc-in-single-cell-multi-omic-gene-regulatory-networks/?feed_id=94702

jobRxiv is the job board for scientists, by scientists

jobRxiv · Jun 30, 2019 · Science Jobs - Find science and research jobs · The international job board for scientists, by scientists. Find science jobs in academia or industry: MSc, PhD, Postdoc, Scientist, Faculty and more!

**MinmiTheDino** @minmi@sfba.social · Apr 25

Hello SFBA! I’ve been wistfully thinking of switching over here for a while and recent fosstodon choices gave me the push I needed. So #introduction time!

I’m from #SanFrancisco and moved back here after some wandering. Raising two kids and a dog. Working in tech (sigh) but on #sustainability at least.

Interested in and post about #CausalInference, #Statistics, #Politics, #Policy, #Climate, #Energy, #Dogs, #Crafting and #Parenting

**jobRxiv** @jobRxiv@mas.to · Apr 9

Postdoc in Single-Cell Multi-Omic Gene Regulatory Networks

University of Massachusetts Chan Medical School

Join us to decode #GeneRegulatoryNetwork from #SingleCell multiomics with #CausalInference as a #postdoc! Quantitative bg needed.

**Martin Modrák** @modrak_m@bayes.club · Apr 8 *

This looks great: Andrew Gelman (@statmodeling_bot ) would be joining Nancy Cartwright and Berna Devezer. Short idea talks, lots of panel discussion and Q&A.

Join us on April 25th to discuss RCTs, replications, and scientific inference.
https://sites.google.com/view/cepbi/talks-gatherings?authuser=0

sites.google.com · TALKS & GATHERINGS 2025

#stats #causalInference #RCTs
