Statistical Methods

Author

Marvin Kweyu

Published

October 8, 2025

Preface

“There is a kind of knowledge that only comes from doing something the hard way, from first principles, with your own hands.”

~ Michael Mendy (Linux from Scratch)

Statistical methods are rarely taught the way they are used. You encounter them in courses as isolated techniques - a t-test here, a regression there - each presented as though the hard part is remembering the formula. Then you sit in front of real data, with a real question, and realise the hard part was never the formula. It was knowing which one to reach for, why, and what you are actually assuming when you do.

I started this guide because I needed one that did not exist. As a CS graduate daring to delve into the world of research, I had enough mathematical exposure to follow derivations but not enough statistical intuition to trust my own analysis. I could run the code, sure, but I could not always defend the output. Years of building production systems across agriculture, distributed infrastructure, and applied research had taught me a great deal about shipping software - and very little about the formal reasoning underneath the models I was increasingly relying on. That gap, between execution and understanding, is what this guide is an attempt to close.

This is not a passive document. I will reference books I am reading alongside this work. I will question assumptions I once treated as settled. I will break things deliberately to see where the edges are, and I will revise conclusions that no longer hold when examined more carefully. The guide will change as my understanding does - and as the field of statistical computing continues to move underneath all of us.

I cover:

  • Probability foundations and distributional thinking

  • Linear and generalised linear models

  • Bayesian inference and prior specification

  • Model diagnostics, selection, and validation

  • Statistical computing in R

  • Spatial and ecological applications

Most examples are drawn from work I have done or am currently doing: ecological niche modelling, geospatial data analysis, and applied research in climate and agricultural systems across Africa. The statistics here are not decorative.

Do not treat this guide as a reference manual. You will not find exhaustive API documentation or a formula sheet. What you will find is the reasoning behind the methods - why the assumptions matter, what breaks when they are violated, and what the output is actually telling you versus what it is tempting to believe it is telling you.

I dedicate this to the practitioners who want to understand what they are already doing, or want to do it with more honesty. Curiosity is assumed. Everything else, we build from here.