2 Introduction

There is a difference between doing statistics and understanding what you are doing. Most working data scientists live in that gap - running models, interpreting outputs, shipping results - without a clear account of the machinery underneath. This guide is an attempt to close that gap, not by simplifying the mathematics, but by grounding it in the context that makes it legible.

These pages cover statistical methods and statistical computing as they are actually used: imperfectly, iteratively, and usually under some form of deadline pressure.

2.1 How to use this guide

The guide assumes basic fluency in R, which remains the dominant language for statistical computing in research and academic contexts. If that assumption doesn’t hold, R for Data Science is the right starting point - come back when variables and data frames feel natural.

Otherwise, read sequentially. The structure is deliberate: later sections assume earlier ones.

The two exceptions are:

1. You already know the preceding material and are targeting something specific.

2. A section explicitly signals that it can stand alone, as is the case with Simple Linear Regression.

The goal is not to replace a textbook. It’s to be the thing you read alongside one: the part that explains why the textbook says what it says. Let’s get started.