Statistical Inference and Learning Methods for Research
Preface
This book has evolved from the lecture notes for STAT 845: Statistical Methods for Research, a graduate-level course taught at the University of Saskatchewan. It was born out of the recognition that, for many graduate students, statistics is not merely a theoretical subject but a vital tool for unlocking the meaning behind their research data.
Audience and Prerequisites
This text is designed specifically for graduate students who possess an introductory background in statistics but find themselves facing complex research problems that require more robust analytical tools. Whether you are designing an experiment for an agricultural field trial, analyzing patient data in epidemiology, or optimizing processes in engineering, this book aims to bridge the gap between basic statistical literacy and the practical application of advanced methods.
Key Features
To support the transition from theory to practice, this book incorporates several distinct pedagogical features:
- Data Science Technologies: We leverage the modern R ecosystem, moving beyond basic command-line usage. This includes using Quarto for dynamic reporting, ggplot2 for publication-quality visualization, and modern libraries for efficient data tabulation. These tools help ensure that students are equipped with current industry-standard practices in data science.
- Statistical Inference and Learning: A core theme of the book is the use of statistical learning concepts to distinguish between statistical significance (p-values) and practical significance (effect size and predictive power). We aim to train researchers to look beyond simple "pass/fail" metrics and to evaluate the meaningful impact of their findings.
- Simulation and Animation for Visualizing Probability: Abstract concepts such as sampling distributions, p-values, and confidence intervals can be difficult to grasp without mathematical training. They can, however, be visualized quickly with simulation and animation. We use animations to illustrate the probabilistic interpretation of these concepts, allowing readers to "see" the randomness and convergence that underlie statistical inference.
- Real-World Datasets: The messy reality of research cannot be captured by perfect, artificial data. We employ real datasets throughout the text to demonstrate the specific challenges and pitfalls encountered when applying statistical methods in practice, from handling outliers to addressing violations of assumptions.
Structure of the Book
The material is organized to guide the reader from foundational tools to complex experimental designs:
- Chapters 1 and 2 lay the groundwork, introducing the philosophy of statistical research and providing a crash course in R for data analysis.
- Chapters 3 and 4 cover the cornerstone of modeling: Linear Regression. We move from Simple Linear Regression to Multiple Linear Regression, allowing students to handle relationships among multiple variables.
- Chapter 5 delves into the diagnostics of these models, specifically understanding leverage and adjusted residuals in Ordinary Least Squares (OLS) to assess model validity.
- Chapter 6 introduces Logistic Regression, equipping researchers to handle categorical and binary outcomes.
- Chapters 7 and 8 shift focus to Experimental Design. We explore Randomized Complete Block Design and Two-Factor Factorial Design, essential methodologies for controlling variability and understanding interaction effects in controlled experiments.