Bio 5075 Fundamentals of Biostatistics for Graduate Students

The ability to quantitatively evaluate one’s data is increasingly important in scientific
research. Yet many entering PhD students lack a fundamental understanding of the
statistical principles and basic programming skills that can accelerate and empower
data analysis. This one-credit course is a primer on fundamental statistical and
computational skills and concepts for first-year DBBS students; it assumes no prior
experience in statistics or programming. The course will cover common statistical
practices and concepts in the life sciences, such as error bars, summary statistics,
probability and distributions, and hypothesis testing. In parallel, the class will teach
students the programming skills needed for basic statistical computation.

The course format emphasizes practical problem-solving skills by teaching both core
statistical concepts and computational methods to implement them. The course will
introduce students to the Python programming language and key Python statistical and
plotting tools. Upon completing the course, students will be able to retrieve and analyze
simple and genomic-style datasets from online databases, write simple data analysis
scripts in Python, create the major types of statistical plots, and critically evaluate how
best to summarize their data and assess its significance.

:: Help sessions are held Mondays, from 5:30 to 6:30pm in McKinley 6001B ::

For any questions, email the admins (instructors, TAs, and tutors) at:

Lecture 0 (Computation): Introduction to IPython Notebook
–       Installing IPython Notebook
–       Basic Python Usage

Lecture 1 (Statistics): Summarizing Numbers
–       Single number summaries: mean, median and mode
–       Two numbers: variance and standard deviation
–       Dot plots and histograms
–       Distributions
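
As a rough illustration of the summaries listed for Lecture 1, the sketch below computes them with Python's standard-library statistics module. The measurements are made up for this example and are not course data; the lecture itself may use different tools.

```python
import statistics

# Hypothetical measurements, invented for illustration only
values = [2, 3, 3, 5, 7, 10]

# Single-number summaries
mean = statistics.mean(values)      # arithmetic average
median = statistics.median(values)  # middle value (average of the two middle values here)
mode = statistics.mode(values)      # most frequent value

# Measures of spread (sample versions, dividing by n - 1)
variance = statistics.variance(values)
stdev = statistics.stdev(values)

print(mean, median, mode, variance, stdev)
```

Note that statistics.variance and statistics.stdev give the sample versions; statistics.pvariance and statistics.pstdev give the population versions.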

Lecture 2 (Computation): Basic Python with Genomic Data
–       Obtaining genomic data from online databases
–       Methods to import data into IPython Notebook
–       Basic Python Syntax
–       Python data types and structures

2016-10-14: PS1 assignment due (Thursday)

Lecture 3 (Statistics): Basic Probability
–       Intuitive probability estimation from histograms
–       Basic theory and notation
–       How probabilities combine: “and” and “or”
–       Independence and conditional probability
–       Counting successes and failures
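
One possible way to make the "and", "or", and conditional probability rules above concrete is to enumerate the outcomes of two fair dice. This is only a sketch: the dice example and the helper name prob are assumptions chosen for illustration, not material taken from the lecture.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a function that returns True or False for an outcome."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

def first_is_six(o):
    return o[0] == 6

def sum_is_seven(o):
    return o[0] + o[1] == 7

p_a = prob(first_is_six)
p_b = prob(sum_is_seven)
p_a_and_b = prob(lambda o: first_is_six(o) and sum_is_seven(o))
p_a_or_b = prob(lambda o: first_is_six(o) or sum_is_seven(o))

# Conditional probability: P(B | A) = P(A and B) / P(A)
p_b_given_a = p_a_and_b / p_a

# Two events are independent when P(A and B) equals P(A) * P(B)
independent = p_a_and_b == p_a * p_b

print(p_a, p_b, p_a_and_b, p_a_or_b, p_b_given_a, independent)
```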

2016-10-20: PS2 assignment due (Thursday)

Lecture 4 (Computation): Python for Data Analysis
–       Python Syntax
–       Python data types and structures
–       Using Python to examine datasets

2016-10-27: PS3 assignment due (Thursday)

Lecture 5 (Computation): Using Python Libraries: Plotting
–       Python libraries and functions
–       Plot types and Python plotting functions
–       Writing Python functions

2016-11-03: PS4 assignment due (Thursday)

Lecture 6 (Statistics): Simulation and Hypothesis Testing (I)
–       Why simulate?
–       Hypothesis testing and the null distribution
–       What p-values are and are not
–       Recent controversies in the use of p-values

2016-11-10: PS5 assignment due (Thursday)

Lecture 7 (Computation): Simulation
–       Advanced Python functions
–       Random sampling and simulation

2016-11-19: PS6 assignment due (Saturday, 11:59 PM)

Lecture 8 (Statistics): Simulation and Hypothesis Testing (II)
–       Permutation testing
–       Sampling from a population
–       Bootstrap confidence intervals
–       Bootstrap hypothesis testing
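
The resampling ideas listed for Lecture 8 can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the course's implementation: the function names, the default number of resamples, the choice of a percentile interval, and the two small data sets are all hypothetical.

```python
import random
from statistics import fmean

def permutation_test(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(fmean(pooled[:len(a)]) - fmean(pooled[len(a):]))
        if diff >= observed:
            count += 1
    # Add 1 to numerator and denominator so the estimated p-value is never exactly 0
    return (count + 1) / (n_perm + 1)

def bootstrap_ci(data, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    means = sorted(
        fmean(rng.choices(data, k=len(data)))  # resample with replacement
        for _ in range(n_boot)
    )
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical measurements, invented for illustration only
control = [4.1, 3.9, 4.3, 4.0, 4.2, 3.8]
treated = [5.0, 5.2, 4.9, 5.3, 5.1, 4.8]

p_value = permutation_test(control, treated)
ci_low, ci_high = bootstrap_ci(treated)
print(p_value, ci_low, ci_high)
```

Fixing the seed makes the result reproducible; in practice the number of resamples is a trade-off between run time and the precision of the estimate.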

Lecture 9 (Computation): Statistics in Python
–       Bootstrap testing
–       Python statistics libraries

2016-12-11: PS8 assignment due (Sunday, 11:59 PM)

Lecture 10 (Statistics): Power Analysis, Experimental Design, and Parametric Statistics
–       Statistical Power
–       Paired tests
–       The standard error and the t-test
–       ANOVA
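
To illustrate how the standard error feeds into a paired t-test, the sketch below computes the t statistic by hand with the standard library. The before/after values are invented for this example, and the variable names are assumptions; converting the statistic to a p-value requires the t distribution with n - 1 degrees of freedom, which a statistics library would normally supply.

```python
import math
import statistics

# Hypothetical paired measurements on the same five subjects
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 15]

# A paired test works on the within-subject differences
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)

mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)      # sample standard deviation

# Standard error of the mean difference
std_error = sd_diff / math.sqrt(n)

# t statistic: how many standard errors the mean difference is from zero
t_stat = mean_diff / std_error
degrees_of_freedom = n - 1

print(mean_diff, std_error, t_stat, degrees_of_freedom)
```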