CS 581 Algorithmic Computational Genomics. Tandy Warnow University of Illinois at Urbana-Champaign

Similar documents
CS 581 Algorithmic Computational Genomics. Tandy Warnow University of Illinois at Urbana-Champaign

Construc)ng the Tree of Life: Divide-and-Conquer! Tandy Warnow University of Illinois at Urbana-Champaign

SEPP and TIPP for metagenomic analysis. Tandy Warnow Department of Computer Science University of Texas

SEPP and TIPP for metagenomic analysis. Tandy Warnow Department of Computer Science University of Texas

CS 394C Algorithms for Computational Biology. Tandy Warnow Spring 2012

Phylogenomics, Multiple Sequence Alignment, and Metagenomics. Tandy Warnow University of Illinois at Urbana-Champaign

Mul$ple Sequence Alignment Methods. Tandy Warnow Departments of Bioengineering and Computer Science h?p://tandy.cs.illinois.edu

From Genes to Genomes and Beyond: a Computational Approach to Evolutionary Analysis. Kevin J. Liu, Ph.D. Rice University Dept. of Computer Science

Using Ensembles of Hidden Markov Models for Grand Challenges in Bioinformatics

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

CS 581 / BIOE 540: Algorithmic Computa<onal Genomics. Tandy Warnow Departments of Bioengineering and Computer Science hhp://tandy.cs.illinois.

Ultra- large Mul,ple Sequence Alignment

394C, October 2, Topics: Mul9ple Sequence Alignment Es9ma9ng Species Trees from Gene Trees

TheDisk-Covering MethodforTree Reconstruction

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

Computa(onal Challenges in Construc(ng the Tree of Life

Phylogenetic Geometry

ASTRAL: Fast coalescent-based computation of the species tree topology, branch lengths, and local branch support

Phylogenetic Tree Reconstruction

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

CS 394C March 21, Tandy Warnow Department of Computer Sciences University of Texas at Austin

EVOLUTIONARY DISTANCES

BIOINFORMATICS. Scaling Up Accurate Phylogenetic Reconstruction from Gene-Order Data. Jijun Tang 1 and Bernard M.E. Moret 1

first (i.e., weaker) sense of the term, using a variety of algorithmic approaches. For example, some methods (e.g., *BEAST 20) co-estimate gene trees

Recent Advances in Phylogeny Reconstruction

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

Phylogenetic inference

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Dr. Amira A. AL-Hosary

Estimating Evolutionary Trees. Phylogenetic Methods

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

A Phylogenetic Network Construction due to Constrained Recombination

Phylogenetic Analysis. Han Liang, Ph.D. Assistant Professor of Bioinformatics and Computational Biology UT MD Anderson Cancer Center

An Investigation of Phylogenetic Likelihood Methods

From Gene Trees to Species Trees. Tandy Warnow The University of Texas at Aus<n

Disk-Covering, a Fast-Converging Method for Phylogenetic Tree Reconstruction ABSTRACT

Constrained Exact Op1miza1on in Phylogene1cs. Tandy Warnow The University of Illinois at Urbana-Champaign

Phylogenetic Networks, Trees, and Clusters

Phylogenetics. BIOL 7711 Computational Bioscience

Quantifying sequence similarity

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Anatomy of a species tree

Inferring Phylogenetic Trees. Distance Approaches. Representing distances. in rooted and unrooted trees. The distance approach to phylogenies

Effects of Gap Open and Gap Extension Penalties

Upcoming challenges in phylogenomics. Siavash Mirarab University of California, San Diego

Constructing Evolutionary/Phylogenetic Trees

Woods Hole brief primer on Multiple Sequence Alignment presented by Mark Holder

Fast coalescent-based branch support using local quartet frequencies

Evolutionary trees. Describe the relationship between objects, e.g. species or genes

MULTIPLE SEQUENCE ALIGNMENT FOR CONSTRUCTION OF PHYLOGENETIC TREE

Algorithms in Bioinformatics

BINF6201/8201. Molecular phylogenetic methods

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches

A (short) introduction to phylogenetics

Phylogeny: building the tree of life

Reconstruction of species trees from gene trees using ASTRAL. Siavash Mirarab University of California, San Diego (ECE)

Phylogenetics: Building Phylogenetic Trees

Multiple Sequence Alignment. Sequences

Jijun Tang and Bernard M.E. Moret. Department of Computer Science, University of New Mexico, Albuquerque, NM 87131, USA

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University

Constructing Evolutionary/Phylogenetic Trees

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University


hsnim: Hyper Scalable Network Inference Machine for Scale-Free Protein-Protein Interaction Networks Inference

How to read and make phylogenetic trees Zuzana Starostová

Phylogeny Tree Algorithms

Isolating - A New Resampling Method for Gene Order Data

Mathematics of Evolution and Phylogeny. Edited by Olivier Gascuel

X X (2) X Pr(X = x θ) (3)

CONTENTS. P A R T I Genomes 1. P A R T II Gene Transcription and Regulation 109

CS481: Bioinformatics Algorithms

Phylogenetics without multiple sequence alignment

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University

Using algebraic geometry for phylogenetic reconstruction

CREATING PHYLOGENETIC TREES FROM DNA SEQUENCES

The statistical and informatics challenges posed by ascertainment biases in phylogenetic data collection

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson

5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT

Reconstruire le passé biologique modèles, méthodes, performances, limites

SUPPLEMENTARY INFORMATION

Phylogenetic analyses. Kirsi Kostamo

7. Tests for selection

Molecular Evolution & Phylogenetics

A Minimum Spanning Tree Framework for Inferring Phylogenies

Predicting Protein Functions and Domain Interactions from Protein Interactions

What is Phylogenetics

Tools and Algorithms in Bioinformatics

Regular networks are determined by their trees

Organisatorische Details

Phylogenetics in the Age of Genomics: Prospects and Challenges

Plan: Evolutionary trees, characters. Perfect phylogeny Methods: NJ, parsimony, max likelihood, Quartet method

CS 6375 Machine Learning

A phylogenomic toolbox for assembling the tree of life

Model Accuracy Measures

O 3 O 4 O 5. q 3. q 4. Transition

Hidden Markov models in population genetics and evolutionary biology

Transcription:

CS 581 Algorithmic Computational Genomics Tandy Warnow University of Illinois at Urbana-Champaign

Today Explain the course Introduce some of the research in this area Describe some open problems Talk about course projects!

CS 581 Algorithm design and analysis (largely in the context of statistical models) for Multiple sequence alignment Phylogeny Estimation Genome assembly I assume mathematical maturity and familiarity with graphs, discrete algorithms, and basic probability theory, equivalent to CS 374 and CS 361. No biology background is required!

Textbook Computational Phylogenetics: An introduction to Designing Methods for Phylogeny Estimation This may be available in the University Bookstore. If not, you can get it from Amazon. The textbook provides all the background you ll need for the class, and homework assignments will be taken from the book.

Grading Homeworks: 25% (worst grade dropped) Take home midterm: 40% Final project: 25% Course participation and paper presentation: 10%

Course Project Course project is due the last day of class, and can be a research project or a survey paper. If you do a research project, you can do it with someone else (including someone in my research group); the goal will be to produce a publishable paper. If you do a survey paper, you will do it by yourself.

Phylogeny (evolutionary tree) Orangutan Gorilla Chimpanzee Human From the Tree of the Life Website, University of Arizona

Phylogeny + genomics = genome-scale phylogeny estimation.

Estimating the Tree of Life Basic Biology: How did life evolve? Applications of phylogenies to: protein structure and function population genetics human migrations metagenomics Figure from https://en.wikipedia.org/wiki/common_descent

Estimating the Tree of Life Large datasets! Millions of species thousands of genes NP-hard optimization problems Exact solutions infeasible Approximation algorithms Heuristics Multiple optima Figure from https://en.wikipedia.org/wiki/common_descent High Performance Computing: necessary but not sufficient

Muir, 2016

Computer Science Solving Problems in Biology and Linguistics Algorithm design using Divide-and-conquer Iteration Heuristic search Graph theory Algorithm analysis using Probability Theory Graph Theory Simulations and modelling Collaborations with biologists and linguists and data analysis Discoveries about how life evolved on earth (and how languages evolved, too)

Computational Phylogenetics (2005) Current methods can use months to estimate trees on 1000 DNA sequences Our objective: More accurate trees and alignments on 500,000 sequences in under a week Courtesy of the Tree of Life web project, tolweb.org

Computational Phylogenetics (2018) 1997-2001: Distance-based phylogenetic tree estimation from polynomial length sequences 2012: Computing accurate trees (almost) without multiple sequence alignments 2009-2015: Co-estimation of multiple sequence alignments and gene trees, now on 1,000,000 sequences in under two weeks 2014-2015: Species tree estimation from whole genomes in the presence of massive gene tree heterogeneity Courtesy of the Tree of Life web project, tolweb.org 2016-2017: Scaling methods to very large heterogeneous datasets using novel machine learning and supertree methods.

The Tree of Life: Multiple Challenges Scientific challenges: Ultra-large multiple-sequence alignment Gene tree estimation Metagenomic classification Alignment-free phylogeny estimation Supertree estimation Estimating species trees from many gene trees Genome rearrangement phylogeny Reticulate evolution Visualization of large trees and alignments Data mining techniques to explore multiple optima Theoretical guarantees under Markov models of evolution Techniques: applied probability theory, graph theory, supercomputing, and heuristics Testing: simulations and real data

This talk Divide-and-conquer: Basic algorithm design technique Application of divide-and-conquer to: Phylogeny estimation (and proof of absolute fast convergence ) Multiple sequence alignment Almost alignment-free tree estimation Open problems

Divide-and-Conquer Divide-and-conquer is a basic algorithmic trick for solving problems! Three steps: divide a dataset into two or more sets, solve the problem on each set, and combine solutions.

Sorting 10 3 54 23 75 5 1 25 Objective: sort this list of integers from smallest to largest. 10, 3, 54, 23, 75, 5, 1, 25 should become 1, 3, 5, 10, 23, 25, 54, 75

MergeSort 10 3 54 23 75 5 1 25 Step 1: Divide into two sublists Step 2: Recursively sort each sublist Step 3: Merge the two sorted sublists

Key technique: Divide-and-conquer! Can be used to produce provably correct algorithms (as in MergeSort) But also in practice, divide-and-conquer is useful for scaling methods to large datasets, because small datasets with not too much heterogeneity are easy to analyze with good accuracy.

Three divide-and-conquer algorithms DCM1 (2001): Improving distance-based phylogeny estimation SATé (2009, 2012) and PASTA (2015): Coestimation of multiple sequence alignments and gene trees (up to 1,000,000 sequences) DACTAL (2012): Almost alignment-free tree estimation

Three divide-and-conquer algorithms DCM1 (2001): Improving distance-based phylogeny estimation SATé (2009, 2012) and PASTA (2015): Coestimation of multiple sequence alignments and gene trees (up to 1,000,000 sequences) DACTAL (2012): Almost alignment-free tree estimation

DNA Sequence Evolution AAGGCCT AAGACTT TGGACTT -3 mil yrs -2 mil yrs AGGGCAT TAGCCCT AGCACTT -1 mil yrs AGGGCAT TAGCCCA TAGACTT AGCACAA AGCGCTT today

Gene Tree Estimation U V W X Y AGGGCAT TAGCCCA TAGACTT TGCACAA TGCGCTT U X Y V W

Quantifying Error FN FN: false negative (missing edge) FP: false positive (incorrect edge) 50% error rate FP

Markov Model of Site Evolution Simplest (Jukes-Cantor, 1969): The model tree T is binary and has substitution probabilities p(e) on each edge e. The state at the root is randomly drawn from {A,C,T,G} (nucleotides) If a site (position) changes on an edge, it changes with equal probability to each of the remaining states. The evolutionary process is Markovian. The different sites are assumed to evolve independently and identically down the tree (with rates that are drawn from a gamma distribution). More complex models (such as the General Markov model) are also considered, often with little change to the theory.

Sequence length requirements The sequence length (number of sites) that a phylogeny reconstruction method M needs to reconstruct the true tree with probability at least 1-ε depends on M (the method) ε f = min p(e), g = max p(e), and n, the number of leaves We fix everything but n.

Statistical consistency, exponential convergence, and absolute fast convergence (afc)

Distance-based estimation

Neighbor joining on large diameter trees Error Rate 0.8 0.6 0.4 NJ Simulation study based upon fixed edge lengths, K2P model of evolution, sequence lengths fixed to 1000 nucleotides. Error rates reflect proportion of incorrect edges in inferred trees. 0.2 [Nakhleh et al. Bioinf 17, Suppl 1 (2001) S190-S198] 0 0 400 800 1200 1600 No. Taxa

DCMs: Divide-and-conquer for improving phylogeny reconstruction

DCM1 Decompositions Input: Set S of sequences, distance matrix d, threshold value q { dij} 1. Compute threshold graph Gq = ( V, E), V = S, E = {( i, j) : d( i, j) q} 2. Perform minimum weight triangulation (note: if d is an additive matrix, then the threshold graph is provably triangulated). DCM1 decomposition : Compute maximal cliques

DCM1-boosting: Warnow, St. John, and Moret, SODA 2001 Exponentially converging (base) method DCM1 SQS Absolute fast converging (DCM1-boosted) method The DCM1 phase produces a collection of trees (one for each threshold), and the SQS phase picks the best tree. For a given threshold, the base method is used to construct trees on small subsets (defined by the threshold) of the taxa. These small trees are then combined into a tree on the full set of taxa.

DCM1-boosting Error Rate 0.8 0.6 0.4 0.2 NJ DCM1-NJ Theorem (Warnow, St. John, and Moret, SODA 2001): DCM1-NJ converges to the true tree from polynomial length sequences 0 0 400 800 1200 1600 No. Taxa [Nakhleh et al. Bioinf 17, Suppl 1 (2001) S190-S198]

Three divide-and-conquer algorithms DCM1 (2001): Improving distance-based phylogeny estimation SATé (2009, 2012) and PASTA (2015): Coestimation of multiple sequence alignments and gene trees (up to 1,000,000 sequences) DACTAL (2012): Almost alignment-free tree estimation

DNA Sequence Evolution AAGGCCT AAGACTT TGGACTT -3 mil yrs -2 mil yrs AGGGCAT TAGCCCT AGCACTT -1 mil yrs AGGGCAT TAGCCCA TAGACTT AGCACAA AGCGCTT today

Indels (insertions and deletions) Deletion Mutation ACGGTGCAGTTACCA ACCAGTCACCA

Deletion Substitution ACGGTGCAGTTACCA Insertion ACCAGTCACCTA ACGGTGCAGTTACC-A AC----CAGTCACCTA The true multiple alignment Reflects historical substitution, insertion, and deletion events Defined using transitive closure of pairwise alignments computed on edges of the true tree

Gene Tree Estimation S1 = AGGCTATCACCTGACCTCCA S2 = TAGCTATCACGACCGC S3 = TAGCTGACCGC S4 = TCACGACCGACA

Input: unaligned sequences S1 = AGGCTATCACCTGACCTCCA S2 = TAGCTATCACGACCGC S3 = TAGCTGACCGC S4 = TCACGACCGACA

Phase 1: Alignment S1 = AGGCTATCACCTGACCTCCA S2 = TAGCTATCACGACCGC S3 = TAGCTGACCGC S4 = TCACGACCGACA S1 = -AGGCTATCACCTGACCTCCA S2 = TAG-CTATCAC--GACCGC-- S3 = TAG-CT-------GACCGC-- S4 = -------TCAC--GACCGACA

Phase 2: Construct tree S1 = AGGCTATCACCTGACCTCCA S2 = TAGCTATCACGACCGC S3 = TAGCTGACCGC S4 = TCACGACCGACA S1 = -AGGCTATCACCTGACCTCCA S2 = TAG-CTATCAC--GACCGC-- S3 = TAG-CT-------GACCGC-- S4 = -------TCAC--GACCGACA S1 S2 S4 S3

Two-phase estimation Alignment methods Clustal POY (and POY*) Probcons (and Probtree) Probalign MAFFT Muscle Di-align T-Coffee Prank (PNAS 2005, Science 2008) Opal (ISMB and Bioinf. 2007) FSA (PLoS Comp. Bio. 2009) Infernal (Bioinf. 2009) Etc. Phylogeny methods Bayesian MCMC Maximum parsimony Maximum likelihood Neighbor joining FastME UPGMA Quartet puzzling Etc.

Two-phase estimation Alignment methods Clustal POY (and POY*) Probcons (and Probtree) Probalign MAFFT Muscle Di-align T-Coffee Prank (PNAS 2005, Science 2008) Opal (ISMB and Bioinf. 2007) FSA (PLoS Comp. Bio. 2009) Infernal (Bioinf. 2009) Etc. Phylogeny methods Bayesian MCMC Maximum parsimony Maximum likelihood Neighbor joining FastME UPGMA Quartet puzzling Etc.

Two-phase estimation Alignment methods Clustal POY (and POY*) Probcons (and Probtree) Probalign MAFFT Muscle Di-align T-Coffee Prank (PNAS 2005, Science 2008) Opal (ISMB and Bioinf. 2007) FSA (PLoS Comp. Bio. 2009) Infernal (Bioinf. 2009) Etc. Phylogeny methods Bayesian MCMC Maximum parsimony Maximum likelihood Neighbor joining FastME UPGMA Quartet puzzling Etc. RAxML: heuristic for large-scale ML optimization

1000-taxon models, ordered by difficulty (Liu et al., Science 19 June 2009)

Large-scale Co-estimation of alignments and trees SATé: Liu et al., Science 2009 (up to 10,000 sequences) and Systematic Biology 2012 (up to 50,000 sequences) PASTA: Mirarab et al., J. Computational Biology 2015 (up to 1,000,000 sequences)

Re-aligning on a tree A B C D Decompose dataset A C B D Align subproblems A C B D Estimate ML tree on merged alignment ABCD Merge subalignments

SATé and PASTA Algorithms Obtain initial alignment and estimated ML tree Tree

SATé and PASTA Algorithms Obtain initial alignment and estimated ML tree Tree Use tree to compute new alignment Alignment

SATé and PASTA Algorithms Obtain initial alignment and estimated ML tree Estimate ML tree on new alignment Tree Use tree to compute new alignment Alignment

SATé and PASTA Algorithms Obtain initial alignment and estimated ML tree Estimate ML tree on new alignment Tree Use tree to compute new alignment Alignment Repeat until termination condition, and return the alignment/tree pair with the best ML score

1000-taxon models, ordered by difficulty (Liu et al., Science 19 June 2009)

1000-taxon models, ordered by difficulty (Liu et al., Science 19 June 2009) 24 hour SATé analysis, on desktop machines (Similar improvements for biological datasets)

PASTA RNASim 0.20 Tree Error (FN Rate) 0.15 0.10 0.05 Clustal Omega Muscle Mafft Starting Tree SATe2 PASTA Reference Alignment 0.00 10000 50000 100000 200000 Simulated RNASim datasets from 10K to 200K taxa Limited to 24 hours using 12 CPUs Not all methods could run (missing bars could not finish) PASTA: J Comp Biol 22(5):377-386, Mirarab et al., 2015

Multiple Sequence Alignment (MSA): a scientific grand challenge 1 S1 = AGGCTATCACCTGACCTCCA S2 = TAGCTATCACGACCGC S3 = TAGCTGACCGC Sn = TCACGACCGACA S1 = -AGGCTATCACCTGACCTCCA S2 = TAG-CTATCAC--GACCGC-- S3 = TAG-CT-------GACCGC-- Sn = -------TCAC--GACCGACA Novel techniques needed for scalability and accuracy NP-hard problems and large datasets Current methods do not provide good accuracy Few methods can analyze even moderately large datasets Many important applications besides phylogenetic estimation 1 Frontiers in Massive Data Analysis, National Academies Press, 2013

Three divide-and-conquer algorithms DCM1 (2001): Improving distance-based phylogeny estimation SATé (2009, 2012) and PASTA (2015): Coestimation of multiple sequence alignments and gene trees (up to 1,000,000 sequences) DACTAL (2012): Almost alignment-free tree estimation

DACTAL Divide-And-Conquer Trees (Almost) without alignments Nelesen et al., ISMB 2012 and Bioinformatics 2012 Input: unaligned sequences Output: Tree (but no alignment)

Unaligned Sequences BLASTbased Decompose into small local subsets in tree DACTAL Overlapping subsets RAxML(MAFFT) Superfine A tree for each subset A tree for the entire dataset

Results on three biological datasets 6000 to 28,000 sequences. We show results with 5 DACTAL iterations

DACTAL Arbitrary input Any tree estimation method! Any decomposition strategy Overlapping subsets A tree for the entire dataset Any supertree method Start with any tree estimated on the data, decompose to any desired size A tree for each subset

Boosters, or Meta-Methods Meta-methods use divide-and-conquer and iteration (or other techniques) to boost the performance of base methods (phylogeny reconstruction, alignment estimation, etc) Base method M Meta-method M*

Summary We showed three techniques to improve accuracy and scalability, that used divide-and-conquer (and sometimes iteration): DCM1 (improves accuracy and scalability for distancebased methods) SATe and PASTA (improves accuracy and scalability for multiple sequence alignment methods) DACTAL (general technique for improving tree estimation) Innovative algorithm design can improve accuracy as well as reduce running time.

Open problems (and possible course projects) Multiple Sequence Alignment: Merging two alignments Consensus alignments Supertree estimation Need to scale to 100,000+ species with high accuracy Approximation algorithms Species tree and phylogenetic network estimation Address gene tree heterogeneity due to multiple causes Theoretical guarantees under stochastic models Scalability to large numbers of species with whole genomes Statistical Models of evolution All models are seriously unrealistic We need better statistical models New theory Applications to historical linguistics, microbiome analysis, protein structure and function prediction, tumor evolution, etc.

Some prior course projects that were published ASTRAL: genome-scale coalescent-based species tree estimation, Mirarab et al. 2014 (ECCB 2014 and Bioinformatics 2014) Phylogenomic species tree estimation in the presence of incomplete lineage sorting and horizontal gene transfer, Davidson et al. 2014 (RECOMB-CG 2014 and BMC Genomics 2014) A comparative study of SVDquartets and other coalescent-based species tree estimation methods, Chou et al. (RECOMB-CG and BMC Genomics 2014) ASTRID: Accurate species trees from internode distances, Vachaspati and Warnow (RECOMB-CG 2015 and BMC Genomics 2015) Scaling statistical multiple sequence alignment to large datasets, Nute and Warnow (RECOMB-CG 2016 and BMC Genomics 2016)

The Tree of Life: Multiple Challenges Scientific challenges: Ultra-large multiple-sequence alignment Gene tree estimation Metagenomic classification Alignment-free phylogeny estimation Supertree estimation Estimating species trees from many gene trees Genome rearrangement phylogeny Reticulate evolution Visualization of large trees and alignments Data mining techniques to explore multiple optima Theoretical guarantees under Markov models of evolution Techniques: applied probability theory, graph theory, supercomputing, and heuristics Testing: simulations and real data

Other Information Tandy s office hours: Tuesdays 3:15-4:15 (and by appointment). Tandy s email: warnow@illinois.edu TA: Sarah Christensen (sac2@illinois.edu) Homework will be submitted through Moodle. Course webpage: http://tandy.cs.illinois.edu/581-2018.html