CS109: Probability for Computer Scientists. Piech, CS106A, Stanford University

1 CS109: Probability for Computer Scientists

2 Chris Piech. My parents are interesting folks. I originally concentrated in graphics and worked at Pixar. Childhood: Nairobi, Kenya. High School: Kuala Lumpur, Malaysia. Stanford University: Ph.D. in Deep Learning. Research lab on AI for Social Good. The problem I really want to solve is to make high-quality education more accessible.

3 I Took the First CS109 Class. Back when I looked like this :)

4 Teaching Team

5 Course mechanics (this is a light version. Please read the handout for details).

6 Essential Information cs109.stanford.edu

7 Are you in the right place?

8 Prereqs. What you really need: CS106B/X (important): recursion, hash tables, binary trees, programming. CS103 (ok as a corequisite): proof techniques (induction), set theory, math maturity. Math 51 or CME 100 (important): multivariate differentiation, multivariate integration, basic facility with linear algebra (vectors).

9 Coding in CS109 Review session on Friday

10 Staff Contact. Post to Piazza for clarification. Go to Working Office Hours. Chris or go to his office for course-level issues.

11 CS109 Units. Hours per week = Units × 3. Average about 10 hours / week for assignments. Start here: Are you an undergrad? Yes: 5 units. No: Do you want to take CS109 for fewer units? Yes: 3 units or 4 units. No: 5 units.

12 Not Videotaped * And you should expect to learn more

13 Class Breakdown. 45%: 6 Assignments. 20%: Midterm (Tuesday Oct 30th, 7-9pm). 30%: Final (Wed Dec 12th, 3:30-6:30pm). 5%: Section Participation.

14 Late Days 2

15 The Student Honor Code

17 Story of Modern AI

18 Four Prototypical Trajectories. Modern AI, or: How we learned to combine probability and programming

19 Brief History

20 Narrow Intelligence Play Chess Translate Turkish Drive a Car Play Breakout

21 General Intelligence Play Chess Translate Turkish Drive a Car Play Breakout

22 Early Optimism

23 Early Optimism 1950: "Machines will be capable, within twenty years, of doing any work a man can do." Herbert Simon, 1952

24 Underwhelming Results 1950s to 1980s The world is too complex

26 Something is going on in the world of AI

27 Big Milestones: Deep Blue (1997), Stanley (2005), Watson (2011)

28 Told Speech Was 30 Years Out Almost perfect

29 The Last Remaining Board Game

30 Computers Making Art

31 Self Driving Cars

32 What is going on?

33 [suspense]

34 Focus on one problem

35 Computer Vision

36 Logistic Regression is like the Harry Potter Sorting Hat. Classification: That is a picture of a one

37 Logistic Regression is like the Harry Potter Sorting Hat. Classification: That is a picture of a zero

38 Classification: That is a picture of a zero * It doesn't have to be correct all of the time

39 Can you do it?

40 What number is this?

41 What number is this?

42 How about now? What a computer sees What a human sees

43 Very hard to Program??
    import acm.program.*;  // ACM library that provides ConsoleProgram
    public class HarryHat extends ConsoleProgram {
        public void run() {
            println("Todo: Write program");
        }
    }

44 Two Great Ideas 1. Probability from Examples 2. Artificial Neurons

45 Two Great Ideas 1. Probability from Examples 2. Artificial Neurons

46 1. Probability From Examples

47 When Does the Magic Happen? Lots of Data + Sound Probability

48 Machine Learning Basically just a rebranding of statistics and probability.

49 Vision is Hard. Why is this hard? You see this: But the camera sees this: [Andrew Ng]

50 Human Designed Features. Find edges at four orientations. Sum up edge strength in each quadrant. Final feature vector. [Andrew Ng]

51 Some Great Thinkers Daphne Koller

52 Straight ML Not Perfect [figure: a grid of images, each labeled "Motorcycle"]

53 Two Great Ideas 1. Probability from Examples 2. Artificial Neurons

54 2. Artificial Neurons

55 Neuron

56 Neuron

57 Neuron

58 Neuron

59 Some Inputs are More Important

60 Artificial Neuron

61 Sigmoid Function: $\sigma(x) = \frac{1}{1 + e^{-x}}$. An artificial neuron is like a little probability calculator
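To make the "little probability calculator" idea concrete, here is a minimal sketch in Java. It assumes the usual reading of an artificial neuron: a weighted sum of inputs passed through the sigmoid. The class name `Neuron`, the method names, and the example weights are all hypothetical and not taken from the lecture.

```java
// Illustrative sketch only: class, method names, and numbers are hypothetical.
final class Neuron {
    // Squashes any real number into the open interval (0, 1).
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Weighted sum of the inputs, passed through the sigmoid.
    static double activate(double[] weights, double[] inputs) {
        double z = 0.0;
        for (int j = 0; j < weights.length; j++) {
            z += weights[j] * inputs[j];
        }
        return sigmoid(z);
    }

    public static void main(String[] args) {
        double[] weights = {2.0, -1.0, 0.5};  // made-up weights
        double[] inputs = {1.0, 0.0, 1.0};    // made-up inputs
        // Weighted sum is 2.5, so the output is sigmoid(2.5), roughly 0.92.
        System.out.println(activate(weights, inputs));
    }
}
```

Because the output always lies strictly between 0 and 1, it can be read as a probability, which is what lets the last neuron in a network stand for "the probability that the image is of a 1".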

62 Neural Network Each node represents a neuron (or a vector of neurons) Each edge represents the weight of the interaction Pixels

63 Forward Pass

64 Forward Pass Each node represents a neuron (or a vector of neurons) Each edge represents the weight of the interaction

65 Forward Pass Each node represents a neuron (or a vector of neurons) Each edge represents the weight of the interaction

66 Forward Pass Each node represents a neuron (or a vector of neurons) Each edge represents the weight of the interaction

67 Forward Pass

68 Forward Pass Interpret the last neuron as the probability that the image is of a 1

69 Backward Pass The image had a 0 but we predicted a high probability that it was a 1

70 Backward Pass. We start by making our misprediction a numerical loss. The image had a 0 but we predicted a high probability that it was a 1

71 Backward Pass. We start by making our misprediction a numerical loss. The image had a 0 but we predicted a high probability that it was a 1. Update each connection

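72 Choose weights that maximize the probability of the right answers
$$P(Y = 1 \mid X = x) = \hat{y}, \qquad \hat{y} = \sigma\left(\sum_{j=0}^{m} h_j \theta^{(\hat{y})}_j\right)$$
For one datum:
$$P(Y = y \mid X = x) = (\hat{y})^{y} (1 - \hat{y})^{1 - y}$$
For IID data:
$$L(\theta) = \prod_{i=1}^{n} P(Y = y^{(i)} \mid X = x^{(i)}) = \prod_{i=1}^{n} (\hat{y}^{(i)})^{y^{(i)}} \left[1 - \hat{y}^{(i)}\right]^{(1 - y^{(i)})}$$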
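As a rough illustration of the IID likelihood on this slide, the sketch below multiplies one Bernoulli term per datum: the prediction when the label is 1, and one minus the prediction when the label is 0. Working with the sum of logs instead of the raw product is an assumption added here for numerical stability, not something the slide states. The names `Likelihood` and `logLikelihood` and the toy data are hypothetical.

```java
// Illustrative sketch only: names and data are hypothetical.
final class Likelihood {
    // Log of the product over i of yHat_i^{y_i} * (1 - yHat_i)^{1 - y_i},
    // for labels y_i in {0, 1} and predictions yHat_i in (0, 1).
    static double logLikelihood(int[] labels, double[] predictions) {
        double total = 0.0;
        for (int i = 0; i < labels.length; i++) {
            // Probability the model assigned to the label that actually occurred.
            double p = (labels[i] == 1) ? predictions[i] : 1.0 - predictions[i];
            total += Math.log(p);
        }
        return total;
    }

    public static void main(String[] args) {
        int[] labels = {1, 0, 1};
        double[] predictions = {0.9, 0.2, 0.6};
        // Likelihood is 0.9 * 0.8 * 0.6 = 0.432; exponentiating the log recovers it.
        System.out.println(Math.exp(logLikelihood(labels, predictions)));
    }
}
```

Maximizing the log-likelihood picks the same weights as maximizing the likelihood, because the logarithm is strictly increasing.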

73 Gradient Ascent. Walk uphill and you will find a local maximum (if your step size is small enough)
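A minimal one-dimensional sketch of "walk uphill" follows, assuming a toy objective f(x) = -(x - 3)^2 that is not from the lecture. The class name `GradientAscent`, the method `climb`, and the step sizes are hypothetical choices made for illustration.

```java
import java.util.function.DoubleUnaryOperator;

// Illustrative sketch only: names, objective, and step sizes are hypothetical.
final class GradientAscent {
    // Repeatedly takes a step in the direction of the gradient.
    static double climb(DoubleUnaryOperator gradient, double start, double stepSize, int steps) {
        double x = start;
        for (int t = 0; t < steps; t++) {
            x += stepSize * gradient.applyAsDouble(x);
        }
        return x;
    }

    public static void main(String[] args) {
        // f(x) = -(x - 3)^2 has its only maximum at x = 3, and f'(x) = -2 (x - 3).
        DoubleUnaryOperator gradient = x -> -2.0 * (x - 3.0);
        // Small step: the distance to 3 shrinks by a factor of 0.8 per step.
        System.out.println(climb(gradient, 0.0, 0.1, 100));
        // Step too large: each step overshoots and doubles the distance to 3.
        System.out.println(climb(gradient, 0.0, 1.5, 100));
    }
}
```

The second call shows why the slide's caveat matters: with too large a step the iterate jumps past the peak by more than it started from, so it moves away from the maximum instead of toward it.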

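74 Gradient of output
$$\begin{aligned}
\frac{\partial \hat{y}}{\partial \theta^{(\hat{y})}_i}
&= \frac{\partial}{\partial \theta^{(\hat{y})}_i} \sigma\left(\sum_{j=0}^{m} h_j \theta^{(\hat{y})}_j\right) \\
&= \sigma\left(\sum_{j=0}^{m} h_j \theta^{(\hat{y})}_j\right) \left[1 - \sigma\left(\sum_{j=0}^{m} h_j \theta^{(\hat{y})}_j\right)\right] \frac{\partial}{\partial \theta^{(\hat{y})}_i} \sum_{j=0}^{m} h_j \theta^{(\hat{y})}_j \\
&= \hat{y}[1 - \hat{y}] \frac{\partial}{\partial \theta^{(\hat{y})}_i} \sum_{j=0}^{m} h_j \theta^{(\hat{y})}_j \\
&= \hat{y}[1 - \hat{y}] \, h_i
\end{aligned}$$
That looks scarier than it is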

75 Chain Rule Down the Network

76 Where you will be by the end of class

77 When you train, something really neat happens

78 Visualize the Weights. Training set: aligned images of faces. Layers, from bottom to top: pixels, edges, object parts (combination of edges), object models. [Honglak Lee]

79 Google Brain

80 Google Brain 1 Trillion Artificial Neurons

81 A Neuron That Fires When It Sees Cats. Top stimuli from the test set. Optimal stimulus by numerical optimization. Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012

83 Other Neurons: Neuron 1, Neuron 2, Neuron 3, Neuron 4, Neuron 5. Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012

84 Autonomous Tutor

85 Prediction Results [figure: AUC on the Benchmark and Khan datasets for Marginal, BKT, BKT*, and DKT]. Huge improvement in ability to predict for real students. Piech et al., 2015

86 Not once, but twice, AI was revolutionized by people who understood probability theory.

87 End of Story

88 Except it isn't the end of the story

89 Probability is more than just machine learning

90 Abundance of Important Problems

91 Algorithms and Probability. E.g. raytracing. E.g. HashMaps (hash function)

92 Medicine and Probability

93 Autocomplete

94 Probability in Practice

95 Philosophy and Probability

96 Art and Probability

97 Probabilistic Analysis of Algorithms

98 #1 Most Desired Skill in Industry. Microsoft's competitive advantage, [Bill Gates] responded, was its expertise in "Bayesian [probabilistic] networks." (from Los Angeles Times, Oct. 28, 1996) "The sexy job in the next 10 years will be statisticians." -Hal Varian, Chief Economist at Google (from New York Times, August 6, 2009)

99 #1 Most Desired Skill in Industry I believe over the next decade computing will become even more ubiquitous and intelligence will become ambient. The coevolution of software and new hardware form factors will intermediate and digitize many of the things we do and experience in business, life and our world. This will be made possible by an ever-growing network of connected devices, incredible computing capacity from the cloud, insights from big data, and intelligence from machine learning. -- Satya Nadella (CEO, Microsoft) to all employees on first day as CEO (Feb. 04, 2014)

100 #1 Most Desired Skill in Academia Most CS PhD students list their highest desiderata upon graduation as: Better understanding of probability

101 Foundation for your future

102 But it's not always intuitive

103 Zika Test. You test positive for Zika. What is the probability that you have Zika? 0.08% of people have Zika. 90% positive rate for people with Zika. 7% positive rate for people without Zika. The right answer is about 1%.
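The 1% figure can be checked with Bayes' theorem, using only the three numbers on the slide. The event names are notation introduced here: $Z$ for "has Zika" and $+$ for "tests positive".

```latex
\begin{aligned}
P(Z \mid +) &= \frac{P(+ \mid Z)\,P(Z)}{P(+ \mid Z)\,P(Z) + P(+ \mid Z^{C})\,P(Z^{C})} \\
            &= \frac{0.90 \times 0.0008}{0.90 \times 0.0008 + 0.07 \times 0.9992} \\
            &= \frac{0.00072}{0.070664} \approx 0.0102
\end{aligned}
```

Almost all positive results come from the 7% false-positive rate applied to the very large group of people without Zika, which is why the answer is so far from the 90% many people guess.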

104 Probability = Important + Needs Study Delayed gratification

105 What is CS109?

106 Traditional View of Probability

107 CS View of Probability Give you the tools necessary to build and understand probabilistic CS algorithms.

108 CS View of Probability Heart Ancestry Netflix

109 CS View of Probability

110 CS View of Probability Teach you how to write programs that most people are not able to write.

111 Let's dive in

112 Counting

113 Our Route: Counting, Probabilistic modelling choices, Core Probability, Machine Learning
