Optimization and Complexity

Optimization and Complexity. Decision Systems Group, Brigham and Women's Hospital, Harvard Medical School; Harvard-MIT Division of Health Sciences and Technology

Aim: Give you an intuition of what is meant by optimization, P and NP problems, NP-completeness, and NP-hardness, and enable you to recognize the formalisms of complexity theory and their usefulness.

Overview: motivating example; formal definition of a problem; algorithm and problem complexity; problem reductions; NP-completeness; NP-hardness; a glimpse of approximation algorithms and their design.

What is optimization? It requires a measure of optimality, usually modeled as a mathematical function, and it means finding the solution that yields the globally best value of that measure.

What is the problem? Nike says: just do it. It is not so simple: even problems that are simple to describe formally can be intractable, so approximation is often necessary. Complexity theory is the tool we use to describe and recognize (intractable) problems.

Example: Variable Selection. Data tables T and V have n predictor columns and one outcome column. We use machine learning method L to produce a predictive model L(T) from data table T. We can evaluate L(T) on V, producing a measure E(L(T),V). We want to find a maximal number of predictor columns in T to delete, producing T′, such that E(L(T′),V) ≥ E(L(T),V). No efficient method for solving this problem optimally is known (cf. the NP-hardness of determining a minimal set of variables that maintains discernibility in the training data, a.k.a. the rough set reduct-finding problem).

Search for Optimal Variable Selection. The space of all possible selections is huge: with 43 variables there are 2^43 − 1 (roughly 8.8 trillion) ways of selecting a non-empty subset, each a potential solution. At one potential solution per Post-it note, the stack of Post-its would reach more than halfway to the moon.

Search for Optimal Variable Selection. The search space is a discrete structure that lends itself to stepwise search (add one new variable or take away one old variable); neither the optimal point nor the optimal evaluation value is known in advance. [Figure: the subset lattice over {a,b,c}: {a,b,c}; {a,b}, {a,c}, {b,c}; {a}, {b}, {c}.]

Popular Stepwise Search Strategies. Hill climbing: select a starting point and always step in the direction of the most positive change in value.
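
A minimal sketch of hill climbing for variable-subset selection, assuming a user-supplied scoring function evaluate(subset) (hypothetical, standing in for E(L(T′),V) in the earlier example; higher is better). This is an illustration of the strategy, not the lecture's implementation.

    # Minimal hill-climbing sketch for variable (subset) selection.
    # `evaluate` is an assumed, user-supplied scoring function (higher is better).

    def hill_climb(variables, evaluate):
        current = frozenset(variables)          # e.g., start from the full set
        current_score = evaluate(current)
        while True:
            # Neighbors: add one unused variable or remove one used variable.
            neighbors = [current | {v} for v in variables if v not in current]
            neighbors += [current - {v} for v in current]
            best = max(neighbors, key=evaluate, default=None)
            if best is None or evaluate(best) <= current_score:
                return current, current_score   # no improving step: local optimum
            current, current_score = best, evaluate(best)

Starting from the empty set instead of the full set turns this into forward selection; starting from the full set, as above, gives backward elimination.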

Popular Stepwise Search Strategies. Simulated annealing: select a starting point and choose the next stepping direction stochastically, with an increasing bias towards more positive changes in value.
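
A minimal simulated-annealing sketch for the same subset-selection setting, again assuming a hypothetical evaluate function; the cooling schedule and parameters are illustrative choices, not values from the lecture.

    import math
    import random

    # Minimal simulated-annealing sketch for subset selection.  Worse neighbors
    # are accepted with a probability that shrinks as the temperature decreases.

    def simulated_annealing(variables, evaluate, steps=1000, t0=1.0, alpha=0.995):
        current = frozenset(random.sample(list(variables), k=len(variables) // 2))
        score = evaluate(current)
        temperature = t0
        for _ in range(steps):
            v = random.choice(list(variables))               # random move: flip one variable
            candidate = current - {v} if v in current else current | {v}
            delta = evaluate(candidate) - score              # change in value
            if delta > 0 or random.random() < math.exp(delta / temperature):
                current, score = candidate, score + delta    # accept the step
            temperature *= alpha                             # cool down
        return current, score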

Problems. Exhaustive search is generally intractable because of the size of the search space (exponential in the number of variables). Stepwise search takes no account of synergy effects: variables a and b, each considered in isolation, may be excluded even though their combination would not have been.

Genetic Algorithm Search. Maintain a population of solutions; stochastically select parents with a bias towards fitter individuals; apply mating (crossover) and mutation operations to the parents; merge the old population with the offspring; repeat until there is no further improvement in the population.
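
A minimal sketch of that loop for subset selection, assuming a hypothetical fitness function over bit lists (1 = variable selected). For simplicity it runs a fixed number of generations rather than stopping when the population no longer improves; tournament selection and one-point crossover are illustrative choices.

    import random

    # Minimal genetic-algorithm sketch for subset selection.  Each individual is
    # a bit list of length n; `fitness` is an assumed, user-supplied scorer.

    def ga_search(n, fitness, pop_size=50, generations=100, mutation_rate=0.01):
        population = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
        best = max(population, key=fitness)
        for _ in range(generations):
            offspring = []
            for _ in range(pop_size):
                # Stochastic parent selection biased towards fitter individuals
                # (tournament selection of size 2).
                p1 = max(random.sample(population, 2), key=fitness)
                p2 = max(random.sample(population, 2), key=fitness)
                cut = random.randrange(1, n)                 # one-point crossover
                child = p1[:cut] + p2[cut:]
                child = [1 - b if random.random() < mutation_rate else b for b in child]
                offspring.append(child)
            # Merge old population with offspring, keep the fittest half.
            population = sorted(population + offspring, key=fitness, reverse=True)[:pop_size]
            best = max([best] + population, key=fitness)
        return best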

GA Optimization Animation

Addressing the Synergy Problem of Stepwise Search. Genetic algorithm search is non-stepwise and non-exhaustive. Pros: potentially finds synergy effects; does not a priori exclude any part of the search space. Cons: computationally expensive; difficult to analyze, with no comprehensive theory for parameter specification.

Variable Selection for Logistic Regression using GA. Data: 43 predictor variables; outcome: MI or not MI (1 or 0); training (T, 335 cases) and holdout (H, 165 cases) sets from Sheffield, England; external validation set (V, 1253 cases) from Edinburgh, Scotland.

GA Variable Selection for LR: Generational Progress. [Figure: fitness value evolution over generations 0-160, showing mean, maximum, and minimum population fitness, ranging roughly from 0.3 to 0.9.]

GA Variable Selection for LR: Results. Table presenting results on the external validation set, including SAS built-in variable selection methods (removal/entry level 0.05):

    Selection   Size   ROC AUC
    Genetic       6    0.95
    None         43    0.92
    Backward     11    0.92
    Forward      13    0.91
    Stepwise     12    0.91

P < 0.05

Problem Example. Given a Boolean formula f (with variables): Is there a truth assignment such that f is true? Does a given truth assignment make f true? Find a satisfying truth assignment for f. Find a satisfying truth assignment for f with the minimum number of variables set to true.

Problem Formally Defined. A problem P is a relation from a set I of instances to a set S of solutions: P ⊆ I × S. Recognition: is (x,y) ∈ P? Construction: given x, find y such that (x,y) ∈ P. Optimization: given x, find the best y such that (x,y) ∈ P.

Solving Problems. Problems are solved by an algorithm: a finite description of steps that computes a result given an instance of the problem.

Algorithm Cost. Algorithm cost is measured by how many operations (steps) it takes to solve the problem (time complexity) and how much storage space the algorithm requires (space complexity), on a particular machine type, as a function of input length (e.g., the number of bits needed to store the problem instance).

O-Notation. O-notation describes an upper bound on a function. Let f, g: N → N. Then f(n) is O(g(n)) if there exist constants a, b, m such that for all n ≥ m, f(n) ≤ a·g(n) + b.

O-Notation Examples. f(n) = 9999999999999999 is O(1). f(n) = 1000000n + 100000000 is O(n). f(n) = 3n² + 2n + 3 is O(n²). (Exercise: convince yourselves of this, please.)
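
For the third example (as written above), one concrete choice of constants in the definition is a = 5, b = 3, m = 1. The short check below verifies the bound numerically over a range of n; it is a sanity check, not a proof.

    # Numerical sanity check that f(n) = 3n^2 + 2n + 3 is O(n^2):
    # with a = 5, b = 3 and m = 1 we have f(n) <= a*n^2 + b for all checked n.

    def f(n):
        return 3 * n**2 + 2 * n + 3

    a, b, m = 5, 3, 1
    assert all(f(n) <= a * n**2 + b for n in range(m, 10_000))

(Algebraically: f(n) ≤ 5n² + 3 is equivalent to 2n ≤ 2n², which holds for all n ≥ 1.)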

Worst Case Analysis. Let t(x) be the running time of algorithm A on input x, and let T(n) = max{t(x) : |x| ≤ n}; i.e., T(n) is the worst running time over inputs not longer than n. A is of time complexity O(g(n)) if T(n) is O(g(n)).

Problem Complexity. A problem P has time complexity O(g(n)) if there exists an algorithm for P that has time complexity O(g(n)). Space complexity is defined analogously.

Decision Problems. A decision problem is a problem P where the set of instances can be partitioned into a set Y_P of positive instances and a set N_P of negative instances, and the problem is to determine whether a particular instance is a positive instance. Example: satisfiability of Boolean CNF formulae; does a satisfying truth assignment exist for a given instance?

Two Complexity Classes for Decision Problems. P: all decision problems of time complexity O(n^k) for some constant k ≥ 0. NP: all decision problems for which there exists a non-deterministic algorithm with time complexity O(n^k) for some constant k ≥ 0.

What is a non-deterministic algorithm? Algorithm: finite description (program) of steps. Non-deterministic algorithm: an algorithm with guess steps allowed.

Computation Tree. Each guess step results in a branching point in a computation tree. Example: satisfying a Boolean formula over three variables a, b, c; each level of the tree guesses the value (0 or 1) of one variable, and each leaf is labelled Y or N according to whether the resulting assignment satisfies the formula. [Figure: binary computation tree over the guesses for a, b, c, with Y/N labels at the leaves.]
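
To make the guess steps concrete, here is a small deterministic simulation of such a computation tree: each guess step is replaced by exploring both branches, so the simulation may take time exponential in the number of guessed variables. The formula is supplied as an ordinary Python predicate; the example formula is illustrative, not the one on the original slide.

    # Deterministic simulation of a non-deterministic algorithm for satisfiability:
    # each "guess" of a variable's value becomes a branch exploring both 0 and 1.
    # The number of leaves is 2^len(variables), i.e., exponential in the input.

    def exists_accepting_path(formula, variables, assignment=None):
        assignment = dict(assignment or {})
        if not variables:                       # all variables guessed: a leaf
            return formula(assignment)          # Y or N at this leaf
        v, rest = variables[0], variables[1:]
        for guess in (0, 1):                    # the non-deterministic guess step
            assignment[v] = guess
            if exists_accepting_path(formula, rest, assignment):
                return True
        return False

    # Example: the formula (not a or b) and (not c) over variables a, b, c.
    formula = lambda t: (not t["a"] or t["b"]) and (not t["c"])
    print(exists_accepting_path(formula, ["a", "b", "c"]))   # True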

Non-deterministic Algorithm Time Complexity. A non-deterministic algorithm A solves the decision problem P in time complexity t(n) if, for any instance x with |x| = n, A halts for every possible guess sequence, and x ∈ Y_P if and only if there exists at least one guess sequence which results in YES in time at most t(n).

P and NP. We have that P ⊆ NP. Whether there are problems in NP that are not in P is still an open problem, but it is strongly believed that this is the case.

Problem Reduction. A reduction from problem P1 to problem P2 presents a method for solving P1 using an algorithm for P2. P2 is then, intuitively, at least as difficult as P1.

Problem Reduction. Problem P1 is reducible to P2 if there exists an algorithm R which solves P1 by querying an oracle for P2. In this case, R is said to be a reduction from P1 to P2, and we write P1 ≤ P2. If R is of polynomial time complexity, we write P1 ≤p P2. [Figure: an instance x of P1 is transformed by R into a query x′ for the P2 oracle; the oracle's answer P2(x′) is used by R to produce the answer P1(x).]
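
As a small illustration of the oracle picture, here is a classical reduction sketch: constructing a satisfying assignment (a construction problem) by repeatedly querying a decision oracle for satisfiability. The oracle is_satisfiable is assumed to be given; the reduction itself only makes polynomially many queries and only uses the oracle's yes/no answers.

    # Self-reducibility sketch: solving the construction problem "find a
    # satisfying assignment" with polynomially many queries to a decision
    # oracle for SAT.  `is_satisfiable(formula, partial)` is an assumed oracle
    # that answers whether the formula can be satisfied consistently with the
    # given partial assignment.

    def find_assignment(formula, variables, is_satisfiable):
        partial = {}
        if not is_satisfiable(formula, partial):
            return None                          # no satisfying assignment exists
        for v in variables:
            partial[v] = 0                       # try fixing v to 0 ...
            if not is_satisfiable(formula, partial):
                partial[v] = 1                   # ... otherwise v must be 1
        return partial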

NP-completeness. A decision problem P is NP-complete if (1) it is in NP, and (2) for every other problem P′ in NP we have that P′ ≤p P. This means that every NP problem can be solved in polynomial time if one finds a polynomial-time algorithm for an NP-complete problem P. There are problems in NP for which the best known algorithms use exponential time, so NP-completeness is a sign of problem intractability.

Optimization Problems. An optimization problem P is a quadruple (I_P, S_P, m_P, g_P): I_P is the set of instances; S_P is a function that for an instance x returns the set of feasible solutions S_P(x); m_P(x,y) is the positive-integer measure of the quality of a feasible solution y of instance x; g_P is either min or max, specifying whether P is a minimization or a maximization problem. The optimal value of m_P for x over all feasible solutions is denoted m*_P(x). A solution y for which m_P(x,y) = m*_P(x) is called optimal and is denoted y*(x).

Optimization Problem Example: the minimum hitting set problem. I = {C : C ⊆ 2^U}, a collection C of subsets of a finite set U; S(C) = {H ⊆ U : H ∩ c ≠ ∅ for every c ∈ C}, the sets H that hit every member of C; m(C,H) = |H|; g = min.
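
A minimal encoding of this quadruple in code, with a brute-force optimal solver that is usable only for tiny instances (it tries all 2^|U| subsets, which is exactly why approximation becomes interesting). The names and the example instance are illustrative.

    from itertools import chain, combinations

    # Minimum hitting set as the quadruple (I, S, m, g), with brute-force search.

    def feasible(C, H):                 # H in S(C): H hits every set c in C
        return all(H & c for c in C)

    def measure(C, H):                  # m(C, H) = |H|
        return len(H)

    def optimal_hitting_set(U, C):      # g = min, by exhaustive search over 2^|U| subsets
        subsets = chain.from_iterable(combinations(U, k) for k in range(len(U) + 1))
        best = None
        for H in (set(s) for s in subsets):
            if feasible(C, H) and (best is None or measure(C, H) < measure(C, best)):
                best = H
        return best

    # Example instance: U = {1,2,3,4}, C = {{1,2}, {2,3}, {3,4}}.
    U = {1, 2, 3, 4}
    C = [{1, 2}, {2, 3}, {3, 4}]
    print(optimal_hitting_set(U, C))    # an optimal hitting set of size 2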

Complexity Class NPO. An optimization problem (I, S, m, g) is in NPO if: 1. an element of I is recognizable as such in polynomial time; 2. the solutions of x are bounded in size by a polynomial q(|x|), and are recognizable as such in q(|x|) time; 3. m is computable in polynomial time. Theorem: for an NPO problem, the decision problem of whether m*(x) ≤ K (g = min) or m*(x) ≥ K (g = max) is in NP.

Complexity Class PO. An optimization problem P is said to be in PO if it is in NPO and there exists an algorithm that, for each x in I, computes an optimal solution y*(x) and its value m*(x) in polynomial time.

NP-hardness. NP-completeness is defined for decision problems. An optimization problem P is NP-hard if, for every decision problem P′ in NP, we have that P′ ≤p P. Again, NP-hardness is a sign of intractability.

Approximation Algorithms. An algorithm that, for an NPO problem P, always returns a feasible solution is called an approximation algorithm for P. Even if an NPO problem is intractable, it may not be difficult to design a polynomial-time approximation algorithm for it.

Approximate Solution Quality. Any feasible solution is an approximate solution, characterized by the distance of its value from the optimal one. An approximation algorithm is characterized by its complexity, by the ratio of this distance to the optimum (its performance ratio), and by how the performance ratio grows with input size. An algorithm is a p-approximate algorithm if its performance ratio is bounded by the function p of the input size.
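
As an example of a simple polynomial-time approximation algorithm with an analyzable performance ratio, here is the standard greedy heuristic for the minimum hitting set problem from the earlier example (not an algorithm from the lecture). By the equivalence of hitting set and set cover, the greedy strategy is known to achieve a performance ratio that grows only logarithmically with the number of sets to hit.

    # Greedy approximation for minimum hitting set: repeatedly pick the element
    # that hits the largest number of still-unhit sets.  Runs in polynomial time;
    # assumes every c in C is a subset of U, so progress is always possible.

    def greedy_hitting_set(U, C):
        unhit = [set(c) for c in C]
        H = set()
        while unhit:
            # Element of U covering the most currently unhit sets.
            best = max(U, key=lambda u: sum(1 for c in unhit if u in c))
            H.add(best)
            unhit = [c for c in unhit if best not in c]
        return H

    print(greedy_hitting_set({1, 2, 3, 4}, [{1, 2}, {2, 3}, {3, 4}]))  # a hitting set of size 2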

Some Design Techniques for Approximation Algorithms. Local search: given a solution, search for a better neighboring solution. Linear programming: formulate the problem as a linear program. Dynamic programming: construct a solution from optimal solutions to subproblems. Randomized algorithms: algorithms that include random choices. Heuristics: exploratory, possibly learning strategies that offer no guarantees.

Thank you. That's all, folks!