Topic 3: Hypothesis Testing

CS 8850: Advanced Machine Learning, Fall 2017
Instructor: Daniel L. Pimentel-Alarcón. © Copyright 2017.

3.1 Introduction

One of the simplest inference problems is that of deciding between two options (hypotheses).

Example 3.1 (Healthy vs. diabetic). The blood glucose level (in mg/dL) of a healthy person can be modeled as N(95, σ²), while that of a diabetic can be modeled as N(140, σ²). Given a new patient with glucose level x, you want to decide between two hypotheses:

    H₀ : x ∼ N(95, σ²)    healthy,
    H₁ : x ∼ N(140, σ²)   diabetic.

H₀ and H₁ are often called the null and alternative hypotheses.

Example 3.2 (Radar). A radar is constantly emitting a signal and monitoring to see if it bounces back (see Figure 3.1). The signal x that the radar receives can be modeled as N(0, σ²) if there is nothing there (hence the signal doesn't bounce back), and as N(μ, σ²) for some μ > 0 if an object is present (hence the signal bounces back). Thus the radar needs to decide between:

    H₀ : x ∼ N(0, σ²)           nothing there,
    H₁ : x ∼ N(μ, σ²), μ > 0    something there.

[Figure 3.1: A radar is constantly receiving a signal, and needs to decide whether an object is present or not. See Example 3.2.]

Example 3.3 (Astrophysics). NASA wants you to determine whether two meteorites, one that fell in Roswell, New Mexico, and one that fell in Chelyabinsk, Russia, came from the same asteroid in space. With help from the materials expert on your interdisciplinary team, you are able to determine that if two meteorites come from the same asteroid, the difference x of their magnesium compositions

can be modeled as N(0, σ²), and as N(μ, σ²) otherwise, for some unknown μ ≠ 0. Hence you need to decide between:

    H₀ : x ∼ N(0, σ²)           same asteroid,
    H₁ : x ∼ N(μ, σ²), μ ≠ 0    different asteroids.

Example 3.4 (Genetics). Have you wondered how geneticists determine which genes are associated with which diseases? Essentially, they compare the average activation levels of a gene in healthy and sick individuals (see Figure 3.2). The difference x between these activation levels can be modeled as N(0, σ²) if the gene is unrelated to the disease, and as N(μ, σ²) for some unknown μ ≠ 0 if the gene is related to the disease. We thus have to decide between:

    H₀ : x ∼ N(0, σ²)           gene unrelated to disease,
    H₁ : x ∼ N(μ, σ²), μ ≠ 0    gene related to disease.

[Figure 3.2: Gene microarrays are data matrices indicating gene activation levels. Each row corresponds to one gene, and each column corresponds to one individual. We want to know which genes are related to a disease. See Example 3.4.]

Example 3.5 (Treatment design). Scientists often want to design a treatment (e.g., a drug or procedure) for a disease (e.g., diabetes or cancer). To this end they measure the disease presence (e.g., glucose level or tumor size) before and after treatment in N patients. The differences x₁, …, x_N can be modeled as independent and identically distributed (i.i.d.) N(0, σ²) if the treatment is ineffective, and N(μ, σ²) for some μ < 0 if the treatment is effective. Hence we have:

    H₀ : x₁, …, x_N  iid∼ N(0, σ²)           treatment is ineffective,
    H₁ : x₁, …, x_N  iid∼ N(μ, σ²), μ < 0    treatment is effective.

Example 3.6 (Neural activity). Scientists want to determine which regions of the brain are related to certain tasks using functional magnetic resonance imaging (fMRI), which essentially creates a video of the brain using magnetic fields that map hydrogen density. For example, say they want to know which region of the brain controls the thumb. Then they take an individual, ask her to move her thumb

periodically, and take an fMRI video of her brain. Scientists then analyze one pixel at a time. The (i, j)th pixel produces a signal vector x_ij ∈ R^D containing the brain measurements in that pixel over time. Some pixels will show neural activity correlated with the signal of the thumb μ ∈ R^D, usually a sinusoid with the periodicity of the thumb movement (see Figure 3.3). Then x_ij can be modeled as N(0, σ²I) if the pixel is uncorrelated with the thumb movement, and N(μ, σ²I) if the pixel is correlated (I denotes the identity matrix of compatible size, in this case D × D). Hence for each pixel (i, j) they have to decide:

    H₀ : x_ij ∼ N(0, σ²I)    (i, j)th pixel is uncorrelated,
    H₁ : x_ij ∼ N(μ, σ²I)    (i, j)th pixel is correlated.

[Figure 3.3: Left: Signal of the thumb μ ∈ R^D, usually a sinusoid with the periodicity of the thumb movement. Right: Each pixel produces one signal vector x_ij ∈ R^D containing the brain measurements in that pixel over time. Some pixels may show neural activity correlated with the signal of the thumb. We want to find such pixels. See Example 3.6.]

Definition 3.1 (Hypothesis test). A hypothesis test is a function t : Ω → {H₀, H₁}.

3.2 The Likelihood Ratio Test

In general, hypothesis testing is all about deciding between two options. We observe a random variable x, and want to decide whether

    H₀ : x ∼ p₀(x),
    H₁ : x ∼ p₁(x).

If your hunch is to simply pick whichever is larger between p₀(x) and p₁(x), your intuition is correct. That is essentially the likelihood ratio test (LRT) in its most elemental form:

    Λ(x) := p₁(x)/p₀(x) ≷ 1,

where ≷ means: decide H₁ if the left side is larger, and H₀ otherwise. In words: if the likelihood ratio Λ(x) is larger than 1 (meaning p₁ is larger), then pick H₁; similarly, if Λ(x) < 1 (meaning p₀ is larger), then pick H₀.
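As a quick illustration (not part of the original notes), the elemental LRT takes only a few lines of Python. The glucose means come from Example 3.1; the noise level σ = 10 and the helper names are assumptions for the sketch:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def lrt(x, p0, p1, tau=1.0):
    """Elemental likelihood ratio test: decide H1 iff p1(x)/p0(x) >= tau."""
    return "H1" if p1(x) / p0(x) >= tau else "H0"

# Example 3.1: healthy ~ N(95, sigma^2), diabetic ~ N(140, sigma^2); sigma = 10 assumed
p0 = lambda x: normal_pdf(x, 95.0, 10.0)
p1 = lambda x: normal_pdf(x, 140.0, 10.0)

print(lrt(100.0, p0, p1))  # closer to 95  -> H0
print(lrt(130.0, p0, p1))  # closer to 140 -> H1
```

Since the two densities share the same σ, this rule simply picks whichever mean the observation is closer to.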

[Figure 3.4: Likelihood ratio test x ≷ μ/2 in Example 3.7.]

Remark 3.1 (Likelihood). The term likelihood is often a source of confusion. To be more precise, in hypothesis testing we observe an instance of a random variable, i.e., we observe data x = x̄, and we want to decide which of two distributions (p₀ or p₁) is more likely to have generated this data. The likelihood that p₀ generated x̄ is essentially p₀(x) evaluated at x = x̄, and similarly for p₁.

Example 3.7 (Radar). Consider the hypothesis testing problem in Example 3.2. Then

    Λ(x) = p₁(x)/p₀(x) = e^{−(x−μ)²/(2σ²)} / e^{−(x−0)²/(2σ²)} = e^{(2xμ−μ²)/(2σ²)} = e^{μ(x−μ/2)/σ²} ≷ 1.

Since both sides are positive, taking logarithms we obtain:

    μ(x − μ/2)/σ² ≷ 0,

and since μ > 0, this further simplifies into the following test:

    x ≷ μ/2.

In words, this LRT tells us to decide H₁ if our observed data x is larger than μ/2, and H₀ otherwise (see Figure 3.4).

3.3 Outcomes and Decision Regions

A test has four possible outcomes, depending on what we decide and what the truth is: (0|0), (0|1), (1|0), (1|1). See Table 3.1. Sometimes it is desirable to reduce the probability of one particular kind of error.
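The algebra above can be sanity-checked numerically: comparing the raw densities should give exactly the same decisions as the simplified test x ≷ μ/2. This sketch is illustrative only; the values μ = 3 and σ = 1.5 are assumptions:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 3.0, 1.5  # assumed values; any mu > 0 works

for x in (-1.0, 1.0, 1.4, 1.6, 4.0):
    raw = "H1" if normal_pdf(x, mu, sigma) / normal_pdf(x, 0.0, sigma) >= 1 else "H0"
    simplified = "H1" if x >= mu / 2 else "H0"
    assert raw == simplified  # the LRT and the threshold test x >< mu/2 agree
print("LRT and simplified test agree on all points")
```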

Example 3.8. In Example 3.5, scientists want to avoid a (1|0) error, which would mean that they believe their treatment cures a disease when it really doesn't.

The probabilities of the four outcomes are determined by the decision regions of the test.

Definition 3.2 (Decision regions). The decision regions R₀ and R₁ of a test t : Ω → {H₀, H₁} are the inverse images of H₀ and H₁, i.e.,

    R₀ := {x ∈ Ω : t(x) = H₀},    R₁ := {x ∈ Ω : t(x) = H₁}.

In words, R₀ is the region of the domain of x where we will decide H₀, and similarly for R₁.

Example 3.9. The decision regions of the test x ≷ μ/2 are R₀ = (−∞, μ/2) and R₁ = (μ/2, ∞).

Decision regions determine the probability of each outcome as follows:

    p_gh := ∫_{R_g} p_h(x) dx,    for g, h ∈ {0, 1}.

Example 3.10. The test x ≷ μ/2 has the following outcome probabilities (see Figure 3.5):

    p₀₀ = 1 − Q_{0,σ²}(μ/2),    p₀₁ = 1 − Q_{μ,σ²}(μ/2),
    p₁₀ = Q_{0,σ²}(μ/2),        p₁₁ = Q_{μ,σ²}(μ/2),

Table 3.1: Four possible outcomes of a test. Depending on the field, they come under different names. We will use (0|0), (0|1), (1|0), (1|1) for simplicity.

                        Truth H₀                  Truth H₁
    Decision H₀    (0|0) True negative       (0|1) False negative
                   No-alarm                  Miss
                   Accept                    Type II error
    Decision H₁    (1|0) False positive      (1|1) True positive
                   False alarm               Detect
                   Type I error              Reject

where Q_{μ,σ²}(τ) is the tail probability of the N(μ, σ²) distribution, i.e.,

    Q_{μ,σ²}(τ) := ∫_τ^∞ (1/√(2πσ²)) e^{−(x−μ)²/(2σ²)} dx.

If we want to bound p₁₀, we could modify our test to

    x ≷ τ,

where τ is a threshold selected to make p₁₀ smaller than the desired probability of error α (see Figure 3.5). This new test has p₁₀ = Q_{0,σ²}(τ). Furthermore, a simple change of variable shows that

    Q_{μ,σ²}(τ) = Q((τ − μ)/σ),    (3.1)

where we use Q as shorthand for Q_{0,1} (see Figure 3.6 to build some intuition), so p₁₀ = Q(τ/σ). Finally, since Q is invertible, if we want p₁₀ ≤ α, we can pick τ = σQ⁻¹(α).

[Figure 3.5: Outcome probabilities p_gh. Left: test x ≷ μ/2 from Example 3.7. Right: test x ≷ τ from Example 3.10; we can pick τ such that p₁₀ ≤ α.]

[Figure 3.6: Q_{μ,σ²}(τ) = Q((τ − μ)/σ), where Q is shorthand for Q_{0,1}.]

3.4 Neyman-Pearson Lemma

As mentioned in Example 3.8, there are some cases where we want to bound a certain probability of error, say p₁₀. One way to do this is by increasing R₀. However, as R₀ grows, our accuracy p₁₁ decreases (see

Figure 3.5 to build some intuition). The Neyman-Pearson Lemma tells us that the LRT is optimal in the sense that there exists no other test that has both a lower probability of error p₁₀ and a higher accuracy p₁₁.

Lemma 3.1 (Neyman-Pearson). Consider the likelihood ratio test t given by

    p₁(x)/p₀(x) ≷ τ,

with τ chosen such that p₁₀ = α. Then there exists no other test t′ with p′₁₀ ≤ α and p′₁₁ > p₁₁.

Proof. For any region R ⊆ Ω, let P_h(R) be the cumulative probability of p_h(x) over R, i.e., P_h(R) = ∫_R p_h(x) dx. Then for h ∈ {0, 1},

    p₁h = P_h(R₁ ∩ R′₁) + P_h(R₁ ∩ R′₀),    p′₁h = P_h(R₁ ∩ R′₁) + P_h(R₀ ∩ R′₁).    (3.2)

Now suppose p′₁₀ ≤ α. We need to show that p′₁₁ ≤ p₁₁. From (3.2), this is equivalent to showing that P₁(R₁ ∩ R′₀) ≥ P₁(R₀ ∩ R′₁), so write

    P₁(R₁ ∩ R′₀) = ∫_{R₁ ∩ R′₀} p₁(x) dx ≥ ∫_{R₁ ∩ R′₀} τ p₀(x) dx = τ P₀(R₁ ∩ R′₀),    (3.3)

where the inequality follows because in R₁, p₁(x) ≥ τ p₀(x). By assumption, p₁₀ = α ≥ p′₁₀. This, together with (3.2), implies P₀(R₁ ∩ R′₀) ≥ P₀(R₀ ∩ R′₁), so

    (3.3) ≥ τ P₀(R₀ ∩ R′₁) = ∫_{R₀ ∩ R′₁} τ p₀(x) dx ≥ ∫_{R₀ ∩ R′₁} p₁(x) dx = P₁(R₀ ∩ R′₁),

where the last inequality follows because in R₀, p₁(x) ≤ τ p₀(x). ∎

Example 3.11. Consider

    H₀ : x ∼ N(0, σ₀²),    H₁ : x ∼ N(0, σ₁²),

where σ₀ < σ₁ are known. The likelihood ratio test is

    Λ(x) = p₁(x)/p₀(x) = [ (1/√(2πσ₁²)) e^{−x²/(2σ₁²)} ] / [ (1/√(2πσ₀²)) e^{−x²/(2σ₀²)} ] = (σ₀/σ₁) e^{(x²/2)(1/σ₀² − 1/σ₁²)} ≷ τ.

Or equivalently,

    e^{(x²/2)(σ₁² − σ₀²)/(σ₀²σ₁²)} ≷ (σ₁/σ₀) τ,
    x² (σ₁² − σ₀²)/(2σ₀²σ₁²) ≷ log((σ₁/σ₀) τ),
    x² ≷ [2σ₀²σ₁²/(σ₁² − σ₀²)] log((σ₁/σ₀) τ) =: τ′.

Now recall that if y ∼ N(0, 1), then y² ∼ χ² (chi-squared with one degree of freedom). So we can rewrite our hypotheses as

    H₀ : x²/σ₀² ∼ χ²,    H₁ : x²/σ₁² ∼ χ².

Then p₁h (with h ∈ {0, 1}) is simply the probability that a σ_h²-scaled χ² random variable is larger than τ′ (see Figure 3.7), i.e.,

    p₁h = Q_{χ²}(τ′/σ_h²),

where Q_{χ²} is the tail probability of the χ² distribution. Since Q_{χ²} is invertible, if we want p₁₀ ≤ α, we can pick τ′ = σ₀² Q_{χ²}⁻¹(α), and then p₁₁ = Q_{χ²}((σ₀²/σ₁²) Q_{χ²}⁻¹(α)). The Neyman-Pearson Lemma tells us that there exists no other test that has both a lower probability of error p₁₀ and a higher accuracy p₁₁.

[Figure 3.7: Left: Threshold τ′ selected to achieve probability of error p₁₀ = α in the test x² ≷ τ′ of Example 3.11, where H₀ : x²/σ₀² ∼ χ². This is equivalent to the test on x with H₀ : x ∼ N(0, σ₀²), as in the right panel.]

3.5 Multiple Observations

We now study what happens when we have several observations instead of just one.

Example 3.12. Consider the hypotheses in Example 3.5, or equivalently, in vector form:

    H₀ : x ∈ R^N ∼ N(0, σ²I),
    H₁ : x ∈ R^N ∼ N(μ1, σ²I), μ < 0,

where 1 denotes the all-ones vector of compatible size, in this case of length N. The likelihood ratio is given by

    Λ(x) = p₁(x)/p₀(x) = [ (2πσ²)^{−N/2} e^{−(x−μ1)ᵀ(x−μ1)/(2σ²)} ] / [ (2πσ²)^{−N/2} e^{−xᵀx/(2σ²)} ] = e^{−(xᵀx − 2μxᵀ1 + Nμ²)/(2σ²)} e^{xᵀx/(2σ²)} = e^{(μ/σ²)(xᵀ1 − Nμ/2)}.

Taking logarithms we obtain the log-likelihood ratio test:

    (μ/σ²)(xᵀ1 − Nμ/2) ≷ log τ,
    xᵀ1 ≶ (σ²/μ) log τ + Nμ/2 =: τ′.

Notice that the direction of the inequalities in the test was inverted because μ < 0. Next observe that m := xᵀ1 = Σ_{i=1}^N x_i, and since sums of Gaussians are Gaussian, we can rewrite our hypotheses as

    H₀ : m ∼ N(0, Nσ²),
    H₁ : m ∼ N(Nμ, Nσ²), μ < 0.

Then our log-likelihood ratio test becomes m ≶ τ′, and since we now decide H₁ on the lower tail,

    p₁₀ = Φ_{0,Nσ²}(τ′),    p₁₁ = Φ_{Nμ,Nσ²}(τ′),

where Φ_{μ,σ²} is the cumulative distribution function (CDF) of a N(μ, σ²) random variable (see Figure 3.8). Equivalently, with a transformation similar to (3.1), we can write this in terms of the CDF Φ of the standard normal N(0, 1):

    p₁₀ = Φ(τ′/(√N σ)),    p₁₁ = Φ((τ′ − Nμ)/(√N σ)).    (3.4)

Since Φ is invertible, if we want p₁₀ ≤ α, we can pick τ′ = √N σ Φ⁻¹(α). Plugging this into (3.4), we obtain

    p₁₁ = Φ( (√N σ Φ⁻¹(α) − Nμ)/(√N σ) ) = Φ( Φ⁻¹(α) − √N μ/σ ).

The quantity √N μ/σ is often known as the signal-to-noise ratio. Since Φ(τ) → 1 as τ → ∞, and since μ < 0 by assumption, it is easy to see that p₁₁ increases with N and |μ|, but decreases with σ.

3.6 Multiple Testing

In many applications we run multiple tests, and we want to bound the probability of making one or more mistakes.
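A quick numerical illustration of why this matters (not from the original notes; the level α = 0.05 is an assumption): if we naively run K independent tests, each with false-alarm probability α, the chance of at least one (1|0) error is 1 − (1 − α)^K, which approaches 1 quickly as K grows:

```python
alpha = 0.05  # assumed per-test false-alarm probability

def prob_at_least_one_error(K, alpha):
    """P(at least one (1|0) error) for K independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** K

for K in (1, 10, 100):
    print(K, round(prob_at_least_one_error(K, alpha), 3))
```

With K = 100 tests the probability of at least one false alarm already exceeds 0.99, which is why a correction such as the one below is needed.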

[Figure 3.8: Outcome probabilities p_gh of the test m ≶ τ′. See Example 3.12.]

Example 3.13. In Example 3.6 we have a family of K tests (where K is the number of pixels), and we want to be confident that all the identified pixels are truly correlated with the thumb's movement.

Definition 3.3 (Family-wise error rate (FWER)). The family-wise error rate is the probability of making one or more (1|0) errors. More precisely, for a family of K tests,

    FWER = P( ∪_{k=1}^K {1_k | 0_k} ),

where {1_k | 0_k} denotes the event of deciding H₁ in the kth test when H₀ is true.

Lemma 3.2 (Bonferroni correction). Consider a family of K tests. Setting p₁₀ = α/K for each test achieves FWER ≤ α.

Proof. As a simple consequence of the union bound, we have:

    FWER = P( ∪_{k=1}^K {1_k | 0_k} ) ≤ Σ_{k=1}^K P({1_k | 0_k}) = Σ_{k=1}^K p₁₀ = K · α/K = α. ∎

3.7 Composite Hypotheses

So far we have studied simple hypotheses, as in Examples 3.1, 3.6 and 3.11, where the distributions and their parameters are fully known. However, in many practical situations this is not the case. For instance, in H₁ of Example 3.2, all we know is that μ > 0.

In this case, H₁ is composed of the collection of distributions {N(μ, σ²)}_{μ>0}. More generally, a composite problem has the form:

    H₀ : x ∼ p₀(x | θ₀), θ₀ ∈ Θ₀,
    H₁ : x ∼ p₁(x | θ₁), θ₁ ∈ Θ₁,

where the notation p_h(x | θ_h) means that θ_h is a parameter of the distribution p_h, and Θ_h is the set of all possible values of the parameter θ_h. In general, p₀ and p₁ may be entirely different distributions, and the sets Θ₀, Θ₁ may be entirely different.

Even though Examples 3.2 and 3.5 are composite problems, since we know the sign of μ, we were still able to derive their LRTs in Examples 3.7 and 3.12. This is not always the case. There are more complicated cases, like Examples 3.3 and 3.4, where μ is completely unknown, and this can complicate things.

Example 3.14 (Wald's test). Consider Examples 3.3 and 3.4. The likelihood ratio is

    Λ(x) = p₁(x)/p₀(x) = e^{−(x−μ)²/(2σ²)} / e^{−(x−0)²/(2σ²)} = e^{(2xμ−μ²)/(2σ²)} = e^{μ(x−μ/2)/σ²} ≷ 1.

Taking logarithms and with some minor algebra we obtain:

    xμ ≷ μ²/2.

However, since we don't know the sign of μ, we cannot continue as in Example 3.7, as dividing by μ could reverse the direction of the inequality in the test. Hence this test is uncomputable, or undetermined. So how can we proceed? For example, let's say we decide to use the test from Example 3.10:

    x ≷ τ.

It could happen that we are lucky and μ > 0. Then our test will be optimal (as shown by the Neyman-Pearson Lemma), with p₁₀ = Q(τ/σ) and p₁₁ = Q((τ − μ)/σ) (see Example 3.10 and Figure 3.5). However, if we are unlucky and μ < 0, our test would be doing something terribly insensible, and would have terrible accuracy p₁₁ = Q((τ + |μ|)/σ); see Figure 3.9 to build some intuition. One good compromise is to use Wald's test:

    |x| ≷ τ,

which has

    p₁₀ = 2Q(τ/σ),    p₁₁ = Q((τ − μ)/σ) + Q((τ + μ)/σ).

Wald's test is not optimal, but it is sensible. It has a higher probability of error p₁₀ than if we are lucky and guess the sign of μ correctly, but it also has a much higher accuracy p₁₁ than if we are unlucky and guess it incorrectly.
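Wald's compromise can be checked numerically. The sketch below is illustrative only; the values σ = 1, τ = 2 and μ = ±1.5 are assumptions. It verifies that Wald's accuracy is the same for either sign of μ, and is far better than the one-sided test under an unlucky guess:

```python
from scipy.stats import norm

sigma, tau = 1.0, 2.0  # assumed values for illustration
Q = norm.sf            # tail probability of the standard normal, Q = Q_{0,1}

def one_sided_p11(mu):
    """Accuracy of the one-sided test x >< tau: Q((tau - mu)/sigma)."""
    return Q((tau - mu) / sigma)

def wald_p11(mu):
    """Accuracy of Wald's test |x| >< tau: Q((tau-mu)/sigma) + Q((tau+mu)/sigma)."""
    return Q((tau - mu) / sigma) + Q((tau + mu) / sigma)

mu = 1.5
assert abs(wald_p11(mu) - wald_p11(-mu)) < 1e-12  # insensitive to the sign of mu
assert wald_p11(-mu) > one_sided_p11(-mu)         # far better than the unlucky guess
print(round(one_sided_p11(-mu), 4), round(wald_p11(mu), 4), round(one_sided_p11(mu), 4))
```

The price of this robustness is the doubled false-alarm probability p₁₀ = 2Q(τ/σ), since both tails of the null distribution now fall in R₁.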

[Figure 3.9: Composite hypothesis test where μ ≠ 0 is unknown. Left: Probabilities p₁₀ and p₁₁ of the test x ≷ τ. If we are lucky and μ > 0, this test will be optimal, but if μ < 0, our test will be terrible. Right: Wald's test |x| ≷ τ. Wald's test is not optimal, but it is sensible. It has a higher probability of error p₁₀ than if we are lucky and guess the sign of μ correctly, but also a higher accuracy p₁₁ than if we are unlucky and guess it incorrectly. See Example 3.14 and Figure 3.10.]

[Figure 3.10: Left: ROC curves of the test x ≷ τ for different values of μ. Consistent with our analysis from Example 3.12, we can see that p₁₁ grows with μ. Right: ROC curves for the test x ≷ τ when μ > 0 (optimal), when μ < 0 (terrible), and for Wald's test. This shows that Wald's test is suboptimal but sensible. See Example 3.14.]

3.8 ROC Curves

As shown in Example 3.14, it is not always possible to devise an optimal test. It is thus reasonable to ask how good a test is. For example, how good is Wald's test? One way to answer this is with receiver operating characteristic (ROC) curves, which measure a test's performance by plotting its p₁₁ as a function of its p₁₀. ROC curves are widely used in laboratories to measure a test's ability to discriminate diseased cases from normal cases, and also to compare the performance of two or more tests.

3.9 Generalized Likelihood Ratio Test

Wald's test was an intuitive solution to the simplest composite problem, but it also has a solid statistical foundation. In fact, Wald's test is the result of estimating μ and then using this estimate in a likelihood ratio test. We will come back to Wald's test and its generalization, the generalized likelihood ratio test (GLRT), but first we will need to learn about estimation, which is our next topic.
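An ROC curve can be traced numerically by sweeping the threshold τ and recording (p₁₀, p₁₁) at each value. The sketch below (the values μ = 1, σ = 1 and the threshold grid are assumptions) does this for the one-sided test x ≷ τ and for Wald's test |x| ≷ τ, using the closed-form expressions derived above:

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 1.0, 1.0  # assumed: H1 mean and noise level
Q = norm.sf           # tail probability of the standard normal

taus = np.linspace(0.0, 5.0, 51)
# Each point of an ROC curve is (p10, p11) for one threshold tau.
roc_one_sided = [(Q(t / sigma), Q((t - mu) / sigma)) for t in taus]
roc_wald = [(min(2 * Q(t / sigma), 1.0), Q((t - mu) / sigma) + Q((t + mu) / sigma))
            for t in taus]

# Since mu > 0 here, the one-sided curve lies above the diagonal p11 = p10.
assert all(p11 >= p10 for p10, p11 in roc_one_sided)
print(len(roc_one_sided), len(roc_wald))
```

Plotting p₁₁ against p₁₀ for each list reproduces curves like those in Figure 3.10: the closer a curve hugs the top-left corner, the better the test.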