Chapter 1: Logistic regression

1.1 A dataset

Let's assume that 700 patients suffering from kidney stones have been scored for:

- the size of the stones, classified into either large or small;
- the type of treatment they have received: either open surgery or ultrasound;
- the recurrence (reported as a Failure) or not (reported as a Success) within a given period of time.

The question of interest is to identify whether size and treatment impact the success rate. The following tables have been obtained, respectively, for the 562 successes and the 138 failures found in the dataset:

    Successes
    Number   Open    US   Total
    Large     192    55     247
    Small      81   234     315
    Total     273   289     562

    Failures
    Number   Open    US   Total
    Large      71    25      96
    Small       6    36      42
    Total      77    61     138

1.2 Logistic regression: computing the log-likelihood

We will assume that we have obtained some kind of best model with the following form:

    logit(p_i) = µ + x_iα α_S + x_iβ β

In this equation, p_i is the probability of Success for patient i, µ stands for an overall mean, α_S is the effect of the size of the stone (small (S) or large (L)), and β is the effect of the treatment (open surgery (O) or ultrasound (US)). x_iα is 1 if patient i suffered from a large stone and -1 if the stone was considered small. Similarly, x_iβ is 1 if the patient was treated with open surgery, and -1 if ultrasound was used. In this equation, µ, α_S and β are the unknown parameters of the model and need to be estimated, while x_iα and x_iβ are known coefficients (equal to -1 or 1) depending only on the collected dataset.
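A minimal sketch (in Python; the container and field names are chosen here for illustration) of the four cells of these tables with the ±1 coding described above. The assertions only check the totals quoted in the text.

```python
# One entry per (size, treatment) cell, with the +1/-1 codes used in the model.
cells = [
    {"size": "L", "treatment": "O",  "x_alpha": +1, "x_beta": +1, "success": 192, "failure": 71},
    {"size": "L", "treatment": "US", "x_alpha": +1, "x_beta": -1, "success": 55,  "failure": 25},
    {"size": "S", "treatment": "O",  "x_alpha": -1, "x_beta": +1, "success": 81,  "failure": 6},
    {"size": "S", "treatment": "US", "x_alpha": -1, "x_beta": -1, "success": 234, "failure": 36},
]

# Sanity checks against the totals quoted above.
assert sum(c["success"] for c in cells) == 562
assert sum(c["failure"] for c in cells) == 138
assert sum(c["success"] + c["failure"] for c in cells) == 700
```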

Each of the four possible types of individuals thus has a representation depending on this set of parameters:

- patients with large stones and open surgery:  logit(p_i) = µ + α_S + β
- patients with large stones and ultrasound:    logit(p_i) = µ + α_S - β
- patients with small stones and open surgery:  logit(p_i) = µ - α_S + β
- patients with small stones and ultrasound:    logit(p_i) = µ - α_S - β

The likelihood is the probability of the observations. If we assume that the 700 observations are independent (a reasonable assumption), the probability of the observations is the product of the individual probabilities. For each individual, the probability of a Success is p_i, and the probability of a Failure is consequently (1 - p_i). Using the 4 categories given above, we can easily obtain the corresponding probabilities of Success as:

- patients with large stones and open surgery:  p_i = p(L, O) = exp(µ + α_S + β) / (1 + exp(µ + α_S + β))
- patients with large stones and ultrasound:    p_i = p(L, U) = exp(µ + α_S - β) / (1 + exp(µ + α_S - β))
- patients with small stones and open surgery:  p_i = p(S, O) = exp(µ - α_S + β) / (1 + exp(µ - α_S + β))
- patients with small stones and ultrasound:    p_i = p(S, U) = exp(µ - α_S - β) / (1 + exp(µ - α_S - β))
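These Success probabilities are simply the inverse logit of the four linear predictors. A short sketch in Python (the parameter values below are placeholders for illustration, not the estimates derived later):

```python
import math

def success_probability(mu, alpha_s, beta, x_alpha, x_beta):
    """Inverse logit of the linear predictor for one (size, treatment) cell."""
    eta = mu + x_alpha * alpha_s + x_beta * beta
    return math.exp(eta) / (1.0 + math.exp(eta))

# The four cells of the text, in the same +1/-1 coding (illustrative parameter values).
mu, alpha_s, beta = 1.0, -0.5, 0.2
for label, xa, xb in [("p(L,O)", +1, +1), ("p(L,U)", +1, -1),
                      ("p(S,O)", -1, +1), ("p(S,U)", -1, -1)]:
    print(label, round(success_probability(mu, alpha_s, beta, xa, xb), 4))
```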

Of course, the probabilities of Failure are given by (1 - p_i) for each category:

- patients with large stones and open surgery:  1 - p(L, O) = 1 / (1 + exp(µ + α_S + β))
- patients with large stones and ultrasound:    1 - p(L, U) = 1 / (1 + exp(µ + α_S - β))
- patients with small stones and open surgery:  1 - p(S, O) = 1 / (1 + exp(µ - α_S + β))
- patients with small stones and ultrasound:    1 - p(S, U) = 1 / (1 + exp(µ - α_S - β))

The number of Successes and Failures in each category is given in the table above, which allows us to obtain the likelihood as:

    L = p(L, O)^192 (1 - p(L, O))^71 p(L, U)^55 (1 - p(L, U))^25 p(S, O)^81 (1 - p(S, O))^6 p(S, U)^234 (1 - p(S, U))^36

1.3 Obtaining the maximum likelihood estimators

The estimators of the parameters of the model will be taken as the values that maximize this likelihood. Since the values that maximize a function are the same as the ones maximizing the logarithm of that function, we will work on the logarithm of the likelihood (log-likelihood), because it is easier to manipulate:

    l = ln(L) = 192 ln[p(L, O)] + 71 ln[1 - p(L, O)] + 55 ln[p(L, U)] + 25 ln[1 - p(L, U)]
              + 81 ln[p(S, O)] + 6 ln[1 - p(S, O)] + 234 ln[p(S, U)] + 36 ln[1 - p(S, U)]

Replacing the 4 probabilities with the exponential expressions given above leads to:

    l = 192 (µ + α_S + β) - 263 ln(1 + exp(µ + α_S + β))
      + 55 (µ + α_S - β) - 80 ln(1 + exp(µ + α_S - β))
      + 81 (µ - α_S + β) - 87 ln(1 + exp(µ - α_S + β))
      + 234 (µ - α_S - β) - 270 ln(1 + exp(µ - α_S - β))

      = 562 µ - 68 α_S - 16 β - 263 ln(1 + exp(µ + α_S + β)) - 80 ln(1 + exp(µ + α_S - β))
        - 87 ln(1 + exp(µ - α_S + β)) - 270 ln(1 + exp(µ - α_S - β))
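This log-likelihood can be coded directly as a short function; the sketch below mirrors the expression above, with the cell counts taken from the tables (the names are mine):

```python
import math

# (successes, total, sign of alpha_S, sign of beta) for the cells (L,O), (L,U), (S,O), (S,U).
CELLS = [
    (192, 263, +1, +1),  # large stones, open surgery
    (55,   80, +1, -1),  # large stones, ultrasound
    (81,   87, -1, +1),  # small stones, open surgery
    (234, 270, -1, -1),  # small stones, ultrasound
]

def log_likelihood(mu, alpha_s, beta):
    """l(mu, alpha_S, beta) as derived in the text."""
    l = 0.0
    for successes, total, sa, sb in CELLS:
        eta = mu + sa * alpha_s + sb * beta  # linear predictor of the cell
        l += successes * eta - total * math.log(1.0 + math.exp(eta))
    return l

# Value of the log-likelihood near the (approximate) estimates reported below.
print(round(log_likelihood(1.4849, -0.6303, 0.1786), 3))
```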

This function is to be maximized with respect to the 3 parameters to obtain the maximum likelihood estimates. Using optimization procedures, these estimates turn out to be µ̂ = 1.4849, α̂_S ≈ -0.6303 and β̂ ≈ 0.1786.

1.4 Association measures

Based on these estimations, it is straightforward to compute the Success probabilities given above:

    p̂(L, O) = exp(µ̂ + α̂_S + β̂) / (1 + exp(µ̂ + α̂_S + β̂)) ≈ 0.7375
    p̂(L, U) = exp(µ̂ + α̂_S - β̂) / (1 + exp(µ̂ + α̂_S - β̂)) ≈ 0.6628
    p̂(S, O) = exp(µ̂ - α̂_S + β̂) / (1 + exp(µ̂ - α̂_S + β̂)) ≈ 0.9084
    p̂(S, U) = exp(µ̂ - α̂_S - β̂) / (1 + exp(µ̂ - α̂_S - β̂)) ≈ 0.8740

It should be mentioned here that, due to the simultaneous estimation of all the parameters of the model, these means are (slightly) different from the raw means obtained directly from the table above. This is shown in the following table:

    Probability   Estimated    Raw
    p(L, O)          0.7375    0.7300
    p(L, U)          0.6628    0.6875
    p(S, O)          0.9084    0.9310
    p(S, U)          0.8740    0.8667
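The text does not specify which optimization procedure was used; as one possible route, assuming NumPy and SciPy are available, a general-purpose optimizer applied to the negative log-likelihood gives estimates of this kind, together with the fitted-versus-raw comparison:

```python
import numpy as np
from scipy.optimize import minimize

# (successes, total, sign of alpha_S, sign of beta) for the cells (L,O), (L,U), (S,O), (S,U).
CELLS = np.array([
    [192, 263, +1, +1],
    [ 55,  80, +1, -1],
    [ 81,  87, -1, +1],
    [234, 270, -1, -1],
], dtype=float)

def negative_log_likelihood(theta):
    mu, alpha_s, beta = theta
    eta = mu + CELLS[:, 2] * alpha_s + CELLS[:, 3] * beta
    return -np.sum(CELLS[:, 0] * eta - CELLS[:, 1] * np.log1p(np.exp(eta)))

result = minimize(negative_log_likelihood, x0=np.zeros(3), method="BFGS")
mu_hat, alpha_hat, beta_hat = result.x
print("mu =", round(mu_hat, 4), " alpha_S =", round(alpha_hat, 4), " beta =", round(beta_hat, 4))

# Fitted success probabilities for the four cells, to compare with the raw proportions.
eta_hat = mu_hat + CELLS[:, 2] * alpha_hat + CELLS[:, 3] * beta_hat
print("estimated:", np.round(1.0 / (1.0 + np.exp(-eta_hat)), 4))
print("raw:      ", np.round(CELLS[:, 0] / CELLS[:, 1], 4))
```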

The conclusions are similar to the ones made with the raw probabilities: no matter the size of the stones, the success probabilities are higher for open surgery than for the ultrasound treatment. Another commonly used measure is obtained by computing the odds, defined as the ratio O = p / (1 - p). In our results, this gives:

    Ô(L, O) ≈ 2.810    Ô(L, U) ≈ 1.966    Ô(S, O) ≈ 9.913    Ô(S, U) ≈ 6.935

Of course, these results can also be obtained directly using the estimators derived above:

    Ô(L, O) = p̂(L, O) / (1 - p̂(L, O)) = exp(µ̂ + α̂_S + β̂)
    Ô(L, U) = p̂(L, U) / (1 - p̂(L, U)) = exp(µ̂ + α̂_S - β̂)
    Ô(S, O) = p̂(S, O) / (1 - p̂(S, O)) = exp(µ̂ - α̂_S + β̂)
    Ô(S, U) = p̂(S, U) / (1 - p̂(S, U)) = exp(µ̂ - α̂_S - β̂)
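A quick numerical check (in Python) that the two routes to the odds agree, namely p̂ / (1 - p̂) and exp of the linear predictor, using the approximate estimates quoted above:

```python
import math

# Approximate maximum likelihood estimates quoted in the text.
mu_hat, alpha_hat, beta_hat = 1.4849, -0.6303, 0.1786

for label, sa, sb in [("O(L,O)", +1, +1), ("O(L,U)", +1, -1),
                      ("O(S,O)", -1, +1), ("O(S,U)", -1, -1)]:
    eta = mu_hat + sa * alpha_hat + sb * beta_hat
    p = math.exp(eta) / (1.0 + math.exp(eta))
    # The odds p/(1-p) collapse back to exp(eta); both routes give the same number.
    print(label, round(p / (1.0 - p), 3), round(math.exp(eta), 3))
```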

These odds can be used to compute Odds Ratios (OR), defined as simple ratios of the odds computed in the previous section. These OR are interesting for the following reason. Let's start by computing these OR algebraically:

    ÔR(LO/SO) = Ô(L, O) / Ô(S, O) = exp(2 α̂_S)
    ÔR(LU/SU) = Ô(L, U) / Ô(S, U) = exp(2 α̂_S)
    ÔR(LO/LU) = Ô(L, O) / Ô(L, U) = exp(2 β̂)
    ÔR(SO/SU) = Ô(S, O) / Ô(S, U) = exp(2 β̂)

It is thus demonstrated that:

    ÔR(LO/SO) = ÔR(LU/SU) = ÔR(L/S) = exp(2 α̂_S) ≈ 0.2835

and:

    ÔR(LO/LU) = ÔR(SO/SU) = ÔR(O/U) = exp(2 β̂) = 1.4294

In the absence of an effect of the size of the stone on the probability of Success, OR(L/S) should be equal to 1. Significantly different values would indicate that the size of the stone impacts the success of the treatment. Similarly, if the type of treatment does not matter, OR(O/U) should be 1. It can be demonstrated that ÔR(L/S) is significantly lower than 1 (at the α = 0.05 threshold), indicating that large stones have a negative impact on the probability of success. On the other hand, no significant difference (at the α = 0.05 threshold) between open surgery and ultrasound has been demonstrated in this experiment (i.e. ÔR(O/U) is not significantly different from 1).
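The text does not detail how the significance of these two odds ratios was assessed; one standard choice, sketched here under that assumption, is a Wald-type 95% confidence interval based on the Fisher information of the grouped logistic model (NumPy assumed, approximate estimates as above):

```python
import numpy as np

# Approximate maximum likelihood estimates quoted in the text.
mu_hat, alpha_hat, beta_hat = 1.4849, -0.6303, 0.1786

X = np.array([[1, +1, +1],   # (L, O)
              [1, +1, -1],   # (L, U)
              [1, -1, +1],   # (S, O)
              [1, -1, -1]])  # (S, U)
n = np.array([263.0, 80.0, 87.0, 270.0])             # patients per cell
eta = X @ np.array([mu_hat, alpha_hat, beta_hat])
p = 1.0 / (1.0 + np.exp(-eta))                       # fitted success probabilities

info = X.T @ np.diag(n * p * (1.0 - p)) @ X          # Fisher information
cov = np.linalg.inv(info)                            # asymptotic covariance of (mu, alpha_S, beta)
se_alpha, se_beta = np.sqrt(cov[1, 1]), np.sqrt(cov[2, 2])

z = 1.96  # ~95% Wald interval
for name, log_or, se in [("OR(L/S)", 2 * alpha_hat, 2 * se_alpha),
                         ("OR(O/U)", 2 * beta_hat, 2 * se_beta)]:
    lo, hi = np.exp(log_or - z * se), np.exp(log_or + z * se)
    print(f"{name}: {np.exp(log_or):.4f}  95% CI [{lo:.4f}, {hi:.4f}]")
```

With these numbers, the interval for OR(L/S) stays below 1 while the one for OR(O/U) straddles 1, consistent with the conclusions stated above.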
