Success scans in a sequence of trials


Gerard Parker

1 Success scans in a sequence of trials Example 4.6: application in acceptance sampling. The most classical acceptance sampling works as follows. A supplier delivers a lot of products, each of which is defective with probability p, and the retailer (on the receiving side) tests a sample of n items, where n is usually much smaller than the total number of products in the lot. Suppose d defective items are found in the sample of n; the retailer may then reject the whole lot if d/n exceeds a pre-determined threshold. Most acceptance sampling uses small samples, for example n = 5, 8, or 10, while the whole lot may contain hundreds or thousands of items. The problem with small samples for attributes (item defective or not defective) is that they do not discriminate well between good and bad quality. Consider an acceptance sampling plan that samples five items from a large lot and accepts the lot only if none of the five items is defective. The retailer may consider a lot with 10% or more defectives to be of unacceptable quality. Let us calculate the chance that a sample of five contains no defective item at all.

2 Success scans in a sequence of trials (Example 4.6) With p = 0.1 as the probability that a single item is defective, Pr(no defective in a sample of five) = $(1-p)^5 = 0.9^5 \approx 59\%$. Even with a 10% defective rate in the original lot, there is a good chance that a sample of five misses it entirely, so that the lot is accepted. On the other hand, a lot might have only 1% defective (which the buyer could have agreed to be acceptable quality), and yet it is rejected with a probability well above 1%: with p = 0.01, Pr(at least one defective in a sample of five) = 1 - Pr(no defective in a sample of five) = $1-(1-p)^5 = 1 - 0.99^5 \approx 5\% > 1\%$. So a single acceptance sampling plan does not seem to please either party.

3 Success scans in a sequence of trials (Example 4.6) Combining the information from consecutive lots can improve the effectiveness of the inspection system. Suppose the new rule is to suspend the transaction (between a supplier and a buyer) whenever there are k rejected lots within m consecutive lots. That is, when a single lot is rejected, it is marked or recorded, but new lots from the supplier are still accepted. If a k-out-of-m rejection happens, however, the supplier must stop any new shipments until the necessary corrective actions are taken to eliminate a potential problem. How well can such an acceptance sampling plan work? - Suppose the sample size remains 5; - p = 0.01 or 0.1 for an individual item; - k = 3, m = 5, i.e., it is agreed to suspend transactions whenever three out of five consecutive lots are rejected.

4 Success scans in a sequence of trials (Example 4.6) When p = 0.01, each lot has about a 5% probability of being rejected, but the transactions are unlikely to be stopped within a run of 100 lots. When p = 0.1, each lot has a $1-(1-p)^5 \approx 41\%$ probability of being rejected, and the transactions are highly likely to be stopped by the 15th lot.
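The two claims above can be checked numerically. The following is a minimal sketch, not the method used in the slides: it assumes lots are rejected independently with the same probability q, and it computes the suspension probability exactly with a small dynamic program over the outcomes of the last m - 1 lots. The function names `lot_rejection_prob` and `prob_k_in_m` are made up for this illustration.

```python
def lot_rejection_prob(p, n=5):
    """Probability that a sample of n items contains at least one defective."""
    return 1.0 - (1.0 - p) ** n


def prob_k_in_m(q, k, m, n_lots):
    """P(some m consecutive lots among n_lots contain at least k rejections).

    Assumes independent lots, each rejected with probability q.
    """
    # State: outcomes (1 = rejected) of the previous m-1 lots, zero-padded at the start.
    states = {(0,) * (m - 1): 1.0}
    for _ in range(n_lots):
        nxt = {}
        for hist, pr in states.items():
            for x, px in ((1, q), (0, 1.0 - q)):
                if sum(hist) + x >= k:
                    continue  # k-out-of-m reached: this path ends in a suspension
                new_hist = (hist + (x,))[1:]
                nxt[new_hist] = nxt.get(new_hist, 0.0) + pr * px
        states = nxt
    # Whatever probability mass survived never triggered the rule.
    return 1.0 - sum(states.values())


if __name__ == "__main__":
    for p, n_lots in ((0.01, 100), (0.1, 15)):
        q = lot_rejection_prob(p)
        print(f"p={p}: lot rejection prob={q:.3f}, "
              f"P(suspension within {n_lots} lots)={prob_k_in_m(q, 3, 5, n_lots):.3f}")
```

The state space has only 2^(m-1) entries, so the exact calculation is cheap for small windows such as m = 5.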

5 Success scans in a sequence of trials Retrospective scan in a sequence. In some applications, researchers condition on a known or given value for the total number of successes in the N trials. This is a retrospective analysis. The scan statistics are defined similarly to those in the prospective analysis and are denoted $S'_m$ and $W'_k$. In a retrospective analysis, since the events have already occurred, suppose there are a successes and N - a failures, so that

6 Success scans in a sequence of trials Approximation: 6

7 Success scans in a sequence of trials Example 4.7: birthday problem. In a soccer game there are 23 people on the field: 11 players from each side plus the referee. How likely is it that (at least) two of the 23 persons share a birthday? Assume that the N days of a year are equally likely to be a person's birthday. Given that there are a people, the probability of no match is $\prod_{i=0}^{a-1} \frac{N-i}{N} = \frac{N!}{(N-a)!\,N^{a}}$.
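As a quick check of the soccer example, the sketch below evaluates the standard no-match product for a = 23 and N = 365. The function name `prob_shared_birthday` is chosen for this illustration only.

```python
def prob_shared_birthday(a, n_days=365):
    """Probability that at least two of a people share a birthday,
    assuming all n_days days are equally likely and birthdays are independent."""
    prob_no_match = 1.0
    for i in range(a):
        prob_no_match *= (n_days - i) / n_days
    return 1.0 - prob_no_match


if __name__ == "__main__":
    print(round(prob_shared_birthday(23), 3))  # about 0.507
```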

8 Success scans in a sequence of trials Example 4.8: the generalized birthday problem. Find the probability that at least two birthdays out of a people fall within some period of m consecutive days. Because Dec 31 and Jan 1 are adjacent days, the year is treated as circular. As an illustration, consider a family consisting of a mother, a father and two kids: there are four birthdays and one wedding anniversary. Assuming the dates are independent, is it unusual for at least two of the five special dates to fall within the same 7-day period? With N = 365, a = 5, m = 7, Pr(generalized birthday) is approximately 0.31.
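The value of about 0.31 for the family example can be reproduced with a short calculation. The closed form used here is an assumption of this sketch rather than a formula taken from the slides: it comes from counting the circular arrangements in which every gap between neighbouring dates is at least m days, which gives Pr(no two dates within m days) = (N - am + a - 1)! / ((N - am)! N^(a-1)). For m = 1 it reduces to the classical birthday formula.

```python
def prob_two_within_m_days(a, m, n_days=365):
    """Probability that at least two of a dates fall within some m consecutive
    days of a circular year (counting-argument formula, see lead-in)."""
    if a * m > n_days:
        return 1.0  # the dates cannot all be at least m days apart
    prob_none = 1.0
    # (N - a*m + a - 1)! / (N - a*m)! has a-1 factors, paired with N^(a-1).
    for j in range(1, a):
        prob_none *= (n_days - a * m + j) / n_days
    return 1.0 - prob_none


if __name__ == "__main__":
    print(round(prob_two_within_m_days(5, 7), 2))  # about 0.31
```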

9 Success scans in a sequence of trials Example 4.8: We can also use the scan statistic to solve the generalized birthday problem. Try it with a = 5, N = 365, k = 2, m = 7. Here L = N/m = 52.1, which is not an integer, but with N = 364 we get L = N/m = 52. So use N = 364 and L = 52 as an approximation. The resulting value of Pr(2 | 7, 364, 5) is very close to the probability of 0.31 obtained from the generalized birthday formula.

10 Success scans in a sequence of trials Example 4.9: A chess grandmaster played 20 tournaments over 1 year and won nine. Seven of the won tournaments occurred within 10 consecutive tournaments. Assuming that the nine wins occurred independently of each other and completely at random over the 20 tournaments, how likely is it for there to have been 10 consecutive tournaments containing at least seven wins? Here a = 9, N = 20, k = 7, m = 10, L = N/m = 2.
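For a problem this small, the retrospective probability can also be obtained by brute force instead of an approximation. The sketch below assumes, as the example does, that all C(N, a) placements of the a wins among the N tournaments are equally likely, and simply counts the placements in which some m consecutive tournaments contain at least k wins. The function name `retrospective_scan_prob` is hypothetical.

```python
from itertools import combinations
from math import comb


def retrospective_scan_prob(a, n_trials, k, m):
    """P(some m consecutive trials contain >= k successes | a successes in n_trials),
    by exhaustive enumeration of all equally likely placements."""
    hits = 0
    for wins in combinations(range(n_trials), a):
        # wins is sorted, so k successes fit in a window of m consecutive trials
        # exactly when some k-th neighbour lies fewer than m positions away.
        if any(wins[i + k - 1] - wins[i] < m for i in range(a - k + 1)):
            hits += 1
    return hits / comb(n_trials, a)


if __name__ == "__main__":
    # Chess example: a = 9 wins in N = 20 tournaments, k = 7 within m = 10.
    print(retrospective_scan_prob(9, 20, 7, 10))
```

Enumeration visits 167,960 placements for the chess example, which is still fast; for large N the approximations discussed in the slides become necessary.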

11 Higher dimensional scan Scans in two- or three-dimensional space are often very useful. Typically, a 2-D scan is a spatial scan and a 3-D scan is a time-space (spatial-temporal) scan. Examples: - A public health official looks for spatial clusters of disease occurrence; - A geologist scans a region to find clusters of mineral deposits; - An astrophysicist scans the heavens for concentrations of gamma ray burst sources; - A traffic controller looks at clusters of accidents on a highway that are close in time and space.

12 Higher dimensional scan Sometimes, people convert a 2-D scan problem into a 1-D problem and use the previous approach. For example, Conover, Bement, and Iman (1979) apply a 1-D scan statistic to a 2-D region. They were motivated by an aerial search for uranium deposits under the National Uranium Resource Evaluation program. A plane flies over the region in a zig-zag fashion.

13 Higher dimensional scan Consider the problem of scanning a unit square with a sub-rectangle of length u and height v whose sides are parallel to the sides of the square. An S × T rectangular region with an actual a × b scanning window can be mapped to the unit-square case by setting u = a/S and v = b/T, so that S = T = 1 unit. Given N points distributed at random over the unit square, let $S_{u,v}$ denote the maximum number of points in any sub-rectangle of length u and height v with sides parallel to the sides of the unit square. $S_{u,v}$ is a 2-D scan statistic, and we write $P(k, N, u, v) = P(S_{u,v} \ge k)$.
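To make the definition of the 2-D scan statistic concrete, here is a minimal sketch that computes it for a given point set and estimates P(k, N, u, v) by simulation. It relies on the observation that an optimal window can always be slid until its left edge and its bottom edge each touch a point, so only those candidate corners are checked. The function names and the number of replications are illustrative choices, and the simulation is a stand-in for, not a reproduction of, the approximation formula in the slides.

```python
import random


def scan_statistic_2d(points, u, v):
    """Maximum number of points covered by an axis-parallel u-by-v window."""
    best = 0
    xs = sorted({x for x, _ in points})
    ys = sorted({y for _, y in points})
    for x0 in xs:
        # Points whose x-coordinate falls inside a window with left edge at x0.
        strip = [y for x, y in points if x0 <= x <= x0 + u]
        if len(strip) <= best:
            continue
        for y0 in ys:
            count = sum(1 for y in strip if y0 <= y <= y0 + v)
            best = max(best, count)
    return best


def estimate_scan_prob(k, n_points, u, v, n_rep=2000, seed=0):
    """Monte Carlo estimate of P(k, N, u, v) for uniform points on the unit square."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        pts = [(rng.random(), rng.random()) for _ in range(n_points)]
        if scan_statistic_2d(pts, u, v) >= k:
            hits += 1
    return hits / n_rep


if __name__ == "__main__":
    print(estimate_scan_prob(4, 5, 0.2, 0.3))
```

The brute-force search is cubic in the number of points, so it is only meant for small N.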

14 Higher dimensional scan Approximation: 14

15 Higher dimensional scan Example 4.10: Researchers looked at clusters of childhood (up to 15 years old) acute leukemia cases in Sweden diagnosed in the 20-year period ending in [...]. During the period, there were 1,534 cases of acute childhood leukemia among a population of 1,703,235 children. There was a cluster of three cases of acute leukemia among a population of 133 children in a town in southwest Sweden. This was 25 times the average incidence rate for Sweden. Was this cluster significant? View the map of Sweden as stretched (because the population is not evenly spread) so that there is a grid of 1,703,235 squares with one person in each square. For simplicity, view the map as a 1,305 × 1,305 square (1,305 ≈ sqrt(1,703,235)). View the population of the southwest town as an 11.5 × 11.5 square (11.5 ≈ sqrt(133)). Given 1,534 leukemia cases in the whole square, how likely is it to get a cluster of three cases within a subsquare with u = v = 11.5/1305 ≈ 0.0088?

16 Higher dimensional scan (Example 4.10) Calculate Pr(3, 1,534, 0.0088, 0.0088). Unfortunately, the previous approximation does not work for k = 3 or 4 (you will get Pr > 1). But try k = 5 and 6. When k = 5, Pr = [...]; when k = 6, Pr = [...]. So when k = 3, the probability should be considerably larger than 0.052, suggesting that the cluster is not as unusual as people initially thought.

17 Higher dimensional scan Example 4.11: star cluster. In 1767, Reverend Michell scanned the sky and noted the visual closeness of six of the stars in the Pleiades. The six stars are called Atlas, Maia, Alcyone, Merope, Electra, and Taygeta. Michell supposed that the whole number of stars equal in splendor to the faintest of these six is about 1,500. We want to find the probability that six stars out of that number, scattered at random over the whole heavens, fall within so small a distance of each other as the Pleiades do.

18 Higher dimensional scan Example 4.11: star cluster. Distance in astrophysics: the six stars in the Pleiades can be contained in a circle with a radius of 31 minutes of arc. On the surface of a celestial sphere of radius r, this circle has an approximate surface area of $\pi\,(31\ \text{minutes})^2 = \pi\,(31\pi r/10800)^2$.

19 Higher dimensional scan Example 4.11: star cluster. - The surface area of the whole celestial sphere with radius r is $4\pi r^2$. - So the fraction of the celestial surface within a scanning circle of 31 minutes of arc is $\pi\,(31\pi r/10800)^2 / (4\pi r^2) \approx 2.03 \times 10^{-5}$. If we approximate this area by a square, then [...]. - Implication: this result simply indicates that the stars were not distributed at random. Rather, they group into clusters of stars. This is now a well-accepted fact, but it was not so obvious 300 years ago.
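The geometric step can be verified with a few lines of arithmetic. This sketch treats the small spherical cap as a flat disk (a small-angle assumption) and converts the resulting area fraction into the side of an equal-area square on a unit square, which is how it would be plugged into P(k, N, u, v). The function names are made up for this illustration.

```python
import math


def cap_fraction(radius_arcmin):
    """Fraction of the celestial sphere inside a circle of the given angular radius,
    using the flat-disk approximation pi*theta^2 / (4*pi)."""
    theta = math.radians(radius_arcmin / 60.0)  # arc minutes -> radians
    return (math.pi * theta ** 2) / (4.0 * math.pi)


def equivalent_square_side(fraction):
    """Side u = v of a square window with the same area fraction of a unit square."""
    return math.sqrt(fraction)


if __name__ == "__main__":
    f = cap_fraction(31.0)
    print(f, equivalent_square_side(f))
```

With these inputs the fraction is roughly 2.03e-5 and the equivalent square has a side of roughly 0.0045.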

20 Higher dimensional scan Clustering on the lattice: this is a discrete version of the 2-D scan statistic. In certain applications the events can naturally occur only at a discrete set of points in space. For example, in agricultural applications, people look for clusters of diseased plants in a field where plants are (roughly) evenly spaced. In other applications, the underlying events can occur anywhere in space, but the method of observation limits the observed events to a grid or lattice of points. We focus here on a rectangular R × T lattice of points and consider scanning the lattice with a rectangular $m_1 \times m_2$ sublattice whose sides are parallel to those of the lattice. We deal with the model where events are independent and equally likely to occur at any point of the lattice.

21 Higher dimensional scan Let $X_{ij}$, i = 1, ..., R, j = 1, ..., T, denote a rectangular lattice of independent and identically distributed Bernoulli random variables with $\Pr(X_{ij} = 1) = 1 - \Pr(X_{ij} = 0) = p$. View the lattice with (1,1) at its lower left corner. Let [...]. Define the two-dimensional discrete scan statistic to be [...].

22 Higher dimensional scan Approximation But here q(2m-1) and q(2m) need to be calculated from a recursive algorithm in Karwe and Naus (1997). For convenience, there is a look-up table one can use. Approximation 22

23 Higher dimensional scan Example 4.12: minefield detection. A two-dimensional rectangular region on land (or sea) is being checked for minefields. The region is viewed as being divided into a grid of small squares (quadrats). An aircraft flies over the region and processes information from a sensor. When the sensor gives a reading above a specified threshold for a quadrat, the score for the quadrat is recorded as "high." An unusual number of contiguous quadrats that score high suggests a possible minefield. Suppose a large region is divided into 10,000 quadrats and, on average, about 2% of quadrats score high. In screening the map with a 5 × 5 square of quadrats, we observe a square 25-quadrat area that contains six high quadrats. Is this evidence of unusual, nonrandom clustering? Here T = 100 = sqrt(10,000), m = 5, p = 0.02, k = 6.
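One way to get a feel for the minefield question without the look-up table is to simulate it. The sketch below is an illustration under the stated null model (independent quadrats, each "high" with probability p): it computes the largest count in any m1 × m2 block using 2-D prefix sums and estimates how often a random map produces a count of at least k. The function names and the default number of replications are assumptions of this sketch, and the result is a simulation estimate rather than the approximation used in the slides.

```python
import random


def lattice_scan_statistic(grid, m1, m2):
    """Largest number of ones in any m1-by-m2 block of a 0/1 grid."""
    rows, cols = len(grid), len(grid[0])
    # prefix[i][j] holds the sum of grid[0..i-1][0..j-1].
    prefix = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            prefix[i + 1][j + 1] = (grid[i][j] + prefix[i][j + 1]
                                    + prefix[i + 1][j] - prefix[i][j])
    best = 0
    for i in range(rows - m1 + 1):
        for j in range(cols - m2 + 1):
            total = (prefix[i + m1][j + m2] - prefix[i][j + m2]
                     - prefix[i + m1][j] + prefix[i][j])
            best = max(best, total)
    return best


def estimate_lattice_prob(k, size, m, p, n_rep=200, seed=0):
    """Monte Carlo estimate of P(scan statistic >= k) on a size-by-size lattice."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        grid = [[1 if rng.random() < p else 0 for _ in range(size)]
                for _ in range(size)]
        if lattice_scan_statistic(grid, m, m) >= k:
            hits += 1
    return hits / n_rep


if __name__ == "__main__":
    # Minefield example: T = 100, m = 5, p = 0.02, k = 6.
    print(estimate_lattice_prob(6, 100, 5, 0.02))
```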

24 Higher dimensional scan Example 4.13: A photograph shows 500 galaxies of a certain brightness. Divide the photographic plate by an evenly spaced T × T grid consisting of $T^2$ cells. Let T = 100. Assume that the galaxies are distributed completely at random over the photograph. If one observes 17 galaxies within a scanning square, how unusual is it? There are a total of T × T = 10,000 cells. If 500 galaxies are distributed randomly, the probability that a given cell contains a galaxy is Pr = 500/10,000 = 0.05. Then T = 100, m = 10, k = 17, so Pr(17 | 10, 0.05, 100) ≈ 0.04 (from the look-up table). The distribution shows strong evidence of clustering.

25 Higher dimensional scan Discussion on the shape of the scanning window. Does the shape of a scanning window (of a given area) make a difference to the probability of a large cluster? See the following example, using the formula for Pr(k, N, u, v): Pr(5, 5, 0.1, 0.9) = [...], Pr(5, 5, 0.3, 0.3) = [...]. Another example: Pr(4, 5, 0.15, 0.4) = [...], Pr(4, 5, 0.2, 0.3) = [...], Pr(4, 5, [...], [...]) = [...]. One can observe that in scanning a unit square, for a given area uv, P(k, N, u, v) is maximized for u = v.

26 Higher dimensional scan The choice of scanning window type is still an ongoing research subject. Here are a few common observations: - Square windows and circular windows are generally popular because they involve fewer parameters. - Some shapes, such as ellipses or rectangles, can be oriented to yield greater power, for example, by taking into account the direction of wind in spreading pollutants. - No matter which window shape is used, there is a need for efficient algorithms to reduce the complexity of obtaining the probability distribution of the scan statistic. Typically, that task is simpler for a scanning window whose sides are parallel to those of the original region.

27 Higher dimensional scan: varying the size of scan window In the treatment so far, the scan statistics are defined over a sliding scan window (an improvement over control chart), but still with a fixed window size. Next we will allow the window size to change, which empowers the scan statistic to detect underlying clusters/patterns whose size is not known a priori. The major downside is the computational complexity in doing so. A major portion of the following materials is based on the work done by Kulldorff (a statistician working for CDC) and Knox. 27

28 Higher dimensional scan: varying the size of scan window A general model: - Let A be the area in which events may occur, a subset of Euclidean space whose dimensions may represent either physical space or time. So this model is applicable to both the spatial scan statistic and the spatial-temporal scan statistic. - On A, define a measure $\mu$ representing a known underlying intensity that generates events under the null hypothesis. For a homogeneous Poisson process on a rectangle A, we have $\mu(x) = \lambda$ for all $x \in A$. - Under the null hypothesis, the number of events in any given area may be binomially distributed or Poisson distributed.

29 Higher dimensional scan: varying the size of scan window For instance, the set A contains a group of locations $s_i$. Each location $s_i$ may represent a zip code, with its position (latitude and longitude) assumed to be at the centroid of the zip code. We then collect a count $c_i$ for $s_i$, say the number of respiratory disease cases in that zip code, and $b_i$ as the baseline information, for example the at-risk population. Here, $c_i$ is $x(s_i)$ and $b_i$ is $\mu(s_i)$, for $s_i \in A$. If $b_i$ is the underlying population of location $s_i$, we would expect each count to be proportional to its population under the null hypothesis. If $b_i$ is the expected count of location $s_i$, we expect each count to equal the expected count of location $s_i$ under the null hypothesis. In either case, $b_i$, the underlying measure, is either given or can be inferred from past counts. For example, over-the-counter (OTC) drug sales may be estimated with a time series model using historical sales records.

30 Higher dimensional scan: varying the size of scan window With a window of varying size, the scan window is treated as an interval, area, or volume of fixed shape that moves across the study area. As it moves, its size varies, and it defines a collection W of zones $w \subset A$. Conditioning on the observed total number of events, x(A), the scan statistic is defined as the maximum likelihood ratio over all possible zones. When we assume that the underlying process follows a binomial model or a Poisson model, the likelihood function can be written down explicitly, for example under a Poisson model.

31 Higher dimensional scan: varying the size of scan window Under a more specific setting, define More specifically 31

32 Higher dimensional scan: varying the size of scan window Under the previous setting: So this likelihood ratio can be easily calculated for a given region w. 32
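To illustrate how cheaply the likelihood ratio is evaluated for one zone, the sketch below uses the Poisson form commonly associated with Kulldorff's spatial scan statistic: with c observed cases in the zone, e expected cases under the null (total cases times the zone's share of the baseline), and C total cases, LR(w) = (c/e)^c ((C - c)/(C - e))^(C - c) when c > e, and 1 otherwise. Treating this as the formula behind the slides is an assumption, and the counts in the usage example are invented for illustration.

```python
import math


def poisson_log_lr(c, e, total):
    """Log likelihood ratio for a zone with c observed and e expected cases,
    out of `total` cases overall (scanning for high rates only)."""
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    outside = 0.0
    if total > c:
        outside = (total - c) * math.log((total - c) / (total - e))
    return inside + outside


def most_likely_zone(cases, baseline, zones):
    """Return (log LR, zone) maximizing the log likelihood ratio.

    cases[i], baseline[i]: count c_i and baseline b_i of location i.
    zones: iterable of tuples of location indices (the candidate windows).
    """
    total_c, total_b = sum(cases), sum(baseline)
    best = (0.0, None)
    for zone in zones:
        c = sum(cases[i] for i in zone)
        e = total_c * sum(baseline[i] for i in zone) / total_b
        llr = poisson_log_lr(c, e, total_c)
        if llr > best[0]:
            best = (llr, zone)
    return best


if __name__ == "__main__":
    # Hypothetical data: four locations with equal baselines.
    cases = [5, 1, 1, 1]
    baseline = [1, 1, 1, 1]
    zones = [(0,), (1,), (0, 1), (2, 3)]
    print(most_likely_zone(cases, baseline, zones))
```

The expensive part in practice is not this formula but the enumeration of zones and the repeated evaluation inside a Monte Carlo test.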

33 Higher dimensional scan: varying the size of scan window Even though calculating the likelihood function for a given region is fairly straightforward, calculating the scan statistic $S_w$ and performing the subsequent statistical test can be computationally difficult, because calculating $S_w$ requires exhausting all possible positions and sizes of the scan window. With varying window sizes it is also not easy to obtain a good approximation, unlike in the fixed window-size cases. People therefore rely on a Monte Carlo (MC)-based hypothesis testing procedure.

34 Higher dimensional scan: varying the size of scan window General procedure for MC-based hypothesis testing 34
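A generic Monte Carlo test can be sketched as follows; this is the usual rank-based recipe and is offered as an assumption about what the procedure looks like, not as a transcription of the slide. Compute the statistic on the observed data, generate many replicate data sets under the null hypothesis, compute the same statistic on each, and report p = (1 + number of replicates at least as large as the observed value) / (1 + number of replicates). The toy statistic and the observed counts below are hypothetical.

```python
import random


def monte_carlo_p_value(observed_stat, simulate_stat, n_rep=999, seed=0):
    """Rank-based Monte Carlo p-value.

    simulate_stat(rng) must return the test statistic of one data set
    generated under the null hypothesis.
    """
    rng = random.Random(seed)
    n_ge = sum(1 for _ in range(n_rep) if simulate_stat(rng) >= observed_stat)
    return (1 + n_ge) / (1 + n_rep)


def max_adjacent_pair_count(counts):
    """Toy 1-D scan statistic: largest total over two adjacent cells."""
    return max(counts[i] + counts[i + 1] for i in range(len(counts) - 1))


def simulate_null(rng, n_events=20, n_cells=10):
    """Place n_events uniformly at random in n_cells (conditioning on the total)."""
    counts = [0] * n_cells
    for _ in range(n_events):
        counts[rng.randrange(n_cells)] += 1
    return max_adjacent_pair_count(counts)


if __name__ == "__main__":
    observed = max_adjacent_pair_count([1, 0, 2, 7, 6, 1, 0, 2, 1, 0])  # hypothetical counts
    print(monte_carlo_p_value(observed, simulate_null))
```

With 999 replicates the smallest attainable p-value is 0.001, which is why replicate counts such as 999 or 9,999 are common choices.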

35 Higher dimensional scan: varying the size of scan window The detailed procedures for implementing this MC-based hypothesis test have been developed by many different researchers. Here we introduce free software that can be used to search for clusters in a dataset. Software: SaTScan. It can be downloaded at [...], together with a user manual and some sample datasets.

36 Higher dimensional scan: varying the size of scan window Example 4.14: brain cancer in New Mexico. This example, together with a few others, is included in the SaTScan software. It uses three files: nm.cas (cancer records), nm.pop (population), and nm.geo (coordinates of each county). The cancer data are broken down by age and sex. Brain cancer and population data are available from 1973 to 1992, aggregated at the level of 32 counties. A circular window of variable size was used. The circle centers are limited to the county centroids, while the radius varies continuously from zero up to the point where the window includes 50% of the total population at risk.

37 Higher dimensional scan: varying the size of scan window (Example 4.14) brain cancer in New Mexico. Here only a spatial analysis is performed, but a spatial-temporal analysis can be performed likewise. When scanning for areas with an (unusually) high-rate cluster: - Check "High Rates" in the "Analysis" tab of the software; - A cluster is found in and around Albuquerque, containing Bernalillo, Cibola-Valencia, Los Alamos, Sandoval, San Miguel, Santa Fe, Socorro and Torrance counties. - With 642 cases observed when [...] were expected, this area has a rate 10% higher than the New Mexico average, and it is significant with p-value = [...].

38 Higher dimensional scan: varying the size of scan window (Example 4.14) When scanning for areas with an (unusually) low-rate cluster: - Check "Low Rates" in the "Analysis" tab; - A cluster is found for Lea and Eddy counties combined. - With 98.3 cases expected but only 72 observed, these counties have an incidence rate about 26.7% lower than the state average. But the p-value is 0.167, which is not significant. With the two-sided scan, the clusters found would be the same but with p-value = [...].

39 Higher dimensional scan: varying the size of scan window (Example 4.14) Spatial clusters for brain cancer incidence in New Mexico The most likely cluster is the grey area and the area with low rate is the shaded area. 39

40 Higher dimensional scan: varying the size of scan window Revisit Example 1.4: The paper that performed the analysis can be accessed at [...].

41 Example 1.4 Early warning system for West Nile virus Example 1.4: Since the 1999 West Nile virus (WNV) outbreak in New York City, which caused thousands of human infections and 59 severe cases including 7 deaths, health officials have been searching for an early warning system that could help prevent human illness and death. In the summer of 2001, the New York City Department of Health and Mental Hygiene established a citywide network of adult mosquito traps, sentinel bird flocks, and a system for reporting, collecting, and testing dead birds. Health officials try to use the collected data and the patterns embedded therein to issue public health alerts early enough before the onset of human cases.

42 Example 1.4 Early warning system for West Nile virus 42

43 Example 1.4 Early warning system for West Nile virus Data Collection - Dead birds were reported by the public through an interactive voice-response telephone system or the internet. - The information included the date found and the location and species of the dead bird. Since pigeon deaths are common but rarely associated with WNV, they were excluded from all clustering analyses. - A sample of dead birds that met selection criteria was submitted for necropsy and testing.

44 Example 1.4 Early warning system for West Nile virus Data Collection (continued) - Mosquitoes were collected weekly from > 100 traps dispersed throughout the city. - Multiple mosquitoes from the same trap and of the same species were pooled, and each pool was tested for evidence of WNV; a result was considered positive if at least one mosquito was infected. - NYC Department of Health and Mental Hygiene conducted citywide active hospital-based physician and lab surveillance for human WNV cases. 44

45 Example 1.4 Early warning system for West Nile virus Data Collection (continued) - All dead bird reports, mosquito traps, and human case-patients with address information were geocoded to a point location (when possible, using the ArcView GIS software). - Multiple dead bird reports from the same location on the same date were counted as one. Dead bird reports were attributed to the census tracts (a total of 2,215 tracts, with an average size of 0.13 square miles). - The latitude and longitude in decimal degrees for each census tract centroid were used in scan statistics analysis. 45

46 Example 1.4 Early warning system for West Nile virus Spatial Statistics Scan - Use a circular window, and perform a prospective analysis. - To adjust the analysis for geographic variability, historical dead bird counts from a given census tract were used as baseline controls; recent bird counts were used as cases. - Define cases as the dead bird reports that occurred in the 7 days before the date of analysis. - A minimum 2-week buffer zone between case and control birds was established, thereby limiting the influence of emerging clusters on the analysis. - Some pre-analysis needs to be performed using SAS code.


47 Example 1.4 Early warning system for West Nile virus Spatial Statistics Scan on 2001 data. - Data for dead bird clustering analysis were first available on June 22. Analysis showed two clusters: central Staten Island and eastern Queens. This prompted a program of intensified larval surveillance and control as well as abatement of standing water starting June

48 Example 1.4 Early warning system for West Nile virus Spatial Statistics Scan on 2001 data. - Reports from these areas were prioritized for dead bird pickup and testing, and additional mosquito traps were set. - On July 19, the clustering in eastern Queens was shown to be likely due to WNV through lab testing. 48

49 Example 1.4 Early warning system for West Nile virus Spatial Statistics Scan on 2001 data. - During the next 2-3 months, six areas with major dead bird clustering were identified. Five of seven diagnosed human cases in 2001 were identified in residents of four cluster areas. 49

50 Example 1.4 Early warning system for West Nile virus Spatial Statistics Scan on 2001 data. - In summary: dead bird clusters occurred 0 to 40 days (median 12 days) before the onset of human illness, and [...] days (median 17 days) before human diagnosis. In most cases, dead bird clusters also preceded the time of collection of WNV-positive mosquitoes and birds.

51 Time-space analysis: Knox statistic The Knox test is a test of space-time interaction based on a count of the number of event pairs that occur within a pre-specified critical interval of time t and a pre-specified critical distance s. With n points located in space and time, there are n(n-1)/2 distinct pairs of points. Let $n_s$ be the observed number of pairs that are close together in space (i.e., pairs that are separated by a distance less than s), and let $n_t$ be the observed number of pairs that are close together in time (i.e., pairs that are separated in time by less than t). The test statistic is $n_{st}$, the observed number of pairs that are close in both space and time.

52 Time-space analysis: Knox statistic Knox suggested that the random variable N st, which takes the observed values n st, was approximately Poisson distributed: 52

53 Time-space analysis: Knox statistic When the test statistic $n_{st}$ exceeds its expectation $E(N_{st})$, points that are close together in space are closer than expected in time or, alternatively stated, points that are close together in time are closer than expected in space. This indicates that a cluster may be emerging, because an over-density in time or space is observed. To carry out Knox's test, we have the following alternatives: (a) assume that $N_{st}$ follows a Poisson distribution with mean $E(N_{st})$, then test for a potential change in that mean based on the observed $n_{st}$; (b) use a normal approximation to the Poisson distribution, with both its mean and variance equal to $E(N_{st})$; (c) use a normal approximation with mean $E(N_{st})$ and variance $\mathrm{Var}(N_{st})$. We will use (c) in this class.
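The pair counts behind the Knox test are simple to compute. The sketch below counts $n_s$, $n_t$ and $n_{st}$ for a list of (x, y, time) events and forms a z-score using alternative (b), i.e., mean and variance both equal to $E(N_{st})$. Two things here are assumptions of the sketch: the expectation is taken to be the standard Knox value $n_s n_t / (n(n-1)/2)$, and alternative (b) is used instead of (c) because the exact variance formula is not reproduced in this text. The function names and the sample events are hypothetical.

```python
import math
from itertools import combinations


def knox_counts(events, s_crit, t_crit):
    """Count pairs close in space (n_s), in time (n_t), and in both (n_st).

    events: list of (x, y, t) tuples; closeness means strictly less than
    the critical distance s_crit or critical time t_crit.
    """
    n_s = n_t = n_st = 0
    for (x1, y1, t1), (x2, y2, t2) in combinations(events, 2):
        close_s = math.hypot(x1 - x2, y1 - y2) < s_crit
        close_t = abs(t1 - t2) < t_crit
        n_s += int(close_s)
        n_t += int(close_t)
        n_st += int(close_s and close_t)
    return n_s, n_t, n_st


def knox_z_poisson(events, s_crit, t_crit):
    """Return (n_st, expected, z) with z based on the Poisson-type variance."""
    n = len(events)
    n_pairs = n * (n - 1) // 2
    n_s, n_t, n_st = knox_counts(events, s_crit, t_crit)
    expected = n_s * n_t / n_pairs
    z = (n_st - expected) / math.sqrt(expected) if expected > 0 else float("nan")
    return n_st, expected, z


if __name__ == "__main__":
    # Hypothetical events: (x in km, y in km, day of onset).
    events = [(0, 0, 0), (0.1, 0, 1), (5, 5, 0.5), (5.1, 5, 50)]
    print(knox_z_poisson(events, s_crit=1.0, t_crit=2.0))
```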

54 Time-space analysis: Knox statistic A local Knox statistic. The original Knox statistic provides valuable information on whether a significant space-time interaction exists in the data, but the statistical importance of particular observations cannot be determined from the Knox statistic alone. Oftentimes, it is also of interest to find those pairs of observations that are close in both space and time. This is what the "local" Knox statistic is proposed for; in this sense, the original Knox statistic is a "global" statistic. Let $n_s(i)$ be the number of observations that are close to observation i in space, $n_t(i)$ the number that are close in time, and $n_{st}(i)$ the number that are close in both space and time. Define the observed value of the local Knox statistic $N_{st}(i)$ to be $n_{st}(i)$. It can be shown that $\sum_i n_{st}(i) = 2\,n_{st}$, i.e., the local statistics sum to a constant multiple of the global statistic.

55 Time-space analysis: Knox statistic Let j = 1,, n be the ordered index of the time value assigned to location i, where for example, j = 1 implies that i has been assigned the time associated with the first observations, and j = n implies that i has been assigned the time associated with the n th observation. 55

56 Time-space analysis: Knox statistic An approximate significance test for $n_{st}(i)$ may be constructed by using a z-score. Sequence of use: (1) The global Knox statistic $N_{st}$ can be monitored by a CUSUM or a Shewhart chart; (2) When there is a significant global Knox statistic, the local Knox statistic can be useful in finding the observations that contribute to the global significance.

57 Time-space analysis: Knox statistic Interpretation of a significant Knox statistic: it implies a strong space-time interaction. This could be an indication that a disease is infectious, or that it is caused by some other type of agent that appears locally at specific times, such as food poisoning. In criminology, a strong space-time pattern of homicides is an indication of a serial killer. In the past, many of the medical studies utilizing the Knox statistic were concerned with leukemia. Because there were strong space-time interactions, the studies were used as evidence that certain types of leukemia may be caused by viral infection. To this day there is no settled answer in the medical community, but a viral etiology of the disease is still one valid hypothesis.

58 Time-space analysis: Knox statistic Example 4.15: Application of the Knox statistic to Burkitt's lymphoma in Uganda. Data: locations and dates of onset and diagnosis for cases of Burkitt's lymphoma during [...]. The study region was the West Nile district of Uganda. There are 188 cases for which data exist on both the location and the date of onset.

59 Time-space analysis: Knox statistic (Example 4.15) Monitoring using the Knox statistic was carried out for the following combinations of critical time windows and spatial distances (i.e., t and s): t = 30, 60, 90, 120, 180, 360 days; s = 2.5, 5, 10, 20, 40 km. There are a total of 30 = 6 × 5 combinations. A CUSUM chart is used to detect a change in the Knox statistic:
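For readers unfamiliar with the chart, a one-sided CUSUM on standardized scores can be sketched as below. This is the textbook recursion S_t = max(0, S_{t-1} + z_t - k) with a signal when S_t exceeds a threshold h; it is not necessarily the exact chart used in the study, and the reference value k = 0.5, the threshold h = 3.0, and the sample z-scores are illustrative assumptions.

```python
def cusum_signals(z_scores, k=0.5, h=3.0):
    """One-sided upper CUSUM.

    Returns the CUSUM path and the 1-based observation numbers at which
    the cumulative sum exceeds the threshold h.
    """
    s = 0.0
    path, signals = [], []
    for t, z in enumerate(z_scores, start=1):
        s = max(0.0, s + z - k)  # accumulate only deviations above the reference value k
        path.append(s)
        if s > h:
            signals.append(t)
    return path, signals


if __name__ == "__main__":
    # Hypothetical z-scores of successive Knox statistics.
    z = [0.1, -0.3, 1.8, 2.2, 1.9, 0.2, -1.0]
    print(cusum_signals(z))
```

Smaller k makes the chart more sensitive to small sustained shifts, while larger h reduces false alarms at the cost of slower detection.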

60 Time-space analysis: Knox statistic (Example 4.15) Results: Table 3 summarizes the monitoring result. Significant space-time interactions are found at critical space and time units. This also prompts people to think lymphoma may be caused by certain viral infection. 60

61 Time-space analysis: Knox statistic (Example 4.15) Results: Figure 2 provides an example of how the CUSUM may be tracked over time, for the case of s = 20 km and t = 180 days.

62 Time-space analysis: Knox statistic (Example 4.15) Results: In the figure, three space-time clustering signals are indicated: - A brief signal is sent at observation 25, in Dec 1964; - A second signal is noted at observation 82, in Feb 1969; - The most persistent signal begins at observation 146, in Jan [...]. This clustering remains until observation 172, in Jan [...]. An examination reveals that observations [...] were the primary contributors to this signal. Those are highlighted in Figure 1.


An Introduction to SaTScan Software to measure spatial, temporal or space-time clusters using a spatial scan approach Marilyn O Hara University of Illinois moruiz@illinois.edu Lecture for the Pre-conference