AGENDA
  Chi-square goodness of fit test

GOODNESS OF FIT TESTS
  Used to determine whether a sample could have come from a distribution with the specified parameters
  Commonly used to determine whether data are normally distributed
    Many tests, such as the ones we have been using, require normally distributed data
    If the data are not normally distributed, non-parametric tests must be used
  Also used to choose input distributions in system modeling
    Do customers or jobs arrive with exponentially distributed interarrival times?
    What distribution do service times follow?
    What distribution do failures follow?

GOODNESS OF FIT TESTS
  Based on a comparison between
    Observed data
    Theoretical data
  The comparison uses a set of intervals or cells
    Each cell has a lower and an upper boundary value
    The boundaries are determined by
      Theoretical distribution
      Number of observations in the sample
  2 different approaches…

TWO DIFFERENT APPROACHES
  Approach 1
    Used in the book
    Equal interval approach
    No cell grouping can have fewer than 5 expected observations
  Approach 2
    Used in other books
    Equiprobable approach
    More statistically robust

HYPOTHESIS TEST PROCEDURE
  Identify Ho and Ha
  Determine level of significance (generally 0.05 or 0.01)
  Determine the "critical value" criterion from the level of significance
  Calculate the "test statistic"
  Make decision
    Fail to reject Ho
    Reject Ho

HYPOTHESES
  Ho: The sample could have come from a distribution with the specified parameters
  Ha: The sample could not have come from a distribution with the specified parameters

CRITICAL VALUE
  Chi-square distribution chart
  One sided test
  Alpha typically 0.05
  Degrees of freedom
    Number of cells - number of parameters estimated from the sample - 1

CHI-SQUARE for a particular number of degrees of freedom
  [Chi-square distribution chart]

TEST STATISTIC
  \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
  O_i = observed count in cell i, E_i = expected count in cell i, k = number of cells

DECISION
  Cannot reject Ho
    Test statistic is less than the critical value
    Sample could have come from a distribution with the specified parameters
  Reject Ho
    Test statistic is greater than the critical value
    Sample could not have come from a distribution with the specified parameters

EXAMPLE 1 EQUAL INTERVAL APPROACH
  400 five-minute intervals were observed for air traffic control messages
  At alpha = 0.01, can the number of messages per interval be considered Poisson distributed with a mean of 4.6?
  Approach
    Use the Poisson probability table for a mean of 4.6
    Extract the individual probabilities from the cumulative probabilities
    Multiply each probability by 400 to obtain the expected observations
    Compare the actual observations to the expected observations

HYPOTHESES
  Ho: Poisson distribution with a mean of 4.6
  Ha: Not a Poisson distribution with a mean of 4.6

CHI-SQUARE for 10 - 1 = 9 degrees of freedom
  [Chi-square distribution chart]

TEST STATISTIC
  Test statistic = 6.749

DECISION
  Test statistic of 6.749 is less than the critical value of 16.919
    16.919 is the chart value for 9 degrees of freedom at alpha = 0.05
    At the stated alpha = 0.01 the chart value is 21.666, and 6.749 is below that as well
  Cannot reject Ho that the distribution is Poisson with a mean of 4.6

EXAMPLE 2 EQUIPROBABLE APPROACH
  Were the scores from the last exam normally distributed?
  Sample statistics
    Mean = 71.95
    Std = 11.93
    N = 43

HYPOTHESES
  Ho: The sample could have come from a normally distributed population with a mean of 71.95 and a std of 11.93
  Ha: The sample could not have come from a normally distributed population with a mean of 71.95 and a std of 11.93

CRITICAL VALUE
  Chi-square distribution chart
  One sided test
  Alpha = 0.05
  Degrees of freedom
    The sample size is 43
    Want the maximum number of cells, not to exceed 100, with a minimum expected number of observations of 5 per cell
    43/5 = 8.6, so use 8 cells
    With 8 cells, the expected number of observations per cell is 43/8 = 5.375
    Degrees of freedom = number of cells - number of parameters used - 1
    Degrees of freedom = 8 - 2 - 1 = 5

CHI-SQUARE for 5 degrees of freedom
  [Chi-square distribution chart]
  Critical value = 11.070

TEST STATISTIC
  Same chi-square statistic as before, summed over the 8 equiprobable cells

CELL BOUNDARIES
  To count the observed values in each cell, we must determine the actual x boundaries of the 8 equiprobable cells
  Look up the z value corresponding to each cumulative probability (1/8, 2/8, ..., 7/8)
  Boundary = mean + std * z

CALCULATING OBSERVATIONS
  Count the exam scores that fall between each pair of cell boundaries

CALCULATING TEST STATISTIC
  Test statistic = 2.581

DECISION
  2.581 < 11.070
  Cannot reject Ho
  The data are consistent with the test scores being normally distributed with a mean of 71.95 and a std of 11.93
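
The cell-boundary step of Example 2 can be sketched in Python. This is only a minimal sketch under stated assumptions, not the slides' own calculation: it uses the standard library's `NormalDist().inv_cdf` in place of a z table, and the helper names (`equiprobable_boundaries`, `count_observations`, `chi_square_statistic`) are hypothetical. The raw exam scores are not given in the slides, so the sketch stops at the boundaries, the expected count per cell, and the degrees of freedom. The critical value and test statistic are copied from the slides rather than computed.

```python
from statistics import NormalDist

MEAN, STD, N = 71.95, 11.93, 43   # sample statistics from Example 2
NUM_CELLS = 8                     # 43 / 5 = 8.6, rounded down
NUM_PARAMS = 2                    # mean and std were taken from the sample
CRITICAL_VALUE = 11.070           # chi-square chart value quoted on the slide
REPORTED_STATISTIC = 2.581        # test statistic quoted on the slide


def equiprobable_boundaries(mean, std, num_cells):
    """Interior boundaries: mean + std * z at cumulative probability i / num_cells."""
    standard_normal = NormalDist()  # mean 0, std 1
    return [mean + std * standard_normal.inv_cdf(i / num_cells)
            for i in range(1, num_cells)]


def count_observations(scores, boundaries):
    """Count scores per cell; the first and last cells are open-ended."""
    counts = [0] * (len(boundaries) + 1)
    for score in scores:
        # The cell index is the number of boundaries strictly below the score.
        cell = sum(score > b for b in boundaries)
        counts[cell] += 1
    return counts


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


boundaries = equiprobable_boundaries(MEAN, STD, NUM_CELLS)
expected_per_cell = N / NUM_CELLS                 # 5.375
degrees_of_freedom = NUM_CELLS - NUM_PARAMS - 1   # 5

print("Interior boundaries:", [round(b, 2) for b in boundaries])
print("Expected per cell:", expected_per_cell)
print("Degrees of freedom:", degrees_of_freedom)
print("Reject Ho" if REPORTED_STATISTIC > CRITICAL_VALUE else "Cannot reject Ho")
```

With the 43 scores in a list named `scores`, `chi_square_statistic(count_observations(scores, boundaries), [expected_per_cell] * NUM_CELLS)` would give the statistic to compare against `CRITICAL_VALUE`.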
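
Returning to Example 1, the expected observations can also be sketched in Python instead of being read from a Poisson table. The slides do not show how the cells were grouped, so the pooling rule here is an assumption: adjacent counts are merged until every cell has at least 5 expected observations. The names `poisson_pmf` and `expected_cells` are hypothetical helpers. With this rule and the slide's inputs the sketch produces 10 cells, which agrees with the 10 - 1 degrees of freedom on the slide. The observed counts are not given, so the test statistic and critical value are copied from the decision slide.

```python
import math

N_INTERVALS = 400        # observed five-minute intervals (Example 1)
MEAN = 4.6               # hypothesized Poisson mean
MIN_EXPECTED = 5.0       # minimum expected observations per cell
CRITICAL_VALUE = 16.919  # value used on the decision slide
TEST_STATISTIC = 6.749   # value reported on the decision slide


def poisson_pmf(k, mean):
    """Probability of exactly k events for a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def expected_cells(mean, n, min_expected):
    """Return (first_k, last_k, expected) cells; last_k is None for 'first_k or more'.

    Assumed pooling rule: adjacent counts are merged until every cell has
    at least min_expected expected observations.
    """
    cells = []
    first_k, pooled, cumulative, k = 0, 0.0, 0.0, 0
    while True:
        p_k = poisson_pmf(k, mean)
        tail_from_k = n * (1.0 - cumulative)          # expected for "k or more"
        tail_after_k = n * (1.0 - cumulative - p_k)   # expected for "k + 1 or more"
        if tail_after_k < min_expected:
            # Too little is left above k, so close with one open-ended cell.
            cells.append((first_k, None, pooled + tail_from_k))
            return cells
        pooled += n * p_k
        cumulative += p_k
        if pooled >= min_expected:
            cells.append((first_k, k, pooled))
            first_k, pooled = k + 1, 0.0
        k += 1


def cell_label(first_k, last_k):
    if last_k is None:
        return f"{first_k}+"
    return str(first_k) if first_k == last_k else f"{first_k}-{last_k}"


cells = expected_cells(MEAN, N_INTERVALS, MIN_EXPECTED)
for first_k, last_k, expected in cells:
    print(f"{cell_label(first_k, last_k):>5}: {expected:7.2f}")

# The mean is specified in Ho rather than estimated, so no parameters are subtracted.
degrees_of_freedom = len(cells) - 0 - 1
print("Degrees of freedom:", degrees_of_freedom)
print("Reject Ho" if TEST_STATISTIC > CRITICAL_VALUE else "Cannot reject Ho")
```

The expected counts always sum to 400 because the last cell absorbs the whole upper tail, which keeps the comparison with the 400 observed intervals consistent.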