*****************************************************************************
* CHAPTER 14 - STATISTICS FOR BUSINESS & ECONOMICS, 5th Edition             *
*****************************************************************************
* Example 14.1, p. 560
*
* The SAMPLE command is used to specify the sample range of the data to be
* read. The READ command is used to read in a gas company's findings for a
* random sample of 400 of its accounts. The observed number of accounts in
* each category is defined as OB, and the corresponding probability of the
* accounts being fully paid or in arrears is defined as PROB. The LIST
* option on the READ command lists all data read.
*
SAMPLE 1 4
READ OB PROB / LIST
287 0.80
 49 0.10
 30 0.06
 34 0.04
*
* The GENR command is used to calculate the expected number of the present
* accounts to fall in each of the four categories, which is defined as
* EXPECT.
*
GENR EXPECT=400*PROB
*
* The STAT command with the SUMS= option is used to save the sum of the
* variable in a specified vector.
*
STAT OB / SUMS=SO
STAT PROB / SUMS=SP
STAT EXPECT / SUMS=SEXPECT
*
* The PRINT command is used to replicate the Table on Page 561.
*
PRINT OB PROB EXPECT
PRINT SO SP SEXPECT
*
* The Null Hypothesis is that the proportions in the present winter conform
* to the historical records. The GENR and STAT commands are used to generate
* the Chi-squared test value.
*
GENR CHI=(OB-EXPECT)**2/EXPECT
STAT CHI / SUMS=CHI2
PRINT CHI2
*
* The DISTRIB command provides functions of probability distributions. The
* format of the DISTRIB command is:
*
*     DISTRIB vars / options
*
* where: vars    = list of variables
*        options = list of options that are required on the specified
*                  type of distribution
*
* In this example, the Chi-squared critical value is printed with the
* DISTRIB command. The TYPE= option defines the type of distribution,
* DF= defines the degrees of freedom, and INVERSE computes the inverse
* survival function, so that PVAL is treated as an upper-tail probability.
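*
* As a point of comparison, the command below is a sketch and not part of
* the textbook example. It assumes that DISTRIB without the INVERSE option
* treats its argument as a value of the test statistic and prints the
* upper-tail probability under the heading 1-CDF. Applied to CHI2, the
* statistic computed above, this would give the P-value of the test,
* whereas the INVERSE form goes the other way, from a probability to a
* critical value.
*
DISTRIB CHI2 / TYPE=CHI DF=3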
*
GEN1 PVAL=0.005
DISTRIB PVAL / TYPE=CHI DF=3 INVERSE LIST
*
*-----------------------------------------------------------------------------
* Example 14.2, p. 563
*
* In this example, the Number of Occurrences is defined as X and the Observed
* Frequencies is defined as OF.
*
SAMPLE 1 4
READ X OF
0 156
1  63
2  29
3  14
*
GEN1 N=262
*
* The TYPE=POISSON option on the DISTRIB command in this example evaluates
* the Poisson distribution with MEAN=0.66. The Probability Density Function
* is saved with the PDF=P option and the Cumulative Distribution Function
* is saved with the CDF=CF option.
*
DISTRIB X / TYPE=POISSON MEAN=0.66 LIST PDF=P CDF=CF
*
* The GENR command is then used to generate a vector of the Expected
* Frequencies, EF. The SUM function on the GENR command is used to create
* a vector called EFC with the cumulative sum of EF row by row.
*
GENR EF=P*N
GENR EFC=SUM(EF)
*
* The Expected Frequency, EF, of 3 or More Occurrences is calculated with
* the SAMPLE and GENR commands below. Since there is only one value of EF
* that needs to be calculated, the SAMPLE command is set to 4 4.
*
SAMPLE 4 4
GENR EF=N-EFC(3)
*
* The SAMPLE command is restored to all 4 observations in order to calculate
* the test statistic CHISQ.
*
SAMPLE 1 4
GENR CHI=(OF-EF)**2/EF
STAT CHI / SUMS=CHISQ
PRINT X OF EF CHI
PRINT CHISQ
*
* The DISTRIB command is then used to calculate the critical value of the
* Chi-square distribution.
*
GEN1 PVAL=0.005
DISTRIB PVAL / TYPE=CHI DF=2 INVERSE LIST
*
*----------------------------------------------------------------------------
* Example 14.3, p. 566
*
* The GEN1 command is used to compute the test statistic B from the sample
* size N, the skewness SKEW and the kurtosis measure KURT.
*
GEN1 N=300
GEN1 SKEW=0.0305
GEN1 KURT=0.08
GEN1 B=N*(SKEW**2/6+KURT**2/24)
PRINT B
*
*----------------------------------------------------------------------------
* Example 14.4, p. 568
*
SAMPLE 1 3
READ FEMALE MALE / LIST
256 74
 41 42
 66 34
*
* The GENR and STAT commands are used to calculate the total of FEMALE and
* MALE in each category and the totals of FEMALE, MALE and TOTAL over all
* categories.
* The respective sums are saved in the specified constants.
*
GENR TOTAL=FEMALE+MALE
STAT FEMALE / SUMS=SUMF
STAT MALE / SUMS=SUMM
STAT TOTAL / SUMS=SUMT
*
* The PRINT command is used to replicate Table 14.9.
*
PRINT FEMALE MALE TOTAL
PRINT SUMF SUMM SUMT
*
* Table 14.10 is replicated with the use of the GENR and PRINT commands.
*
GENR E1=(TOTAL*SUMF)/SUMT
GENR E2=(TOTAL*SUMM)/SUMT
PRINT FEMALE E1 MALE E2 TOTAL
PRINT SUMF SUMM SUMT
*
* The Null Hypothesis of no association is tested with the GENR and STAT
* commands.
*
GENR CHI=(FEMALE-E1)**2/E1+(MALE-E2)**2/E2
STAT CHI / SUMS=CHI2
PRINT CHI2
GEN1 XVALUE=0.005
DISTRIB XVALUE / TYPE=CHI DF=2 INVERSE
*
*----------------------------------------------------------------------------
* Example 14.5, p. 570
*
SAMPLE 1 3
READ F1040 OTHER EXTEN / LIST
35  8  7
60 20 10
45 11  4
*
* The Observed Frequencies. The row totals are generated as TOTAL.
*
GENR TOTAL=F1040+OTHER+EXTEN
STAT F1040 / SUMS=SF1040
STAT OTHER / SUMS=SOTHER
STAT EXTEN / SUMS=SEXT
STAT TOTAL / SUMS=STOTAL
PRINT F1040 OTHER EXTEN TOTAL
PRINT SF1040 SOTHER SEXT STOTAL
*
* The Expected Frequencies
*
GENR EF1040=(TOTAL*SF1040)/STOTAL
GENR EOTHER=(TOTAL*SOTHER)/STOTAL
GENR EEXT=(TOTAL*SEXT)/STOTAL
PRINT EF1040 EOTHER EEXT TOTAL
PRINT SF1040 SOTHER SEXT STOTAL
*
* The & allows a SHAZAM command to be continued onto the following line.
*
GENR CHI=(F1040-EF1040)**2/EF1040+(OTHER-EOTHER)**2/EOTHER+(EXTEN-EEXT)**2/&
EEXT
STAT CHI / SUMS=CHI2
PRINT CHI2
*
* The DISTRIB command with the INVERSE option is used to print the critical
* value of the Chi-squared distribution with 4 degrees of freedom at the
* 0.05 level of significance, LS. The P-Value printed in the textbook can be
* obtained by applying DISTRIB to CHI2 without the INVERSE option. In that
* SHAZAM output, the P-Value is the value printed as 1-CDF.
*
GEN1 LS=0.05
DISTRIB LS / TYPE=CHI DF=4 INVERSE
*
*----------------------------------------------------------------------------
* Example 14.6, p. 570
*
* The SHAZAM output for this example is the reverse of Figure 14.2.
*
* The SKIPLINES= option on the READ command instructs SHAZAM to skip 1 line
* before the data is read.
*
SAMPLE 1 355
READ(library.txt) STUDID CLASS DAYSW HRET EASY ADEQUATE HRSEXT MOREU / &
SKIPLINES=1
*
* The SET MISSVALU= command is used to assign the missing data value to 9
* in lieu of the default value of -99999.
*
SET MISSVALU=9
*
* To replicate the cross-classification data illustrated in Figure 14.2, the
* data will be examined based on classification of the variable CLASS. Once
* the data has been sorted by class, it will be divided into NO and YES
* responses.
*
* The first group of answers to be examined will be the Freshman, where
* CLASS=1, regardless of whether they answered NO or YES to the question.
* The SET command with the NOWARNSKIP option tells SHAZAM to turn off the
* warning message for every observation skipped with the SKIPIF statement.
* The format of the SET command is:
*
*     SET option
*
* where: option = option to be turned on
*
*     SET NOoption
*
* where: option = option to be turned off
*
SET NOWARNSKIP
************************************************
*    CLASS=1 (Freshman) and ADEQUATE=1 or 2    *
************************************************
*
* The SKIPIF command is used to skip all the data where CLASS is not equal
* to 1, so the remaining data will be the NO and YES answers for the
* Freshman. The format of the SKIPIF command is:
*
*     SKIPIF(expression)
*
* where: expression = an expression using arithmetic or logical operators
*
SKIPIF(CLASS.NE.1)
STAT ADEQUATE / PFREQ
*
* The GEN1 command with $N saves the number of observations used by the
* preceding STAT command, that is, the observations that were not skipped
* by the SKIPIF command. This value will be used later in the
* cross-classification table.
*
GEN1 ALL1=$N
*
* CLASS=1 and ADEQUATE=2 (No)
* The SKIPIF(ADEQUATE.NE.2) statement now looks at all the Freshman (CLASS=1)
* who answered NO to the question.
*
SKIPIF(ADEQUATE.NE.2)
STAT ADEQUATE / PFREQ
GEN1 NO1=$N
*
* To permanently eliminate all SKIPIF commands in effect, the DELETE SKIP$
* command is used. This restores the data to the original 355 student
* responses.
*
DELETE SKIP$
*
* The responses by the Freshman, CLASS=1, are sorted out again from the
* original 355 sample set so the Freshman responses of YES to the question
* can be examined.
*
SKIPIF(CLASS.NE.1)
STAT ADEQUATE / PFREQ
*
* CLASS=1 and ADEQUATE=1 (Yes)
*
SKIPIF(ADEQUATE.NE.1)
STAT ADEQUATE / PFREQ
GEN1 YES1=$N
*
PRINT YES1 NO1 ALL1
*
* Restore Sample Range
*
DELETE SKIP$
*
************************************************
*   CLASS=2 (Sophomore) and ADEQUATE=1 or 2    *
************************************************
*
* The above procedure used to sort out the responses for the Freshman,
* CLASS=1, is used to sort out the Sophomore, CLASS=2.
*
SKIPIF(CLASS.NE.2)
STAT ADEQUATE / PFREQ
GEN1 ALL2=$N
*
* CLASS=2 and ADEQUATE=2 (No)
*
SKIPIF(ADEQUATE.NE.2)
STAT ADEQUATE / PFREQ
GEN1 NO2=$N
*
* Restore the entire sample range
*
DELETE SKIP$
*
* CLASS=2 and ADEQUATE=1 (Yes)
*
SKIPIF(CLASS.NE.2)
STAT ADEQUATE / PFREQ
SKIPIF(ADEQUATE.NE.1)
STAT ADEQUATE / PFREQ
GEN1 YES2=$N
*
PRINT YES2 NO2 ALL2
*
* Restore Sample Range
*
DELETE SKIP$
*
***********************************************
*    CLASS=3 (Junior) and ADEQUATE=1 or 2     *
***********************************************
*
* The above procedure used to sort out the responses for the Freshman,
* CLASS=1, is used to sort out the Junior, CLASS=3.
*
SKIPIF(CLASS.NE.3)
STAT ADEQUATE / PFREQ
GEN1 ALL3=$N
*
* CLASS=3 and ADEQUATE=2 (No)
*
SKIPIF(ADEQUATE.NE.2)
STAT ADEQUATE / PFREQ
GEN1 NO3=$N
*
* Restore the entire sample range
*
DELETE SKIP$
SKIPIF(CLASS.NE.3)
STAT ADEQUATE / PFREQ
*
* CLASS=3 and ADEQUATE=1 (Yes)
*
SKIPIF(ADEQUATE.NE.1)
STAT ADEQUATE / PFREQ
GEN1 YES3=$N
*
PRINT YES3 NO3 ALL3
*
* Restore Sample Range
*
DELETE SKIP$
*
***********************************************
*    CLASS=4 (Senior) and ADEQUATE=1 or 2     *
***********************************************
*
* The above procedure used to sort out the responses for the Freshman,
* CLASS=1, is used to sort out the Senior, CLASS=4.
*
SKIPIF(CLASS.NE.4)
STAT ADEQUATE / PFREQ
GEN1 ALL4=$N
*
* CLASS=4 and ADEQUATE=2 (No)
*
SKIPIF(ADEQUATE.NE.2)
STAT ADEQUATE / PFREQ
GEN1 NO4=$N
*
* Restore the entire sample range
*
DELETE SKIP$
SKIPIF(CLASS.NE.4)
STAT ADEQUATE / PFREQ
*
* CLASS=4 and ADEQUATE=1 (Yes)
*
SKIPIF(ADEQUATE.NE.1)
STAT ADEQUATE / PFREQ
GEN1 YES4=$N
*
PRINT YES4 NO4 ALL4
*
* Restore Sample Range
*
DELETE SKIP$
*
* The first GEN1 command sums the total YES answers for all 4 classes. The
* second GEN1 command sums the total NO answers for all 4 classes. The
* third GEN1 command sums the class totals (NO and YES answers combined)
* over all 4 classes.
*
GEN1 TYES=YES1+YES2+YES3+YES4
GEN1 TNO=NO1+NO2+NO3+NO4
GEN1 ALL=ALL1+ALL2+ALL3+ALL4
PRINT YES1 YES2 YES3 YES4 TYES
PRINT NO1 NO2 NO3 NO4 TNO
PRINT ALL1 ALL2 ALL3 ALL4 ALL
*
* Expected frequencies are calculated with the GEN1 commands below.
*
GEN1 EYES1=(ALL1*TYES)/ALL
GEN1 EYES2=(ALL2*TYES)/ALL
GEN1 EYES3=(ALL3*TYES)/ALL
GEN1 EYES4=(ALL4*TYES)/ALL
GEN1 ENO1=(ALL1*TNO)/ALL
GEN1 ENO2=(ALL2*TNO)/ALL
GEN1 ENO3=(ALL3*TNO)/ALL
GEN1 ENO4=(ALL4*TNO)/ALL
*
PRINT YES1 EYES1 YES2 EYES2 YES3 EYES3 YES4 EYES4 TYES
PRINT NO1 ENO1 NO2 ENO2 NO3 ENO3 NO4 ENO4 TNO
PRINT ALL1 ALL2 ALL3 ALL4 ALL
*
* Finally, the Chi-Square statistic is calculated below. It is not identical
* to the textbook value since the data for CLASS=1 (Freshman) and
* ADEQUATE=1 or 2 do not match those of the textbook.
*
GEN1 CHI=(NO1-ENO1)**2/ENO1+(NO2-ENO2)**2/ENO2+(NO3-ENO3)**2/ENO3+(NO4-ENO4)&
**2/ENO4+(YES1-EYES1)**2/EYES1+(YES2-EYES2)**2/EYES2+(YES3-EYES3)**2/EYES3&
+(YES4-EYES4)**2/EYES4
STAT CHI / SUMS=CHI2
PRINT CHI2
*
*-----------------------------------------------------------------------------
STOP