*****************************************************************************
* CHAPTER 14 - STATISTICS FOR BUSINESS & ECONOMICS, 5th Edition             *
*****************************************************************************
* Example 14.1, p. 560
*
* The SAMPLE command specifies the range of observations to be read.  The
* READ command reads a gas company's findings from a random sample of 400
* of its accounts.  The observed number of accounts in each category is
* defined as OB and the corresponding probability of an account being fully
* paid or in arrears is defined as PROB.  The LIST option on the READ
* command lists all data read.
*
SAMPLE 1 4
READ OB PROB / LIST
287  0.80 
 49  0.10 
 30  0.06 
 34  0.04
*
* The GENR command is used to calculate the expected number of the present
* accounts in each of the four categories, which is defined as EXPECT.
*
GENR EXPECT=400*PROB
*
* The STAT command with the SUMS= option is used to save the sum of the
* variable in the specified constant.
*
STAT OB / SUMS=SO
STAT PROB / SUMS=SP
STAT EXPECT / SUMS=SEXPECT
*
* The PRINT command is used to replicate the Table on Page 561.
*
PRINT OB PROB EXPECT
PRINT SO SP SEXPECT
*
* The Null Hypothesis is that the proportions in the present winter conform
* to the historical records.  The GENR and STAT commands are used to
* generate the Chi-squared test statistic.
* 
GENR CHI=(OB-EXPECT)**2/EXPECT
STAT CHI / SUM=CHI2
PRINT CHI2
*
* The DISTRIB command provides functions of probability distributions.  The
* format of the DISTRIB command is:
*
*    DISTRIB vars / options
*
* where:  vars    = list of variables
*         options = list of options that are required on the specified
*                   type of distribution
*
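* In this example, the critical value of the Chi-squared distribution is
* printed with the DISTRIB command.  The TYPE= option defines the type of
* distribution, DF= defines the degrees of freedom, and INVERSE computes the
* inverse survival function, so the upper-tail probability PVAL is converted
* to the corresponding critical value.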
*
GEN1 PVAL=0.005
DISTRIB PVAL / TYPE=CHI DF=3 INVERSE LIST
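*
* A minimal sketch, not part of the textbook example: applying DISTRIB to
* the test statistic CHI2 without the INVERSE option evaluates the
* Chi-squared distribution at CHI2.  The 1-CDF value in the output is the
* upper-tail probability, which can be read as the P-Value of the test.
*
DISTRIB CHI2 / TYPE=CHI DF=3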
*
*-----------------------------------------------------------------------------
* Example 14.2, p. 563
*
* In this example, the Number of Occurrences is defined as X and the Observed
* Frequencies are defined as OF.
*
SAMPLE 1 4
READ X OF
0 156
1  63
2  29
3  14
*
GEN1 N=262
*
* The TYPE= option on the DISTRIB command in this example specifies the
* Poisson distribution with MEAN=0.66.  The Probability Density Function
* is saved with the PDF=P option and the Cumulative Distribution Function
* is saved with the CDF=CF option.
*
DISTRIB X / TYPE=POISSON MEAN=0.66 LIST PDF=P CDF=CF
*
* The GENR command is then used to generate a vector of the Expected
* Frequencies, EF.  The SUM function on the GENR command is used to create
* a vector called EFC with the cumulative sum of EF row by row.  
*
GENR EF=P*N
GENR EFC=SUM(EF)
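*
* As a quick check (a sketch, not part of the textbook example), the
* probabilities, expected frequencies and running totals can be printed
* side by side before the last category is adjusted.
*
PRINT X P CF EF EFC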
*
* The Expected Frequency, EF, for 3 or More Occurrences is calculated with
* the SAMPLE and GENR commands below as N minus the cumulative Expected
* Frequency of 0 to 2 Occurrences, EFC(3).  Since only this one value of EF
* needs to be recalculated, the SAMPLE command is set to 4 4.
*
SAMPLE 4 4
GENR EF=N-EFC(3)
*
* The SAMPLE command is restored to all 4 observations in order to calculate
* the test statistic CHISQ.
*
SAMPLE 1 4
GENR CHI=(OF-EF)**2/EF
STAT CHI / SUMS=CHISQ
PRINT X OF EF CHI
PRINT CHISQ
*
* The DISTRIB command is then used to calculate the critical value of the
* Chi-square distribution.
*
GEN1 PVAL=0.005
DISTRIB PVAL / TYPE=CHI DF=2 INVERSE LIST
*
*----------------------------------------------------------------------------
* Example 14.3, p. 566
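*
* The GEN1 command is used to store the sample size N, the skewness SKEW
* and the kurtosis term KURT as constants and to calculate the test
* statistic B for normality.  KURT enters the formula squared without
* subtracting 3, so it is assumed here to hold the excess kurtosis.
*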
GEN1 N=300
GEN1 SKEW=0.0305
GEN1 KURT=0.08
GEN1 B=N*(SKEW**2/6+KURT**2/24)
PRINT B
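*
* A minimal sketch, not part of the textbook example: in large samples a
* statistic of this form is commonly referred to a Chi-square distribution
* with 2 degrees of freedom.  Under that assumption, the DISTRIB command
* below prints the CDF and 1-CDF at B, where 1-CDF is the approximate
* P-Value.
*
DISTRIB B / TYPE=CHI DF=2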
*
*----------------------------------------------------------------------------
* Example 14.4, p. 568
*
SAMPLE 1 3
READ FEMALE MALE / LIST
256 74 
 41 42 
 66 34
*
* The GENR and STAT commands are used to calculate the TOTAL of FEMALE and
* MALE in each category and the sums of FEMALE, MALE and TOTAL over all
* categories.  The respective sums are saved in the specified constants.
*
GENR TOTAL=FEMALE+MALE
STAT FEMALE / SUMS=SUMF
STAT MALE / SUMS=SUMM
STAT TOTAL / SUMS=SUMT
*
* The PRINT command is used to replicate Table 14.9.
*
PRINT FEMALE MALE TOTAL
PRINT SUMF SUMM SUMT
*
* Table 14.10 is replicated with the use of the GENR and PRINT commands.
*
GENR E1=(TOTAL*SUMF)/SUMT
GENR E2=(TOTAL*SUMM)/SUMT
PRINT FEMALE E1 MALE E2 TOTAL
PRINT SUMF SUMM SUMT
*
* The Null Hypothesis of no association is tested with the GENR and STAT
* commands.
*
GENR CHI=(FEMALE-E1)**2/E1+(MALE-E2)**2/E2
STAT CHI / SUMS=CHI2
PRINT CHI2
GEN1 XVALUE=0.005
DISTRIB XVALUE / TYPE=CHI DF=2 INVERSE
*
*----------------------------------------------------------------------------
* Example 14.5, p. 570
*
SAMPLE 1 3
READ F1040 OTHER EXTEN / LIST
35   8   7  50
60  20  10  90
45  11   4  60
*
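* The Observed Frequencies.  The fourth value on each data row above is the
* row total; READ names only three variables, so TOTAL is recomputed with
* the GENR command below.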
*
GENR TOTAL=F1040+OTHER+EXTEN
STAT F1040 / SUMS=SF1040
STAT OTHER / SUMS=SOTHER
STAT EXTEN / SUMS=SEXT
STAT TOTAL / SUMS=STOTAL
PRINT F1040 OTHER EXTEN TOTAL
PRINT SF1040 SOTHER SEXT STOTAL
*
* The Expected Frequencies
*
GENR EF1040=(TOTAL*SF1040)/STOTAL
GENR EOTHER=(TOTAL*SOTHER)/STOTAL
GENR EEXT=(TOTAL*SEXT)/STOTAL
PRINT EF1040 EOTHER EEXT TOTAL
PRINT SF1040 SOTHER SEXT STOTAL
*
* The & allows a SHAZAM command to be continued onto the following line.
*
GENR CHI=(F1040-EF1040)**2/EF1040+(OTHER-EOTHER)**2/EOTHER+(EXTEN-EEXT)**2/&
EEXT
STAT CHI / SUMS=CHI2
PRINT CHI2
*
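* The DISTRIB command with the INVERSE option prints the critical value of
* the Chi-square distribution with 4 degrees of freedom for the significance
* level LS.  Without INVERSE, DISTRIB evaluates the distribution at the test
* statistic CHI2; in the SHAZAM output, the P-Value corresponding to the
* textbook is printed as 1-CDF.
*
GEN1 LS=0.05
DISTRIB LS / TYPE=CHI DF=4 INVERSE
DISTRIB CHI2 / TYPE=CHI DF=4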
*
*----------------------------------------------------------------------------
* Example 14.6, p. 570
*
* The SHAZAM output for this example is the reverse of Figure 14.2.
*
* The SKIPLINES= option on the READ command instructs SHAZAM to skip 1 line
* before the data is read.
*
SAMPLE 1 355
READ(library.txt) STUDID CLASS DAYSW HRET EASY ADEQUATE HRSEXT MOREU / &
SKIPLINES=1
*
* The SET MISSVALU= command is used to assign the missing data value to 9
* in lieu of the default value of -99999.
*
SET MISSVALU=9
*
* To replicate the cross-classification data illustrated in Figure 14.2, the
* data will be examined based on classification of the variable CLASS.  Once
* the data has been sorted, then the data will be divided into NO and YES
* responses.
*
* The first group of answers to be examined is the Freshman group, where
* CLASS=1, regardless of whether they answered NO or YES to the question.
* The SET command with the NOWARNSKIP option tells SHAZAM to turn off the
* warning message for every observation skipped with the SKIPIF command.
* The format of the SET command is:
*
*    SET option
*
* where:  option = option to be turned on
*
*    SET NOoption
*
* where:  option = option to be turned off
*
SET NOWARNSKIP
************************************************
* CLASS=1 (Freshman) and ADEQUATE=1 or 2       *
************************************************
*
* The SKIPIF command is used to skip all the data where CLASS is not equal
* to 1, so the remaining data will be the NO and YES answers for the
* Freshman.  The format of the SKIPIF command is:
*
*    SKIPIF(expression)
*
* where:  expression = an expression built from arithmetic or logical
*                      operators
*

SKIPIF(CLASS.NE.1)
STAT ADEQUATE / PFREQ
*
* The GEN1 command with $N saves the number of observations used by the
* preceding STAT command, that is, the observations not skipped by the
* SKIPIF command.  This value will be used later in the
* cross-classification table.
*
GEN1 ALL1=$N
*
* CLASS=1 and ADEQUATE=2 (No)
* The SKIPIF(ADEQUATE.NE.2) statement now looks at all the Freshman (CLASS=1)
* who answered NO to the question.
*
SKIPIF(ADEQUATE.NE.2)
STAT ADEQUATE / PFREQ
GEN1 NO1=$N
*
* To permanently eliminate all SKIPIF commands in effect, the DELETE SKIP$
* command is used.  This restores the data to the original 355 student
* responses.
*
DELETE SKIP$
*
* The responses by the Freshman, CLASS=1, are selected again from the
* original 355 responses so that the Freshman responses of YES to the
* question can be examined.
*
SKIPIF(CLASS.NE.1)
STAT ADEQUATE / PFREQ
*
* CLASS=1 and ADEQUATE=1 (Yes)
*
SKIPIF(ADEQUATE.NE.1)
STAT ADEQUATE / PFREQ
GEN1 YES1=$N
*
PRINT YES1 NO1 ALL1 
*
* Restore Sample Range
*
DELETE SKIP$
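*
* As a cross-check on the SKIPIF counts, a minimal sketch (not part of the
* textbook example) counts the Freshman YES answers with a dummy variable
* instead.  It assumes that a logical expression on the GENR command
* evaluates to 1 when true and 0 when false; DYES1 and CYES1 are
* hypothetical names introduced only for this check.  CYES1 should equal
* YES1.
*
GENR DYES1=(CLASS.EQ.1)*(ADEQUATE.EQ.1)
STAT DYES1 / SUMS=CYES1
PRINT YES1 CYES1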
*
************************************************
* CLASS=2 (Sophomore) and ADEQUATE=1 or 2      *
************************************************
*
* The above procedure used to sort out the responses for the Freshman,
* CLASS=1, is used to sort out the Sophomore, CLASS=2.
*
SKIPIF(CLASS.NE.2)
STAT ADEQUATE / PFREQ
GEN1 ALL2=$N
*
* CLASS=2 and ADEQUATE=2 (No)
*
SKIPIF(ADEQUATE.NE.2)
STAT ADEQUATE / PFREQ
GEN1 NO2=$N
*
* Restore the entire sample range
*
DELETE SKIP$
*
* CLASS=2 and ADEQUATE=1 (Yes)
*
SKIPIF(CLASS.NE.2)
STAT ADEQUATE / PFREQ
SKIPIF(ADEQUATE.NE.1)
STAT ADEQUATE / PFREQ
GEN1 YES2=$N
*
PRINT YES2 NO2 ALL2 
*
* Restore Sample Range
*
DELETE SKIP$
*
***********************************************
* CLASS=3 (Junior) and ADEQUATE=1 or 2        *
***********************************************
*
* The above procedure used to sort out the responses for the Freshman,
* CLASS=1, is used to sort out the Junior, CLASS=3.
*
SKIPIF(CLASS.NE.3)
STAT ADEQUATE / PFREQ 
GEN1 ALL3=$N
*
* CLASS=3 and ADEQUATE=2 (No)
*
SKIPIF(ADEQUATE.NE.2)
STAT ADEQUATE / PFREQ
GEN1 NO3=$N
*
* Restore the entire sample range
*
DELETE SKIP$
SKIPIF(CLASS.NE.3)
STAT ADEQUATE / PFREQ
*
* CLASS=3 and ADEQUATE=1 (Yes)
*
SKIPIF(ADEQUATE.NE.1)
STAT ADEQUATE / PFREQ
GEN1 YES3=$N
*
PRINT YES3 NO3 ALL3 
*
* Restore Sample Range
*
DELETE SKIP$
*
***********************************************
* CLASS=4 (Senior) and ADEQUATE=1 or 2        *
***********************************************
*
* The above procedure used to sort out the responses for the Freshman,
* CLASS=1, is used to sort out the Senior, CLASS=4.
*
SKIPIF(CLASS.NE.4)
STAT ADEQUATE / PFREQ  
GEN1 ALL4=$N
*
* CLASS=4 and ADEQUATE=2 (No)
*
SKIPIF(ADEQUATE.NE.2)
STAT ADEQUATE / PFREQ
GEN1 NO4=$N
*
* Restore the entire sample range
*
DELETE SKIP$
SKIPIF(CLASS.NE.4)
STAT ADEQUATE / PFREQ
*
* CLASS=4 and ADEQUATE=1 (Yes)
*
SKIPIF(ADEQUATE.NE.1)
STAT ADEQUATE / PFREQ
GEN1 YES4=$N
*
PRINT YES4 NO4 ALL4 
*
* Restore Sample Range
*
DELETE SKIP$
*
* The first GEN1 command sums the total YES answers for all 4 classes.  The
* second GEN1 command sums the total NO answers for all 4 classes.  The
* third GEN1 command sums the class totals ALL1 to ALL4 into the grand
* total ALL.
*
GEN1 TYES=YES1+YES2+YES3+YES4
GEN1 TNO=NO1+NO2+NO3+NO4
GEN1 ALL=ALL1+ALL2+ALL3+ALL4
PRINT YES1 YES2 YES3 YES4 TYES
PRINT NO1 NO2 NO3 NO4 TNO
PRINT ALL1 ALL2 ALL3 ALL4 ALL
*
* Expected frequencies are calculated with the GEN1 commands below.
*
GEN1 EYES1=(ALL1*TYES)/ALL
GEN1 EYES2=(ALL2*TYES)/ALL
GEN1 EYES3=(ALL3*TYES)/ALL
GEN1 EYES4=(ALL4*TYES)/ALL
GEN1 ENO1=(ALL1*TNO)/ALL
GEN1 ENO2=(ALL2*TNO)/ALL
GEN1 ENO3=(ALL3*TNO)/ALL
GEN1 ENO4=(ALL4*TNO)/ALL
*
PRINT YES1 EYES1 YES2 EYES2 YES3 EYES3 YES4 EYES4 TYES
PRINT NO1 ENO1 NO2 ENO2 NO3 ENO3 NO4 ENO4 TNO
PRINT ALL1 ALL2 ALL3 ALL4 ALL
*
* Finally, the Chi-Square statistic is calculated below.  It is not
* identical to the textbook value since the data for CLASS=1 (Freshman)
* and ADEQUATE=1 or 2 do not match those of the textbook.
*
GEN1 CHI=(NO1-ENO1)**2/ENO1+(NO2-ENO2)**2/ENO2+(NO3-ENO3)**2/ENO3+(NO4-ENO4)&
**2/ENO4+(YES1-EYES1)**2/EYES1+(YES2-EYES2)**2/EYES2+(YES3-EYES3)**2/EYES3&
+(YES4-EYES4)**2/EYES4
STAT CHI / SUMS=CHI2
PRINT CHI2
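*
* A minimal sketch, not part of the textbook example: with 4 classes and 2
* answers the test is assumed to have (4-1)*(2-1)=3 degrees of freedom.  The
* first DISTRIB command prints the critical value for an assumed 5%
* significance level stored in the hypothetical constant LS6, and the second
* prints the CDF and 1-CDF at CHI2, where 1-CDF is the P-Value.
*
GEN1 LS6=0.05
DISTRIB LS6 / TYPE=CHI DF=3 INVERSE
DISTRIB CHI2 / TYPE=CHI DF=3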
*
*-----------------------------------------------------------------------------
STOP