This tutorial discusses simple summary measures and how the level of measurement of a variable influences the types of statistics that should be used. We will use the data file demo.sav.
Level of Measurement
=================================
1) Different summary measures are appropriate for different types of data, depending on the level of measurement.
2) Categorical. Data with a limited number of distinct values or categories (for example, gender or marital status). Also referred to as qualitative data. Categorical variables can be string (alphanumeric) data or numeric variables that use numeric codes to represent categories (for example, 0 = Unmarried and 1 = Married). There are two basic types of categorical data:
3) Nominal. Categorical data where there is no inherent order to the categories. For example, a job category of sales is not higher or lower than a job category of marketing or research.
4) Ordinal. Categorical data where there is a meaningful order of categories, but there is not a measurable distance between categories. For example, there is an order to the values high, medium, and low, but the "distance" between the values cannot be calculated.
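As a rough sketch of how the level of measurement constrains the choice of summary measure, the lookup below pairs each level with the measures this tutorial uses for it. The names `APPROPRIATE_MEASURES` and `measures_for` are hypothetical and are not part of SPSS.

```python
# Hypothetical lookup: summary measures typically suited to each level of measurement.
APPROPRIATE_MEASURES = {
    "nominal": ["count", "percentage", "mode"],
    "ordinal": ["count", "percentage", "mode", "median"],
    "scale": ["mean", "median", "standard deviation", "minimum", "maximum"],
}


def measures_for(level):
    """Return the summary measures suggested for a level of measurement."""
    return APPROPRIATE_MEASURES[level.lower()]


print(measures_for("Ordinal"))
```

The median appears for ordinal data but not for nominal data, because a median requires the categories to have an order.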
Summary Measures for Categorical Data
============================================================
1) For categorical data, the most typical summary measure is the number or percentage of cases in each category. The mode is the category with the greatest number of cases. For ordinal data, the median (the value at which half of the cases fall above and below) may also be a useful summary measure if there is a large number of categories.
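The measures described above can be sketched outside SPSS with Python's standard library. The response values below are made up for illustration and do not come from demo.sav.

```python
from collections import Counter
from statistics import median_low

# Made-up ordinal responses coded 1 = low, 2 = medium, 3 = high (illustrative only).
responses = [1, 2, 2, 3, 2, 1, 3, 2, 2, 3]

counts = Counter(responses)
total = len(responses)

# Number and percentage of cases in each category.
for value in sorted(counts):
    percent = 100 * counts[value] / total
    print(f"{value}: n={counts[value]} ({percent:.1f}%)")

# Mode: the category with the greatest number of cases.
mode_value, mode_count = counts.most_common(1)[0]

# median_low returns an observed category rather than averaging two codes.
median_value = median_low(responses)
```

Using `median_low` rather than `median` is a design choice for ordinal codes: averaging two adjacent codes would produce a value that is not itself a category.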
2) The Frequencies procedure produces frequency tables that display both the number and percentage of cases for each observed value of a variable.
► From the menus choose Analyze > Descriptive Statistics > Frequencies...
► Select Owns PDA [ownpda] and Owns TV [owntv] and move them into the Variable(s) list.
► Click OK to run the procedure.
3) The frequency tables are displayed in the Viewer window. The frequency tables reveal that only 20.4% of the people own PDAs, but almost everybody owns a TV (99.0%). These might not be interesting revelations, although it might be interesting to find out more about the small group of people who do not own televisions.
4) You can graphically display the information in a frequency table with a bar chart or pie chart.
► Open the Frequencies dialog box again. (The two variables should still be selected.)
You can use the Dialog Recall button on the toolbar to quickly return to recently used procedures. 
► Click Charts.
► Select Bar charts and then click Continue.
► Click OK in the main dialog box to run the procedure.
5) In addition to the frequency tables, the same information is now displayed in the form of bar charts, making it easy to see that most people do not own PDAs but almost everyone owns a TV.
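To illustrate why a bar chart makes this contrast obvious, here is a minimal text-only sketch that draws one bar per variable from the two ownership percentages reported in the frequency tables. The helper `text_bar` and the bar width of 40 characters are assumptions made for this example; SPSS draws its own graphical charts.

```python
# Ownership percentages reported in the frequency tables (20.4% and 99.0%).
ownership_pct = {"Owns PDA": 20.4, "Owns TV": 99.0}


def text_bar(pct, width=40):
    """Scale a percentage (0-100) to a bar of '#' characters."""
    return "#" * round(pct / 100 * width)


for label, pct in ownership_pct.items():
    print(f"{label:<9} {text_bar(pct):<40} {pct:.1f}%")
```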
Summary Measures for Scale Variables
==========================================================
1) There are many summary measures available for scale variables (data measured on an interval or ratio scale), including:
•  Measures of central tendency. The most common measures of central tendency are the mean (arithmetic average) and median (value at which half the cases fall above and below).
•  Measures of dispersion. Statistics that measure the amount of variation or spread in the data include the standard deviation, minimum, and maximum.
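The measures listed above can be computed with Python's `statistics` module, as in the sketch below. The income values are made up for illustration (in thousands) and are not taken from demo.sav.

```python
import statistics

# Made-up household incomes in thousands; not taken from demo.sav.
income = [25, 31, 38, 42, 47, 55, 60, 72, 95, 410]

summary = {
    "mean": statistics.mean(income),        # arithmetic average
    "median": statistics.median(income),    # middle value of the sorted data
    "std_dev": statistics.stdev(income),    # sample standard deviation (n - 1 denominator)
    "minimum": min(income),
    "maximum": max(income),
}

for name, value in summary.items():
    print(f"{name:<8} {value:.2f}")
```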
► Open the Frequencies dialog box again.
► Click Reset to clear any previous settings.
► Select Household income in thousands [income] and move it into the Variable(s) list.
► Click Statistics.
► Select Mean, Median, Std. deviation, Minimum, and Maximum.
► Click Continue.
► Deselect Display frequency tables in the main dialog box. (Frequency tables are usually not useful for scale variables since there may be almost as many distinct values as there are cases in the data file.)
► Click OK to run the procedure.
2) The Frequencies Statistics table is displayed in the Viewer window.
In this example, there is a large difference between the mean and the median. The mean is almost 25,000 greater than the median, indicating that the values are not normally distributed. You can visually check the distribution with a histogram.
► Open the Frequencies dialog box again.
► Click Charts.
► Select Histograms and With normal curve.
► Click Continue, and then click OK in the main dialog box to run the procedure.
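For readers who want to see what a histogram does with the data, the sketch below groups values into fixed-width bins and counts the cases in each bin. The income values and the bin width of 50 are assumptions chosen for illustration. The sketch does not draw the normal curve overlay that SPSS adds.

```python
from collections import Counter

# Made-up household incomes in thousands; not taken from demo.sav.
income = [25, 31, 38, 42, 47, 55, 60, 72, 95, 410]
bin_width = 50  # assumed bin width for this example

# Map each value to the lower edge of its bin, then count the values per bin.
bins = Counter((value // bin_width) * bin_width for value in income)

# Print every bin up to the maximum, including empty ones, as a text histogram.
for lower in range(0, max(income) + bin_width, bin_width):
    print(f"{lower:>3}-{lower + bin_width - 1:<3} {'#' * bins[lower]}")
```

Printing the empty bins as well makes the gap between the bulk of the cases and the isolated high value visible.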
3) The majority of cases are clustered at the lower end of the scale, with most falling below 100,000. There are, however, a few cases in the 500,000 range and beyond (too few to even be visible without modifying the histogram). These high values for only a few cases have a significant effect on the mean but little or no effect on the median, making the median a better indicator of central tendency in this example.
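The effect of a few extreme values can be checked with a small experiment. The sketch below uses made-up incomes (in thousands), adds a single very high value, and compares how far the mean and the median move.

```python
import statistics

# Made-up incomes in thousands; the single extreme value is added for illustration.
typical = [25, 31, 38, 42, 47, 55, 60, 72, 95]
with_outlier = typical + [900]

mean_shift = statistics.mean(with_outlier) - statistics.mean(typical)
median_shift = statistics.median(with_outlier) - statistics.median(typical)

print(f"mean shift:   {mean_shift:.1f}")
print(f"median shift: {median_shift:.1f}")
```

With these values the median moves by only a few units while the mean moves by far more, which is the pattern described above.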

 