Tuesday, January 29, 2013

SPSS - Examining Summary Statistics for Individual Variables



This tutorial discusses simple summary measures and how the level of measurement of a variable influences the types of statistics that should be used. We will use the data file demo.sav.

  1. Level of Measurement
  2. Summary Measures for Categorical Data
  3. Summary Measures for Scale Variables


Level of Measurement

=================================



1) Different summary measures are appropriate for different types of data, depending on the level of measurement.

2) Categorical. Data with a limited number of distinct values or categories (for example, gender or marital status). Also referred to as qualitative data. Categorical variables can be string (alphanumeric) data or numeric variables that use numeric codes to represent categories (for example, 0 = Unmarried and 1 = Married). There are two basic types of categorical data:


3)  Nominal. Categorical data where there is no inherent order to the categories. For example, a job category of sales is not higher or lower than a job category of marketing or research.


4) Ordinal. Categorical data where there is a meaningful order of categories, but there is not a measurable distance between categories. For example, there is an order to the values high, medium, and low, but the "distance" between the values cannot be calculated.

5) Scale. Data measured on an interval or ratio scale, where the data values indicate both the order of values and the distance between values. For example, a salary of $72,195 is higher than a salary of $52,398, and the distance between the two values is $19,797. Also referred to as quantitative or continuous data.
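
As a rough illustration of how the level of measurement drives the choice of statistics, the following Python sketch maps each level to the summary measures this tutorial uses for it. The names Level, SUMMARIES, and suggested_summaries are made up for this example and are not part of SPSS.

```python
from enum import Enum


class Level(Enum):
    """Levels of measurement described in this tutorial."""
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    SCALE = "scale"


# Summary measures the tutorial associates with each level.
# This mapping is an illustration, not an SPSS feature.
SUMMARIES = {
    Level.NOMINAL: ["count", "percentage", "mode"],
    Level.ORDINAL: ["count", "percentage", "mode", "median"],
    Level.SCALE: ["mean", "median", "std. deviation", "minimum", "maximum"],
}


def suggested_summaries(level):
    """Return the list of summary measures suggested for a level."""
    return SUMMARIES[level]


for level in Level:
    print(f"{level.value:<8} {', '.join(suggested_summaries(level))}")
```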


Summary Measures for Categorical Data

============================================================

1) For categorical data, the most typical summary measure is the number or percentage of cases in each category. The mode is the category with the greatest number of cases. For ordinal data, the median (the value at which half of the cases fall above and below) may also be a useful summary measure if there is a large number of categories.
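
To make the mode and the ordinal median concrete, here is a minimal Python sketch. The responses are invented for illustration (they do not come from demo.sav), and the coding 1 = low, 2 = medium, 3 = high is an assumption of the example.

```python
from collections import Counter
from statistics import median_low

# Hypothetical ordinal responses, coded 1 = low, 2 = medium, 3 = high.
labels = {1: "low", 2: "medium", 3: "high"}
responses = [1, 2, 2, 3, 2, 1, 3, 2, 3, 2]

# Mode: the category with the greatest number of cases.
counts = Counter(responses)
mode_code, mode_count = counts.most_common(1)[0]

# Median: median_low always returns an observed category code,
# which suits ordinal data where averaging two codes has no meaning.
median_code = median_low(responses)

print("mode:  ", labels[mode_code], f"({mode_count} cases)")
print("median:", labels[median_code])
```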

2) The Frequencies procedure produces frequency tables that display both the number and percentage of cases for each observed value of a variable.
Note: This feature requires the Statistics Base option.
From the menus choose:
Analyze > Descriptive Statistics > Frequencies...




Select Owns PDA [ownpda] and Owns TV [owntv] and move them into the Variable(s) list.
Click OK to run the procedure.


3) The frequency tables are displayed in the Viewer window. The frequency tables reveal that only 20.4% of the people own PDAs, but almost everybody owns a TV (99.0%). These might not be interesting revelations, although it might be interesting to find out more about the small group of people who do not own televisions.
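
Outside SPSS, the same kind of table (count and percentage per observed value) can be sketched in a few lines of Python. The function name frequency_table and the sample answers are assumptions made for this example; the values are not those of demo.sav.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, percent) rows, sorted by value."""
    counts = Counter(values)
    total = len(values)
    return [(v, n, round(100 * n / total, 1)) for v, n in sorted(counts.items())]


# Made-up yes/no answers for ten respondents.
owns_pda = ["No"] * 8 + ["Yes"] * 2

print(f"{'Value':<6}{'Count':>6}{'Percent':>9}")
for value, count, percent in frequency_table(owns_pda):
    print(f"{value:<6}{count:>6}{percent:>9.1f}")
```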


4) You can graphically display the information in a frequency table with a bar chart or pie chart.
Open the Frequencies dialog box again. (The two variables should still be selected.)

You can use the Dialog Recall button on the toolbar to quickly return to recently used procedures.


Click Charts.
Select Bar charts and then click Continue.

Click OK in the main dialog box to run the procedure.



5) In addition to the frequency tables, the same information is now displayed in the form of bar charts, making it easy to see that most people do not own PDAs but almost everyone owns a TV.
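
A bar chart is simply the frequency counts drawn to scale. As a hedged sketch, the Python below renders a text version; text_bar_chart is a hypothetical helper and the answers are invented, so the output only mimics the shape of the SPSS chart.

```python
from collections import Counter


def text_bar_chart(values, width=40):
    """Return one text line per category, with bars scaled to the largest count."""
    counts = Counter(values)
    largest = max(counts.values())
    lines = []
    for value, n in sorted(counts.items()):
        bar = "#" * round(width * n / largest)
        lines.append(f"{value:<4}{bar} {n}")
    return lines


# Made-up answers: nine of ten respondents own a TV.
owns_tv = ["Yes"] * 9 + ["No"]
for line in text_bar_chart(owns_tv):
    print(line)
```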




Summary Measures for Scale Variables

==========================================================

1) There are many summary measures available for scale variables, including:
•  Measures of central tendency. The most common measures of central tendency are the mean (arithmetic average) and median (value at which half the cases fall above and below).
•  Measures of dispersion. Statistics that measure the amount of variation or spread in the data include the standard deviation, minimum, and maximum.

Open the Frequencies dialog box again.
Click Reset to clear any previous settings.
Select Household income in thousands [income] and move it into the Variable(s) list.
Click Statistics.



Select Mean, Median, Std. deviation, Minimum, and Maximum.
Click Continue.

Deselect Display frequency tables in the main dialog box. (Frequency tables are usually not useful for scale variables since there may be almost as many distinct values as there are cases in the data file.)
Click OK to run the procedure.
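
For readers who want to see what these five statistics compute, here is a minimal Python sketch using the standard statistics module. The income values are hypothetical (in thousands) and are not taken from demo.sav; stdev is the sample standard deviation, with an n - 1 denominator.

```python
from statistics import mean, median, stdev

# Hypothetical household incomes, in thousands.
incomes = [25, 31, 38, 44, 52, 60, 75, 120, 455]

summary = {
    "mean": mean(incomes),
    "median": median(incomes),
    "std_deviation": stdev(incomes),  # sample standard deviation (n - 1)
    "minimum": min(incomes),
    "maximum": max(incomes),
}

for name, value in summary.items():
    print(f"{name:<14}{value:>10.2f}")
```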


2) The Frequencies Statistics table is displayed in the Viewer window.
In this example, there is a large difference between the mean and the median. The mean is almost 25,000 greater than the median, which indicates that the values are skewed rather than normally distributed (in a normal distribution, the mean and median coincide). You can visually check the distribution with a histogram.

Open the Frequencies dialog box again.
Click Charts.


Select Histograms and With normal curve.
Click Continue, and then click OK in the main dialog box to run the procedure.
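
A histogram groups scale values into equal-width bins and counts the cases in each. The sketch below shows that binning step in Python; histogram_counts is a hypothetical helper, the incomes are invented, the bin width of 50 is arbitrary, and no normal curve is drawn.

```python
from collections import Counter


def histogram_counts(values, bin_width):
    """Count values per bin; each key is the lower edge of its bin."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(bins.items()))


# Hypothetical household incomes, in thousands.
incomes = [25, 31, 38, 44, 52, 60, 75, 120, 455]

for lower_edge, n in histogram_counts(incomes, 50).items():
    print(f"{lower_edge:>4} | {'#' * n}")
```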




3) The majority of cases are clustered at the lower end of the scale, with most falling below 100,000. There are, however, a few cases in the 500,000 range and beyond (too few to even be visible without modifying the histogram). These high values for only a few cases have a significant effect on the mean but little or no effect on the median, making the median a better indicator of central tendency in this example.
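
The pull of a few extreme cases on the mean can be checked with a small experiment. This Python sketch uses invented incomes (in thousands), not the demo.sav data, and adds two very high values to an otherwise ordinary list.

```python
from statistics import mean, median

# Hypothetical incomes, in thousands, without and with two extreme cases.
typical = [25, 31, 38, 44, 52, 60, 75]
with_outliers = typical + [500, 900]

mean_shift = mean(with_outliers) - mean(typical)
median_shift = median(with_outliers) - median(typical)

print(f"mean moves by   {mean_shift:.1f}")
print(f"median moves by {median_shift:.1f}")
```

In this made-up sample the two extreme cases move the mean far more than the median, which is the pattern the histogram suggests for the income variable.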






