From Wikipedia, the free encyclopedia.
Scaling is the measurement of a variable in such a way that it can be
expressed on a continuum. Rating your preference for a product from 1 to 10 is
an example of a scale.
With comparative scaling, the items are directly compared with each
other (example: Do you prefer Pepsi or Coke?).
In noncomparative scaling, each item is scaled independently of the
others (example: How do you feel about Coke?).
Composite measures
Indexes are similar to scales except that multiple indicators of a
variable are combined into a single measure. The index of consumer confidence,
for example, is a combination of several measures of consumer attitudes. A typology
is similar to an index except that the variable is measured at the nominal level.
Scales, indexes, and typologies are all examples of composite measures.
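To make "combining multiple indicators into a single measure" concrete, here is a minimal Python sketch of an index. It assumes equal weights and z-score standardization, both illustrative choices; the function and indicator names are hypothetical, and this is not how any published consumer-confidence index is actually computed.

```python
from statistics import mean, pstdev


def build_index(indicators: dict[str, list[float]]) -> list[float]:
    """Combine several indicators into one equally weighted index score per respondent.

    `indicators` maps a (hypothetical) indicator name to one value per respondent.
    Each indicator is converted to z-scores so that indicators measured in
    different units contribute on a comparable footing; the z-scores are then averaged.
    """
    names = list(indicators)
    n = len(indicators[names[0]])
    if any(len(values) != n for values in indicators.values()):
        raise ValueError("every indicator needs one value per respondent")

    standardized = {}
    for name, values in indicators.items():
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            raise ValueError(f"indicator {name!r} has no variance")
        standardized[name] = [(v - mu) / sigma for v in values]

    # Equal weighting (an assumption): average the z-scores across indicators.
    return [mean(standardized[name][i] for name in names) for i in range(n)]


scores = build_index({
    "current_conditions": [3, 4, 2, 5],
    "expected_income": [2, 4, 3, 5],
    "buying_intentions": [1, 3, 2, 4],
})
```

Because every indicator is standardized, the resulting index is centered on zero and is only meaningful relative to the other respondents in the same data set.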
Data types
The type of information collected can influence scale construction.
Different types of information are measured in different ways.
- Some data is measured at the nominal
level. That is, any numbers used are mere labels: they express
no mathematical properties. Examples are SKU inventory codes and UPC bar
codes.
- Some data is measured at the ordinal
level. Numbers indicate the relative position of items, but not the
magnitude of difference. An example is a preference ranking.
- Some data is measured at the interval
level. Numbers indicate the magnitude of difference between items, but
there is no absolute zero point. Examples are attitude scales and opinion
scales.
- Some data is measured at the ratio
level. Numbers indicate magnitude of difference and there is a fixed
zero point. Ratios can be calculated. Examples include: age, income,
price, costs, sales revenue, sales volume, and market share.
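A practical consequence of the four levels is which summary statistics are meaningful. The following Python sketch, a simplification under the usual textbook conventions, returns only the statistics permitted at each level: the mode for nominal data, the median from the ordinal level up, the mean and standard deviation from the interval level up, and the coefficient of variation only at the ratio level, where a true zero makes ratios meaningful. The function name `summarize` is hypothetical.

```python
from statistics import mean, median_low, mode, stdev

LEVELS = ("nominal", "ordinal", "interval", "ratio")


def summarize(values, level):
    """Return only the summary statistics that make sense at the given level of measurement."""
    if level not in LEVELS:
        raise ValueError(f"level must be one of {LEVELS}")
    rank = LEVELS.index(level)

    stats = {"mode": mode(values)}  # nominal: labels can only be counted
    if rank >= 1:
        # ordinal: order is meaningful; median_low always returns an observed category
        stats["median"] = median_low(values)
    if rank >= 2:
        # interval: differences are meaningful, so averages and spreads are too
        stats["mean"] = mean(values)
        stats["stdev"] = stdev(values)
    if rank >= 3:
        # ratio: a fixed zero point makes ratios such as the coefficient of variation meaningful
        stats["coef_of_variation"] = stats["stdev"] / stats["mean"]
    return stats


print(summarize(["A12", "B07", "A12"], "nominal"))  # SKU-like labels
print(summarize([20, 30, 40, 50], "ratio"))          # e.g. incomes
```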
Scale construction decisions
~ What level of data is involved (nominal, ordinal, interval, or ratio)?
~ What will the results be used for?
~ Should you use a scale, index, or typology?
~ What types of statistical analysis would be useful?
~ Should you use a comparative scale or a noncomparative scale?
~ How many scale divisions or categories should be used (1 to 10; 1 to 7; -3 to +3)?
~ Should there be an odd or even number of divisions? An odd number gives a neutral
center value; an even number forces respondents to take a non-neutral position.
~ What should the scale labels be, and how descriptive should they be?
~ What physical form or layout should the scale take (graphic, simple linear, vertical,
horizontal)?
~ Should responses be forced or optional?
Comparative scaling techniques
- Paired comparison scaling - a respondent is presented with two
items at a time and asked to select one (example: Do you prefer
Pepsi or Coke?). This is an ordinal level technique.
- Rank-order scaling - a respondent is presented with several items
simultaneously and asked to rank them (example: Rank the following
advertisements from 1 to 10.). This is an ordinal level technique.
- Constant sum scaling - a respondent is given a constant sum of
money, scrip, credits, or points and asked to allocate these to various
items (example: If you had 100 yen to spend on food products, how
much would you spend on product A, on product B, on product C, etc.?). This
is an ordinal level technique.
- Bogardus social distance scaling - measures the degree to which a
person is willing to associate with a class or type of people. It asks how
willing the respondent is to make various associations. The results are
reduced to a single score on a scale. There are also non-comparative
versions of this scale.
- Q-Sort scaling - Up to 140 items are sorted into groups based on a
rank-order procedure.
- Guttman scaling - This is a procedure to determine whether a set of items
can be rank-ordered on a unidimensional scale. It uses the intensity
structure among several indicators of a given variable. Statements are
listed in order of importance. The rating is scaled by summing all
responses until the first negative response in the list.
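Paired comparison data are often summarized by counting how many times each item was chosen. The Python sketch below assumes a single respondent who has judged every pair; the function name and the data layout (a dict keyed by `frozenset` pairs) are illustrative assumptions, not a standard format.

```python
from collections import Counter
from itertools import combinations


def paired_comparison_ranking(items, choices):
    """Rank items by how often each was chosen across all pairs shown.

    `choices` maps each unordered pair (as a frozenset) to the item the
    respondent picked; a missing pair raises KeyError.
    """
    wins = Counter({item: 0 for item in items})
    for pair in combinations(items, 2):
        chosen = choices[frozenset(pair)]
        if chosen not in pair:
            raise ValueError(f"choice {chosen!r} is not in pair {pair}")
        wins[chosen] += 1
    # Ordinal result: only the order of the win counts is meaningful.
    # sorted() is stable, so tied items keep their original order.
    ranking = sorted(items, key=lambda item: wins[item], reverse=True)
    return ranking, dict(wins)


ranking, wins = paired_comparison_ranking(
    ["Pepsi", "Coke", "Sprite"],
    {
        frozenset({"Pepsi", "Coke"}): "Coke",
        frozenset({"Pepsi", "Sprite"}): "Pepsi",
        frozenset({"Coke", "Sprite"}): "Coke",
    },
)
```

With n items a respondent must judge n(n - 1)/2 pairs, so the method becomes tedious as the item list grows. Because choices need not be transitive (A over B, B over C, C over A), win counts can tie, and the result remains ordinal.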
Non-comparative scaling techniques
- Continuous rating scale (also called the graphic rating scale) - respondents
rate items by placing a mark on a line. The line is usually labeled at
each end. Sometimes a series of numbers, called scale points (say, from
zero to 100), appears under the line. Scoring and coding are difficult.
- Likert scaling - Respondents are asked to indicate the amount of
agreement or disagreement (from strongly agree to strongly disagree) on a
five-point scale. The same format is used for multiple questions.
- Semantic differential scaling - Respondents are asked to rate an item on
various attributes using a seven-point scale. Each attribute requires a scale
with bipolar terminal labels.
- Stapel
scaling - This is a unipolar ten-point rating scale. It ranges
from +5 to -5 and has no neutral zero point.
- Thurstone
scaling - This is a scaling technique that incorporates the
intensity structure among indicators.
- Mathematically derived scaling - Researchers infer respondents’
evaluations mathematically. Two examples are multidimensional scaling and
conjoint analysis.
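Because a Likert scale uses the same format for multiple questions, a respondent's answers are commonly summed into one score. The Python sketch below assumes the labels shown (the middle label "neither agree nor disagree" is an assumption) and a hypothetical `reverse_items` parameter for negatively worded statements, which are flipped before summing.

```python
# Assumed five-point labels; the wording of the midpoint varies between surveys.
LIKERT_5 = {
    "strongly disagree": 1,
    "disagree": 2,
    "neither agree nor disagree": 3,
    "agree": 4,
    "strongly agree": 5,
}


def likert_score(responses, reverse_items=()):
    """Sum one respondent's five-point Likert answers into a single score.

    `responses` maps a (hypothetical) item id to the chosen label.
    Items listed in `reverse_items` are negatively worded and are flipped
    (1 <-> 5, 2 <-> 4) so that a higher score always means more agreement
    with the underlying attitude.
    """
    total = 0
    for item, label in responses.items():
        value = LIKERT_5[label.lower()]
        if item in reverse_items:
            value = 6 - value
        total += value
    return total


answers = {"q1": "Agree", "q2": "Strongly disagree", "q3": "Strongly agree"}
print(likert_score(answers, reverse_items={"q2"}))
```

With three items the summed score ranges from 3 to 15; dividing by the number of items gives an average on the original 1 to 5 metric instead.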
Scale evaluation
Scales should be tested for reliability,
generalizability, and validity.
Generalizability is the ability to make inferences from a sample to the
population, given the scale you have selected. Reliability is the extent to
which a scale will produce consistent results. Test-retest reliability checks
how similar the results are if the research is repeated under similar
circumstances. Alternative forms reliability checks how similar the results
are if the research is repeated using different forms of the scale. Internal
consistency reliability checks how well the individual measures included in
the scale are converted into a composite measure.
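Two of these reliability checks can be sketched numerically in Python. For internal consistency the sketch uses Cronbach's alpha, a widely used coefficient that the text above does not name (it is swapped in here as one common choice); for test-retest reliability it uses the Pearson correlation between two administrations of the same scale. Function names and the data layout are assumptions.

```python
from statistics import mean, variance


def cronbach_alpha(item_scores):
    """Internal-consistency reliability for a multi-item scale.

    `item_scores[i][j]` is respondent j's score on item i (k items, n respondents).
    alpha = k / (k - 1) * (1 - sum of item variances / variance of scale totals)
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(col) for col in zip(*item_scores)]  # each respondent's scale total
    item_var = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var / variance(totals))


def pearson_r(x, y):
    """Test-retest reliability as the correlation between two administrations."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Three items answered by five respondents (illustrative data).
alpha = cronbach_alpha([[4, 5, 3, 4, 2], [4, 4, 3, 5, 2], [3, 5, 2, 4, 1]])
retest = pearson_r([12, 15, 9, 14, 7], [11, 15, 10, 13, 8])
```

Both coefficients approach 1 as consistency increases; alternative forms reliability can be estimated the same way as test-retest, correlating scores from the two forms instead of two occasions.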
Scales and indexes have to be validated. Internal validation checks the
relation between the individual measures included in the scale, and the
composite scale itself. External validation checks the relation between the
composite scale and other indicators of the variable, indicators not included
in the scale. Content validation (also called face validity) checks how well
the scale measures what it is supposed to measure. Criterion validation checks
how meaningful the scale criteria are relative to other possible criteria.
Construct validation checks what underlying construct is being measured. There
are three variants of construct
validity. They are convergent
validity, discriminant
validity, and nomological
validity. The coefficient of reproducibility indicates how well the data
from the individual measures included in the scale can be reconstructed from
the composite scale.
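The coefficient of reproducibility is most often reported for Guttman scales. The Python sketch below scores each response pattern with the rule given under Guttman scaling (count agreements down the ordered list until the first negative response), rebuilds the pattern implied by that score, and counts mismatches, giving 1 - errors / total responses. Counting errors this way is one convention among several (for example, rebuilding patterns from the total number of agreements instead), and different conventions can give different values; the function names are hypothetical.

```python
def guttman_score(responses):
    """Count agreements from the top of the ordered item list until the first negative response."""
    score = 0
    for agrees in responses:
        if not agrees:
            break
        score += 1
    return score


def reproduce(score, n_items):
    """Response pattern implied by a score: agree with the first `score` items, reject the rest."""
    return [True] * score + [False] * (n_items - score)


def coefficient_of_reproducibility(patterns):
    """1 - (reproduction errors / total responses) over all respondents."""
    total = errors = 0
    for responses in patterns:
        predicted = reproduce(guttman_score(responses), len(responses))
        errors += sum(obs != pred for obs, pred in zip(responses, predicted))
        total += len(responses)
    return 1 - errors / total


# Each row is one respondent's answers (True = agree) to the items in scale order.
patterns = [
    [True, True, False, False],
    [True, False, True, False],  # agrees with item 3 after rejecting item 2: one error
]
print(coefficient_of_reproducibility(patterns))
```

A value of about 0.90 is often cited as the minimum for treating a set of items as an acceptable Guttman scale.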