Please refer to the built-in documentation and the package vignette for further details; this page does not document all features.
Van der Eijk gives the example of respondents placing political parties on a 7-point rating scale (ordinal).
Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|
PvdA | 2.4% | 2.8% | 3.2% | 6.2% | 13.5% | 30.4% | 41.6% |
D66 | 1.6% | 2.6% | 8.2% | 21% | 29.3% | 27% | 10.3% |
To calculate the level of agreement for these two frequency distributions, we simply type agreement(c(2.4, 2.8, 3.2, 6.2, 13.5, 30.4, 41.6)) for the PvdA, and agreement(c(1.6, 2.6, 8.2, 21, 29.3, 27, 10.3)) for the D66. This gives us levels of agreement of 0.61 for the PvdA, and 0.48 for the D66.
We can also calculate the level of agreement for the frequency distribution used in the above section: agreement(c(30, 40, 210, 130, 530, 50, 10)), which gives a value of 0.61.
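The two party examples can also be run in one step. The following is a minimal sketch that assumes the agrmt package is installed; the list name parties is only illustrative, and the guard simply skips the calculation where the package is not available.

```r
# Percentages from the table above, one vector per party
parties <- list(
  PvdA = c(2.4, 2.8, 3.2, 6.2, 13.5, 30.4, 41.6),
  D66  = c(1.6, 2.6, 8.2, 21, 29.3, 27, 10.3)
)

# Only run the calculation if the agrmt package is available
if (requireNamespace("agrmt", quietly = TRUE)) {
  # Apply agreement() to each distribution; one value per party
  print(round(sapply(parties, agrmt::agreement), 2))
}
```

Keeping the distributions in a named list makes it easy to add further parties without repeating the call.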
If we have not already calculated the frequency distribution (i.e. how many responses fall in each category), we could use the table command in R to get the frequencies. To avoid additional steps, the function collapse is provided. Given the responses [1, 2, 4, 2, 5, 2, 3, 1, 2, 1, 3, 2, 4, 1, 5, 2, 3, 2, 4, 2, 3, 1, 1, 3], collapse(c(1, 2, 4, 2, 5, 2, 3, 1, 2, 1, 3, 2, 4, 1, 5, 2, 3, 2, 4, 2, 3, 1, 1, 3)) returns the frequencies [6 8 5 3 2], indicating that there are 6 responses in category 1, 8 in category 2, etc. We can pass this directly to the agreement function: agreement(collapse(c(1, 2, 4, 2, 5, 2, 3, 1, 2, 1, 3, 2, 4, 1, 5, 2, 3, 2, 4, 2, 3, 1, 1, 3))) gives a value of 0.31. Typically, we use the variable name here (e.g. POSITION): collapse(POSITION).
An important advantage of using the collapse function over the built-in table function is that it can deal with categories with 0 responses. In this case, you need to specify the positions at which categories exist, using the pos= argument. If we have the responses [1, 2, 4, 2, 5, 2, 7, 7, 3, 1, 2, 1, 3, 2, 4, 1, 5, 2, 3, 2, 4, 2, 3, 1, 1, 3], both the table function and the collapse function without further arguments give us the frequencies [6 8 5 3 2 2], because there are no responses with position 6. We can tell the collapse function that there are 7 response categories by specifying the positions at which categories exist: collapse(c(1, 2, 4, 2, 5, 2, 7, 7, 3, 1, 2, 1, 3, 2, 4, 1, 5, 2, 3, 2, 4, 2, 3, 1, 1, 3), pos=1:7). We can also use c(1, 2, 3, 4, 5, 6, 7) instead of 1:7. This time we get [6 8 5 3 2 0 2], including the 0 for position 6. The level of agreement is then agreement(collapse(c(1, 2, 4, 2, 5, 2, 7, 7, 3, 1, 2, 1, 3, 2, 4, 1, 5, 2, 3, 2, 4, 2, 3, 1, 1, 3), pos=1:7)), which gives 0.39.
Typically, we use the variable name (e.g. POSITION): agreement(collapse(POSITION)).
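To see why the empty category matters, the difference can be reproduced in base R alone. This is only an illustration of the zero-count problem, not the package's implementation of collapse; the variable names are made up for the example.

```r
responses <- c(1, 2, 4, 2, 5, 2, 7, 7, 3, 1, 2, 1, 3, 2, 4, 1, 5, 2, 3, 2, 4, 2, 3, 1, 1, 3)

# table() alone only reports categories that occur, so position 6 is missing
observed_only <- as.vector(table(responses))

# Declaring all 7 positions as factor levels keeps the empty category as 0
all_positions <- as.vector(table(factor(responses, levels = 1:7)))

print(observed_only)   # 6 8 5 3 2 2
print(all_positions)   # 6 8 5 3 2 0 2
```

Without the explicit levels, the two responses at position 7 would be read as belonging to a sixth category, which changes the shape of the distribution.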
The function agreementError() simulates coding errors; it is also useful for bootstrapping when there is insufficient variance.
The polarization function simply rescales agreement values to provide a more intuitive interpretation if one is interested in polarization rather than agreement. More precisely, it gives you (1-agreement)/2. This means that a polarization value of 1 means perfect polarization (bottom-left corner of the graph above), and a value of 0 means perfect agreement. A value of 0.5 corresponds to the "no agreement" in the above graph.
Usage is equivalent to agreement: polarization(collapse(c(1, 2, 4, 2, 5, 2, 7, 7, 3, 1, 2, 1, 3, 2, 4, 1, 5, 2, 3, 2, 4, 2, 3, 1, 1, 3), pos=1:7)) gives you 0.30. Or we can calculate polarization for the Dutch parties: polarization(c(2.4, 2.8, 3.2, 6.2, 13.5, 30.4, 41.6)) for the PvdA, and polarization(c(1.6, 2.6, 8.2, 21, 29.3, 27, 10.3)) for the D66, giving 0.20 and 0.26 respectively.
Typically, we use the variable name (e.g. POSITION): polarization(collapse(POSITION)).
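The rescaling itself is simple enough to check by hand. The helper below is a hypothetical stand-in that mirrors the formula (1-agreement)/2; it is not the package's polarization function, which works on frequency distributions rather than on agreement values.

```r
# Hypothetical helper that mirrors the rescaling described above;
# it is not the package's polarization() function
rescale_to_polarization <- function(agreement_value) (1 - agreement_value) / 2

# End points and midpoint of the agreement scale
print(rescale_to_polarization(c(-1, 0, 1)))    # 1.0 0.5 0.0

# Agreement values reported above for the PvdA and the D66
print(rescale_to_polarization(c(0.61, 0.48)))  # 0.195 0.260
```

Because the agreement values used as input here are already rounded, the results can differ slightly from the polarization values reported above.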
The package includes other measures of agreement or consensus: Berry and Mielke's IOV, Blair and Lacy's l, Tastle and Wierman's measure of consensus, Blair and Lacy's d-squared, Shannon entropy (following Tastle and Wierman), Kvalseth's COV, ordinal dispersion as introduced by Leik, Blair and Lacy's l-squared, and the MRQ polarization index. Please refer to the package documentation.
Galtung introduced a system to classify distributions according to shape. This is a means to reduce complexity.
For further details, refer to Galtung, J. 1969. Theory and Methods of Social Research. Oslo: Universitetsforlaget.
ajus(distribution) gives you the shape or type, as well as whether there is a skew. I have added two new types ("F" and "L") to complement the ones identified by Galtung. You can choose whether to use a strict AJUS system following Galtung, or use the modified AJUSFL system that includes the L and F types. The default is the modified variant. The skew is given as -1 for a negative skew, 0 for absence of skew, or +1 for a positive skew.
A: unimodal distribution, peak in the middle
J: unimodal, peak at right end
L: unimodal, peak at left end
U: bimodal, peaks at both ends
S: bimodal or multi-modal, multiple peaks
F: flat, no peak; this type is new
Galtung developed the AJUS system for a somewhat systematic classification of distributions, but not for use on computers. The advantage of using a function on the computer is twofold. On the one hand, we can easily apply the AJUS system to many distributions; sapply may be your friend there. On the other hand, the tolerance used in the AJUS system is applied in a systematic manner. When using human judgement on whether two values are roughly the same or different, a really systematic approach cannot be guaranteed. In the AJUS function, you can specify the argument tolerance to change the tolerance. The AJUS function ignores all differences equal to or smaller than the tolerance parameter. The package default is 0.1, possibly useful when working with values between 0 and 1, in which case it corresponds to 10 per cent. The tolerance parameter is not a trivial choice, and it can affect results.
See the package help or vignette for the helper functions ajusPlot and ajusCheck. These allow a graphical inspection of distributions along with their classification, and sensitivity checks with regard to the tolerance parameter.
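The effect of the tolerance can be illustrated with a small base R sketch. The helper direction is hypothetical and only shows the idea of ignoring small differences between neighbouring categories; it is not how the package classifies distributions internally.

```r
# Hypothetical helper: direction of change between neighbouring categories,
# treating differences no larger than the tolerance as "no change"
direction <- function(x, tolerance = 0.1) {
  d <- diff(x)
  ifelse(abs(d) <= tolerance, 0, sign(d))
}

frequencies <- c(30, 40, 210, 130, 530, 50, 10)

print(direction(frequencies))                  # 1  1 -1  1 -1 -1
print(direction(frequencies, tolerance = 10))  # 0  1 -1  1 -1 -1
print(direction(frequencies, tolerance = 50))  # 0  1 -1  1 -1  0
```

With raw counts like these, a tolerance of 0.1 ignores nothing, so the tolerance should be chosen to match the scale of the data.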
The ISD system by Galtung is another way to reduce complexity, this time for changes over time. The ISD takes a vector with three time points. These three points describe two periods during which changes may occur.
isd(distribution) gives you a type and a description of the type.
Type 1: increase in both periods
Type 2: increase in first period, flat in second period
Type 3: increase in first period, decrease in second period
Type 4: flat in first period, increase in second period
Type 5: flat in both periods
Type 6: flat in first period, decrease in second period
Type 7: decrease in first period, increase in second period
Type 8: decrease in first period, flat in second period
Type 9: decrease in both periods
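The nine types follow a regular pattern: the first period selects a block of three types, and the second period selects the type within that block. A minimal base R sketch of this logic is given here; isd_type is a hypothetical helper, its tolerance argument is an assumption of the sketch, and it returns only the type number, not the description that isd provides.

```r
# Hypothetical re-implementation of the nine types listed above;
# the tolerance argument is an assumption of this sketch
isd_type <- function(v, tolerance = 0) {
  stopifnot(length(v) == 3)
  d <- diff(v)                                  # change in period 1 and period 2
  s <- ifelse(abs(d) <= tolerance, 0, sign(d))  # 1 = increase, 0 = flat, -1 = decrease
  # increase/flat/decrease map to 0/1/2; the first period picks the block of three
  3 * (1 - s[1]) + (1 - s[2]) + 1
}

print(isd_type(c(1, 3, 2)))  # 3: increase, then decrease
print(isd_type(c(5, 5, 8)))  # 4: flat, then increase
print(isd_type(c(9, 4, 4)))  # 8: decrease, then flat
```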
The function modes tells you at which position the mode is. This can be used, for instance, in conjunction with the agreement function to identify at which position agreement is reached (and not only that there is agreement). The functions accept frequency distributions where multiple positions are the most common ones, which can happen in ordered rating scales. The function secondModes additionally gives you the value and position(s) of the second most common value. In addition to the mode and the positions, the functions also indicate whether these values are contiguous (i.e. in neighbouring response categories).
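The idea of multiple modes and contiguity can be sketched in base R. The helper mode_positions is hypothetical and is not the package's modes function; its return format is made up for this illustration.

```r
# Hypothetical helper, not the package's modes() function
mode_positions <- function(freq) {
  pos <- which(freq == max(freq))           # all positions sharing the maximum
  list(positions = pos,
       contiguous = all(diff(pos) == 1))    # TRUE if the positions are neighbours
}

single  <- mode_positions(c(30, 40, 210, 130, 530, 50, 10))
tied    <- mode_positions(c(2, 8, 8, 1))
split_m <- mode_positions(c(6, 8, 5, 3, 8))

print(single$positions)    # 5
print(tied$positions)      # 2 3
print(tied$contiguous)     # TRUE
print(split_m$positions)   # 2 5
print(split_m$contiguous)  # FALSE
```

In this sketch a single mode counts as contiguous, since there is no gap between modal positions.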
Last update of this page 6 January 2023