What is Cohen's Kappa?
Cohen's Kappa (\(\kappa\)) is a statistic that measures the agreement between two raters who each classify the same items into mutually exclusive categories. Unlike a simple percentage of matches, kappa corrects for the agreement that would be expected purely by chance, which makes it a more conservative measure of inter-rater reliability. This calculator handles the common case of two raters and two categories (a 2x2 table).
How to use this calculator
Enter the four cell counts of your 2x2 contingency table: how many items both raters called "Yes" (a), how many Rater 1 called Yes but Rater 2 called No (b), how many Rater 1 called No but Rater 2 called Yes (c), and how many both called "No" (d). The calculator returns kappa along with the observed agreement and the chance-expected agreement.
The formula explained
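Let \(n = a + b + c + d\) be the total number of items. Observed agreement is \(p_o = (a + d) / n\), the proportion of items the raters agreed on. Expected agreement \(p_e\) is built from the marginal totals, as the chance both say Yes plus the chance both say No if the raters were independent: $$p_e = \frac{a + b}{n}\cdot\frac{a + c}{n} + \frac{c + d}{n}\cdot\frac{b + d}{n}.$$ Kappa is then $$\kappa = \frac{p_o - p_e}{1 - p_e}.$$ A value of 1 means perfect agreement, 0 means agreement equal to chance, and negative values mean worse than chance. Kappa is undefined when \(p_e = 1\), which happens when both raters put every item in the same category.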
Worked example
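Suppose \(a = 20\), \(b = 5\), \(c = 10\), \(d = 15\), so \(n = 50\). Observed agreement is $$p_o = \frac{20 + 15}{50} = 0.70.$$ Rater 1 said Yes \(a + b = 25\) times and No \(c + d = 25\) times; Rater 2 said Yes \(a + c = 30\) times and No \(b + d = 20\) times. These marginals give $$p_e = \frac{25}{50}\cdot\frac{30}{50} + \frac{25}{50}\cdot\frac{20}{50} = 0.30 + 0.20 = 0.50.$$ Therefore $$\kappa = \frac{0.70 - 0.50}{1 - 0.50} = \frac{0.20}{0.50} = 0.40,$$ which sits at the upper edge of the "fair" band on the Landis & Koch scale.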
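The formula and the worked example above can be checked with a short script. This is a minimal sketch, not the calculator's actual code: the function name `cohen_kappa_2x2` is hypothetical, and raising an error for an empty table or for \(p_e = 1\) is an assumption about how such edge cases might be handled.

```python
def cohen_kappa_2x2(a, b, c, d):
    """Return (kappa, p_o, p_e) for a 2x2 agreement table.

    a: both raters said Yes
    b: Rater 1 said Yes, Rater 2 said No
    c: Rater 1 said No, Rater 2 said Yes
    d: both raters said No
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("the table is empty")

    # Observed agreement: share of items on the diagonal.
    p_o = (a + d) / n

    # Chance agreement from the marginal totals.
    p_both_yes = ((a + b) / n) * ((a + c) / n)
    p_both_no = ((c + d) / n) * ((b + d) / n)
    p_e = p_both_yes + p_both_no

    # Assumption: treat p_e = 1 as an error, since kappa would divide by zero.
    if p_e == 1:
        raise ValueError("kappa is undefined when p_e = 1")

    kappa = (p_o - p_e) / (1 - p_e)
    return kappa, p_o, p_e


kappa, p_o, p_e = cohen_kappa_2x2(20, 5, 10, 15)
print(f"p_o={p_o:.2f}, p_e={p_e:.2f}, kappa={kappa:.2f}")
# prints: p_o=0.70, p_e=0.50, kappa=0.40
```

Returning all three values mirrors what the calculator reports, so each intermediate step can be compared against the hand calculation.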
FAQ
How do I interpret the value? A common guide (Landis & Koch): <0 poor, 0–0.20 slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, 0.81–1.00 almost perfect.
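The bands above can be turned into a small lookup. This is only a sketch: the function name `landis_koch_label` is hypothetical, and because the published bands leave small gaps (for example between 0.20 and 0.21), treating each upper bound as inclusive is an assumption made here for illustration.

```python
def landis_koch_label(kappa):
    """Map a kappa value to its Landis & Koch descriptive label."""
    if kappa < 0:
        return "poor"
    # (inclusive upper bound, label); upper bounds are an assumption
    # chosen to close the gaps between the published bands.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


print(landis_koch_label(0.40))  # prints: fair
```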
Why is my kappa low despite high agreement? When one category dominates, both raters choose it most of the time, so chance agreement (\(p_e\)) is already high. Kappa can then be low even with 90%+ raw agreement. This is known as the kappa paradox.
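As an illustration with made-up counts (not taken from any real study), suppose \(a = 90\), \(b = 4\), \(c = 4\), \(d = 2\), so \(n = 100\) and each rater said Yes 94 times. Raw agreement is high: $$p_o = \frac{90 + 2}{100} = 0.92.$$ But chance agreement is nearly as high: $$p_e = \frac{94}{100}\cdot\frac{94}{100} + \frac{6}{100}\cdot\frac{6}{100} = 0.8836 + 0.0036 = 0.8872.$$ So $$\kappa = \frac{0.92 - 0.8872}{1 - 0.8872} = \frac{0.0328}{0.1128} \approx 0.29,$$ which falls only in the "fair" band despite 92% raw agreement.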
Can kappa be negative? Yes. Negative kappa means observed agreement is below what chance predicts, suggesting systematic disagreement.