What Is the F1 Score?
The F1 score is a single metric that combines precision and recall into one number, making it a popular way to evaluate classification models in machine learning, information retrieval, and statistics. It is the harmonic mean of precision and recall, so it rewards classifiers that balance the two rather than excelling at only one.
How to Use This Calculator
Enter three counts from your confusion matrix: True Positives (TP, correctly predicted positives), False Positives (FP, negatives wrongly predicted positive), and False Negatives (FN, positives that were missed). The calculator instantly returns precision, recall, and the resulting F1 score.
The Formula Explained
Precision = \( \frac{\text{TP}}{\text{TP} + \text{FP}} \) measures how many predicted positives were correct. Recall = \( \frac{\text{TP}}{\text{TP} + \text{FN}} \) measures how many actual positives were found. The F1 score is then
$$ F_1 = \frac{2 \cdot (\text{precision} \cdot \text{recall})}{\text{precision} + \text{recall}} $$
Because it is a harmonic mean, a low value in either precision or recall pulls the F1 score down sharply.
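The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation: the function name `f1_from_counts` is hypothetical, and returning 0.0 when a denominator is zero is an assumed convention (the formulas themselves are undefined in that case).

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, f1) from confusion-matrix counts."""
    # Assumption: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0

    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


precision, recall, f1 = f1_from_counts(tp=70, fp=30, fn=10)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Handling the zero-denominator cases explicitly keeps the sketch from raising `ZeroDivisionError` when a model predicts no positives or the data contains none.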
Worked Example
Suppose \( \text{TP} = 70 \), \( \text{FP} = 30 \), \( \text{FN} = 10 \). Precision = \( \frac{70}{100} = 0.70 \). Recall = \( \frac{70}{80} = 0.875 \).
$$ F_1 = \frac{2 \cdot (0.70 \cdot 0.875)}{0.70 + 0.875} = \frac{2 \cdot 0.6125}{1.575} \approx 0.7778 $$
or about 77.78%.
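As a cross-check, substituting the definitions of precision and recall into the F1 formula and simplifying gives an equivalent expression in terms of the raw counts, which yields the same result for this example:
$$ F_1 = \frac{2 \cdot \text{TP}}{2 \cdot \text{TP} + \text{FP} + \text{FN}} = \frac{2 \cdot 70}{2 \cdot 70 + 30 + 10} = \frac{140}{180} \approx 0.7778 $$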
FAQ
When should I use F1 instead of accuracy? F1 is preferred when classes are imbalanced, since accuracy can be misleadingly high when one class dominates.
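A small sketch of why accuracy can mislead on imbalanced data. The counts below are hypothetical, chosen only for illustration, and the true-negative count `tn` is an extra input that accuracy needs but the F1 calculation does not.

```python
# Hypothetical imbalanced dataset: 10 actual positives, 990 actual negatives.
tp, fp, fn, tn = 1, 1, 9, 989

# Accuracy counts every correct prediction, including the many easy negatives.
accuracy = (tp + tn) / (tp + fp + fn + tn)

# F1 ignores true negatives and focuses on the positive class.
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy = {accuracy:.3f}")  # 0.990
print(f"F1       = {f1:.3f}")        # 0.167
```

The classifier finds only 1 of the 10 positives, yet accuracy is still 99% because the negative class dominates; the F1 score exposes the problem.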
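What is a good F1 score? F1 ranges from 0 (no correct positive predictions) to 1 (perfect precision and recall); closer to 1 is better. What counts as "good" depends on the task and on how imbalanced the classes are. Values above 0.8 are often considered strong, but this is a rule of thumb rather than a fixed threshold.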
Why is it a harmonic mean? The harmonic mean penalizes extreme imbalance between precision and recall more than a simple average would, ensuring both must be reasonably high.
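To make the penalty concrete, the following sketch compares the arithmetic and harmonic means for a few precision/recall pairs. The helper names and the sample values are hypothetical, picked only to show the effect.

```python
def arithmetic_mean(p: float, r: float) -> float:
    return (p + r) / 2


def harmonic_mean(p: float, r: float) -> float:
    # Assumption: return 0.0 when both inputs are zero to avoid dividing by zero.
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0


# Precision stays at 0.9 while recall drops.
for p, r in [(0.9, 0.9), (0.9, 0.5), (0.9, 0.1)]:
    print(
        f"precision={p:.1f} recall={r:.1f} "
        f"arithmetic={arithmetic_mean(p, r):.3f} "
        f"harmonic={harmonic_mean(p, r):.3f}"
    )
```

With precision 0.9 and recall 0.1, the arithmetic mean is still 0.5, while the harmonic mean (the F1 score) falls to 0.18, much closer to the weaker of the two values.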