Hello OP, I have a couple of B.S. degrees (in computer science and physics) and work a lot with probability and statistics.
Statistics is built on probability theory, which can be seen as a generalization of propositional logic (interestingly, quantum mechanics can in turn be seen as a generalization of probability theory!). However, those generalizations leave a lot more room for 'hand waving', and that extra flexibility makes statistics easier to misconstrue. Here's an example about smoking that shows how statistics can suggest the exact opposite of what we might expect (there's also a video at the bottom of this post with an example pertaining to racism):
In 1972 a one-in-six random survey of the electoral roll — largely
concerned with studying heart disease and smoking — was carried out in Whickham, a mixed urban
and rural district near Newcastle upon Tyne in England. Twenty years later a follow-up study was
conducted, with the results published in the journal Clinical Endocrinology in 1995.
The dataset summarized below in this problem pertains to the subsample of 1,314 women in the
study who were classified in the original survey either as current smokers or as never having smoked.
There were relatively few women (162) who had smoked but stopped, and only 18 whose smoking
habits were not recorded; these women are not included in the data here. The 20-year survival
status was determined for all the women in the original survey.
– P (dead) = 369/1314 ≈ 28.08%
– P (dead|smoker) = 139/582 ≈ 23.88%
– P (dead|¬smoker) = 230/732 ≈ 31.42%
– This establishes an association between smoking and mortality. In particular, looking at only these probabilities, being a smoker increases your odds of living!! (bullshit alert) The direction of this result is surprising, but only because it doesn't take into consideration that the smokers in this sample are younger in general. It does not prove any causal link between smoking and survival, because it fails to consider a confounding factor: age.
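If you want to check the arithmetic yourself, here is a minimal Python sketch that recomputes the pooled death rates from the counts quoted above. The dictionary layout and the names `counts` and `death_rate` are just my own choices for illustration, not anything from the study.

```python
# (deaths, total) for each smoking group, using the counts quoted in this post.
counts = {
    "smoker": (139, 582),
    "non_smoker": (230, 732),
}


def death_rate(deaths, total):
    """Fraction of a group that had died by the 20-year follow-up."""
    return deaths / total


# Conditional death rates, P(dead | group).
rates = {group: death_rate(d, n) for group, (d, n) in counts.items()}

# Overall death rate, P(dead), from the summed counts.
total_deaths = sum(d for d, _ in counts.values())
total_women = sum(n for _, n in counts.values())
overall = death_rate(total_deaths, total_women)

print(f"P(dead) = {overall:.2%}")
for group, r in rates.items():
    print(f"P(dead | {group}) = {r:.2%}")
```

Looking only at these pooled numbers, the smokers come out with the lower death rate, which is exactly the misleading result described above.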
Now, let's incorporate the confounding factor into our model and repeat the same calculations separately for two age groups, split at a fixed boundary of 65:
– For women ages 18-64...
∗ P (dead) = 162/1072 ≈ 15.11%
∗ P (dead|smoker) = 93/533 ≈ 17.45%
∗ P (dead|¬smoker) = 69/539 ≈ 12.80%
– For women ages 65+...
∗ P (dead) = 207/242 ≈ 85.54%
∗ P (dead|smoker) = 46/49 ≈ 93.88%
∗ P (dead|¬smoker) = 161/193 ≈ 83.42%
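– We can see here that once age is taken into consideration, smokers were more likely to die than non-smokers in both age groups, and more likely to die than the average for their age group. This shows an association between smoking and dying in the direction we would expect: within each age group, a woman who smoked had a higher probability of dying over the 20 years. This kind of reversal between the pooled numbers and the age-split numbers is known as Simpson's paradox.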
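For anyone who wants to see the reversal directly, here is a rough Python sketch using the age-split counts above. It recomputes the death rates within each age group, pools them back together, and checks what share of each smoking group was 65 or older, which is where the confounding comes from. The names (`strata`, `rate`, `pooled_rate`, `share_65_plus`) are mine and purely illustrative.

```python
# (deaths, total) per age group and smoking status, as quoted in this post.
strata = {
    "18-64": {"smoker": (93, 533), "non_smoker": (69, 539)},
    "65+": {"smoker": (46, 49), "non_smoker": (161, 193)},
}


def rate(deaths, total):
    """Fraction of a group that had died by the 20-year follow-up."""
    return deaths / total


def pooled_rate(group):
    """Death rate for a smoking group with both age groups lumped together."""
    deaths = sum(strata[age][group][0] for age in strata)
    total = sum(strata[age][group][1] for age in strata)
    return rate(deaths, total)


def share_65_plus(group):
    """Fraction of a smoking group that falls in the 65+ age group."""
    older = strata["65+"][group][1]
    total = sum(strata[age][group][1] for age in strata)
    return older / total


# Death rates inside each age group.
stratum_rates = {
    age: {group: rate(*c) for group, c in groups.items()}
    for age, groups in strata.items()
}

for age, by_group in stratum_rates.items():
    for group, r in by_group.items():
        print(f"ages {age}: P(dead | {group}) = {r:.2%}")

for group in ("smoker", "non_smoker"):
    print(
        f"pooled: P(dead | {group}) = {pooled_rate(group):.2%}, "
        f"share aged 65+ = {share_65_plus(group):.2%}"
    )
```

Running it should show smokers with the higher death rate inside each age group but the lower one once the groups are pooled, because only about 8% of the smokers were 65+ compared with about 26% of the non-smokers.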
Here's a video that might help show how statistics can show the exact opposite results in the context of racism:
https://www.youtube.com/watch?v=_dPIT7r-LUw