Wednesday, July 29, 2020

Statistics and COVID

Statistics is a branch of mathematics. One of the prerequisites for entering any sort of graduate-level medical program is that the applicant has taken statistical analysis as a math course during undergraduate studies.

Statistics has a number of uses, and one of them is epidemiology. The larger the sample taken from a population of individuals, the more closely that sample resembles the makeup of that population. One of the rules is that the sample must be drawn at random from the population. The smaller the sample, the more important it is that the sample be free from bias and be truly random. There are always biases built into a sample, which is why asking 1,000 people out of the 330 million in the USA about their political opinions so often gives the wrong answer.

The reason for this error is that even one person lying in a sample that small can result in a large error. The more of the population that is a part of your sample, the more accurate that statistic will be. For example, a survey of all people in a population would be the most accurate. A survey of one person would be the least accurate. 
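One common way to put a number on sampling error, assuming a truly random sample, is the normal-approximation margin of error for a proportion. The sketch below is illustrative only: the function name `margin_of_error` is made up here, and the formula covers random sampling error alone, not bias from people lying or from who gets asked.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Normal-approximation margin of error for a sample proportion.

    p: observed proportion (0..1); n: sample size;
    z: critical value (1.96 corresponds to 95% confidence).
    Assumes a simple random sample from a much larger population.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    return z * math.sqrt(p * (1 - p) / n)

# Worst case p = 0.5 for a few sample sizes.
for n in (100, 1_000, 10_000, 1_000_000):
    print(f"n = {n:>9,}: +/- {margin_of_error(0.5, n) * 100:.2f} percentage points")
```

Under this formula the sampling error shrinks with the square root of the sample size, so quadrupling the sample only halves it.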

With that being said, there is a way to calculate just how accurate your statistic is. Let's try that with Florida's COVID testing. Florida to date has tested 3.5 million people for COVID, or about 18% of the population. This is a fairly large sample size, with nearly one out of every five people being tested. Just how accurate is our result? Let's use the Margin of Error calculator to see:

Of Florida's 20 million people, we have tested 3.5 million, and 450,000 have tested positive. This means that about 12.9% of tests were positive, or roughly one in eight.

Using my margin of error calculator, I can say with 95% confidence that 11.1 to 13.9% of people in Florida have been infected with COVID. That means that there is a 95% probability that somewhere between 2.2 million and 2.8 million people in Florida have been infected so far. (This assumes that the errors in testing are randomly distributed between false positives and false negatives. That is an entirely different problem.)

Since people don't generally die without notice, our sample size for deaths is 100%. (This assumes that all COVID death reports are accurate. Again, that is another problem.) Since there have been 6,300 fatalities, this gives us an IFR (infection fatality rate) of between 0.2% and 0.3%.
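The arithmetic in the Florida example can be sketched in a few lines. This is a rough illustration, not the calculator used above: the post does not say which formula or settings that calculator uses, so the textbook interval computed here (normal approximation with a finite population correction, which is an assumption) will not necessarily match the 11.1 to 13.9% range. The function names are hypothetical, and the inputs are the figures quoted in the post.

```python
import math

def positivity_interval(positives, tested, population, z=1.96):
    """Return (rate, low, high) for the share of positive tests.

    Uses the normal approximation plus a finite population correction,
    and treats the tested group as a simple random sample (an assumption).
    """
    p = positives / tested
    se = math.sqrt(p * (1 - p) / tested)
    fpc = math.sqrt((population - tested) / (population - 1))
    moe = z * se * fpc
    return p, p - moe, p + moe

def ifr_range(deaths, infected_low, infected_high):
    """Infection fatality rate bounds: deaths divided by estimated infections."""
    return deaths / infected_high, deaths / infected_low

# Figures quoted in the post for Florida.
rate, low, high = positivity_interval(450_000, 3_500_000, 20_000_000)
print(f"positivity: {rate:.1%} (95% interval {low:.2%} to {high:.2%})")

# IFR from the post's own infection range of 2.2 to 2.8 million.
ifr_low, ifr_high = ifr_range(6_300, 2_200_000, 2_800_000)
print(f"IFR: {ifr_low:.2%} to {ifr_high:.2%}")
```

With the post's infection range, the IFR bounds land between roughly 0.2% and 0.3%. For a sample this large, the random sampling error from this particular formula is only a few hundredths of a percentage point, so any remaining uncertainty would come mostly from who gets tested and from test accuracy.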

In short, a person who is infected by COVID has a 99.7% chance of survival. 

That is a good thing, because with a 12% infection rate, there is absolutely no way to stop this virus from eventually infecting us all. Put that infection rate in perspective: One out of every eight people in Florida has already caught COVID in less than six months. In another six months, it will likely be somewhere close to half the population.

Use the same calculator on your own state's numbers, and see if you get the same numbers.
_____________________________________________________________________

I did the same calculation with New York's numbers. New York has tested 5.6 million people with about 413,000 positives, for a positivity rate of 7.4% out of a total population of 19.5 million.

Using these numbers, there is a 95% probability that there are currently somewhere between 1.43 million and 1.47 million people in New York who have been infected with COVID.
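The same back-of-the-envelope scaling works for any state: multiply the positivity rate by the population. The sketch below assumes, as the post does, that the people tested are representative of everyone in the state. `estimate_infected` is a hypothetical helper name, and the inputs are the figures quoted above.

```python
def estimate_infected(positives, tested, population):
    """Scale the test positivity rate up to the whole population.

    Assumes the tested group is representative of the population.
    """
    if tested <= 0:
        raise ValueError("tested must be positive")
    return positives / tested * population

# (positives, tested, population) as quoted in the post.
states = {
    "Florida": (450_000, 3_500_000, 20_000_000),
    "New York": (413_000, 5_600_000, 19_500_000),
}

for name, (positives, tested, population) in states.items():
    infected = estimate_infected(positives, tested, population)
    print(f"{name}: about {infected / 1e6:.2f} million infected "
          f"({positives / tested:.1%} positivity)")
```

To try another state, add a line to `states` with its positives, tests, and population.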






3 comments:

FredLewers said...

Been saying since March that the second and third order effects are gonna be more damaging than the actual virus. I wonder how many people have died or will die because they avoided medical care out of fear? How many people were injured or killed because criminals were released early because of COVID 19? How many people died because elected officials are running medical care? How many political points are scored for each COVID 19 death? Do Democrats and Republicans get the same point value? Asking so I can be a better informed voter...
Amerika is FUBAR.

Punzdeleon said...

The testing is not at random.

Divemedic said...

@Punzdeleon: At first, it wasn't. They were only testing the very sick, which is why the CFR was so high. However, they are now testing anyone who asks for a test, plus the many people who are required to get tested because of other health issues, or because their employer requires it.

At a certain point, it no longer matters if testing is random, because the sample size becomes large enough that selection bias becomes less and less of a factor as you begin a regression towards the mean. When do you reach that point? 80% of the population? 60%?

Is 18% enough to get a fair picture of what is happening? I think it is, but only time will tell.