Example of under coverage introducing bias | Study design | AP Statistics | Khan Academy


– [Instructor] A senator wanted to know how people in her state felt about internet privacy issues. She conducted a poll by calling 100 people whose names were randomly sampled from the phone book. Note that mobile phones and unlisted numbers are not in phone books. The senator’s office called those numbers until they got a response from all 100 people chosen. The poll showed that 42% of respondents were very concerned about internet privacy. What is the most concerning source of bias in this scenario?

And we should also think about what kind of bias that likely introduces. Is 42% likely to be an overestimate or an underestimate of the percentage of constituents who are very concerned? Maybe you think there’s no bias here, but no bias is not one of the choices. It’s going to be one of these three: nonresponse, undercoverage, or volunteer response sampling. So I encourage you to pause this video and think about it. We’re a senator, and we’re trying to figure out what percentage of our constituents are very concerned about internet privacy. We go to the phone book, we sample 100 people, we keep calling them until they answer, and we get that 42% are very concerned. So what’s the source of bias? All right, now let’s work
through this together. Nonresponse would have been the case if we had selected these 100 people and, let’s say, only 50 of them answered the phone, and we didn’t keep calling the rest. Then we’d say, well, 50 of the people we sampled didn’t respond at all. What was it about those 50 people? Maybe they differ from the people who did answer in a way that skews the survey, and had we reached them, we would have gotten better data. But in this case, they tell us the senator’s office called those numbers until they got a response from all 100 people chosen. So for the 100 people that they chose, they made sure they got a response. Nonresponse is not going to be an issue here. All right, next choice: undercoverage. Well, undercoverage is
where you’re not able to sample from part of the population, and leaving out that part of the population might introduce bias. Now let’s think about what happened in this situation. We are a senator. We want to sample from all of our constituents, but instead we sample only from the constituents who happen to be listed in the phone book. So we’re not sampling from people who have landlines but are unlisted, and we’re not sampling from people who don’t have landlines, who only have mobile phones. And you might say, well, why is that important? Well, think about it: among people who decide not to list in the phone book or who don’t even have a landline, some might be a little more concerned about privacy than everyone else. The unlisted ones explicitly chose not to be listed. So undercoverage is definitely a very concerning source of bias here. We are sampling from only a subset of the population we care about, and in particular, we’re missing people who might care more about privacy. And so I would say
because of undercoverage, 42% is likely to be an underestimate of the percentage of people very concerned about internet privacy. Probably a higher proportion of the people we left out care about privacy, because they’re unlisted or they don’t even have a landline. So undercoverage probably introduced bias, and it implies that 42% is an underestimate of the percentage of the senator’s constituents who are very concerned about internet privacy. Now the last choice,
volunteer response sampling. Well, this would be the case where the senator, I don’t know, put up a billboard or told a bunch of people, maybe on her website, hey, vote here or tell us how much you care about internet privacy. The source of bias there is, well, who shows up on that website? If you said, hey, come to my website and fill this out, you’re only getting information from the subset of your population who choose to volunteer. That is not what she did here. She didn’t ask for volunteers. Her team randomly sampled 100 people from the phone book and called them. So this was definitely a case of undercoverage.
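
To make the direction of the bias concrete, here is a minimal simulation sketch in Python. Every number in it is an assumption made up for illustration, not data from the problem: it supposes 60% of constituents are listed in the phone book, that 42% of listed people are very concerned, and that 60% of unlisted or mobile-only people are very concerned. The function name `simulate_polls` and its parameters are hypothetical. Under those assumptions, polls drawn only from the phone book should center near the listed group's rate and fall below the rate for the whole population.

```python
import random


def simulate_polls(pop_size=100_000, listed_share=0.6,
                   p_listed=0.42, p_unlisted=0.60,
                   sample_size=100, n_polls=500, seed=0):
    """Compare the true rate of concern with polls drawn only from listed people.

    All default values are illustrative assumptions, not figures from the problem.
    """
    rng = random.Random(seed)

    # Build the population as (listed_in_phone_book, very_concerned) pairs.
    population = []
    for _ in range(pop_size):
        listed = rng.random() < listed_share
        p = p_listed if listed else p_unlisted
        population.append((listed, rng.random() < p))

    # True proportion among ALL constituents (what the senator wants to know).
    true_rate = sum(concerned for _, concerned in population) / pop_size

    # Sampling frame: only people listed in the phone book (undercoverage).
    frame = [concerned for listed, concerned in population if listed]
    frame_rate = sum(frame) / len(frame)

    # Repeat the 100-person poll many times; everyone sampled responds,
    # so there is no nonresponse, only undercoverage.
    poll_rates = [sum(rng.sample(frame, sample_size)) / sample_size
                  for _ in range(n_polls)]
    mean_poll = sum(poll_rates) / n_polls

    return true_rate, frame_rate, mean_poll


if __name__ == "__main__":
    true_rate, frame_rate, mean_poll = simulate_polls()
    print(f"true rate, all constituents: {true_rate:.3f}")
    print(f"rate within the phone book:  {frame_rate:.3f}")
    print(f"average of simulated polls:  {mean_poll:.3f}")
```

Calling until everyone answers does not help here: the simulated polls have a 100% response rate and still miss the true rate, because the people left out of the phone book were never eligible to be sampled. If the assumed rates for the two groups were equal, the gap would disappear, which is why the argument rests on unlisted people plausibly caring more about privacy.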
