Assessment is awesome. How great is it when students complete a task and feel success in applying something they have learned? Without feedback on our work, how would we know which areas we need to improve? Assessment is important because it validates and values teaching and learning. The thing about assessment is that you are not going to be able to assess everything in one task; there always needs to be a selection, and ideally this selection represents what matters most. However, we are in a great position because our current curriculum for statistics and the associated standards are closely aligned, so the important things we want students doing when learning about statistics are the same things we want to see in the formal assessment. 
I think it is good to reflect on how we used to assess something like confidence intervals. This example is from the last external exam for the standard AS90642 Calculate confidence intervals for population parameters. The title of this standard sums up its focus: calculating. In fact, students had access to graphics calculators, so they didn't even have to calculate the confidence intervals by hand (not that I am arguing for students to calculate confidence intervals by hand!). While teachers could place their teaching of confidence intervals within real situations and work through statistical enquiry cycles with real data and meaningful purposes, it was harder to get teacher and student buy-in when the formal assessment didn't require these things. That is no longer the case: the formal assessment of confidence intervals at NCEA Level 3 requires students to work through an enquiry cycle using real data and requires the interpretation of a confidence interval (in AS90642 this was only required at Merit level). However … 
… things can still go wrong if we only teach to the output required for an assessment and not to the understanding (or thinking) that this output represents. It turns out that I have been trying to give the same messages about learning about confidence intervals for a few years now: the snippet above is from a workshop I gave on confidence intervals in 2007, but I could be saying this about our current teaching of confidence intervals 🙂 
This kind of interpretation of a confidence interval has become the target for measuring understanding of confidence intervals. Each bracketed part could be linked to important understandings we want students to have (a good task would be to get students to explain the importance of each part), but… just because a student can write or identify this interpretation does not rule out the possibility that they have misunderstandings about confidence intervals. How much you care about more than "the assessment" will influence whether you care about finding out about these misunderstandings. You won't know unless you ask, and if you stick to only the output required by "the assessment" you may never know 🙂 This is not a criticism of "the assessment" because, as I said at the beginning of this post, you can't assess everything in one task, and the current assessment is better than what we had. Communication and writing about statistics are important. But it is our responsibility as teachers to use formative assessment to check for understandings and misunderstandings throughout our teaching; this is one way we can show students we care about more than just "the assessment".
The following slides are examples of formative assessment I have used for confidence intervals (at the plenary I showed these super fast because I completely ran out of time, whoops!) 
There are a lot of things students could discuss regarding the validity of the principal's interpretation. Will they focus on why he used a sample in the first place rather than historic data from the student management system? What about using the next 50 days? Or treating all students absent for at least half a day as being absent for the whole day? Then there is the misunderstanding that values around the middle of the confidence interval are more likely to be the true population parameter than those near the edges…

Can students connect one of the visual components of the bootstrapping process (the medians for each resample for each group) to a key understanding of sampling variability, that the resample medians for the degree group are more spaced out because there was greater variation within the weekly incomes of that sample group (both sample sizes were the same)? 
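For teachers who want to see this idea in action outside the usual software, here is a minimal sketch of the bootstrapping process in Python. It assumes two hypothetical, synthetic samples of weekly incomes (not the real data from the slide), each of size 50, where one sample has much more variation than the other. It resamples each sample with replacement, records the median of each resample, and reports how spread out those resample medians are. The function names `bootstrap_medians` and `middle_95` are illustrative only. This sketch only looks at each group's resample medians; it does not build the interval for the difference between the groups.

```python
import random
import statistics

def bootstrap_medians(sample, n_resamples=1000, seed=0):
    """Median of each bootstrap resample: same size as the sample, drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.median(rng.choices(sample, k=n)) for _ in range(n_resamples)]

def middle_95(values):
    """Percentile interval: the 2.5th and 97.5th percentiles of the resample medians."""
    cuts = statistics.quantiles(values, n=40)  # 39 cut points at 2.5% steps
    return cuts[0], cuts[-1]

# Hypothetical, synthetic weekly incomes: two samples of the same size (50),
# one with much more variation than the other. These are NOT real data.
data_rng = random.Random(42)
less_varied = [data_rng.uniform(600, 1000) for _ in range(50)]
more_varied = [data_rng.uniform(200, 1800) for _ in range(50)]

less_medians = bootstrap_medians(less_varied, seed=1)
more_medians = bootstrap_medians(more_varied, seed=2)

for name, medians in [("less varied", less_medians), ("more varied", more_medians)]:
    low, high = middle_95(medians)
    print(f"{name}: SD of resample medians = {statistics.stdev(medians):.1f}, "
          f"middle 95% = ({low:.0f}, {high:.0f})")
```

Because both samples are the same size, the only thing driving the difference in spread is the variation within each sample, which is the understanding the question above is probing: the more varied sample produces resample medians that are more spread out, and so a wider interval.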
Can students make sense of confidence intervals when no graphical outputs or prompts are provided? Do they understand what things affect the width of a confidence interval within a context? 
Can students design an investigation or statistics process where sample-to-population inference is needed? How will they decide how to sample words from each book? How will they make a call about sticker colour? 
Do students get that the confidence interval is about plausible values for the population mean, not individual weights of cereal packets? Are they just using pattern recognition or a procedure (e.g. just checking whether 50 is in the interval or not!)? 
This post is based on a plenary I did for the Christchurch Mathematical Association (CMA) Statistics Day in November 2015 where I presented 10 ways to embrace the awesomeness that is our statistics curriculum. You can find all the posts related to this plenary in one place here as they are written.