Here’s a scenario. You buy a jumbo bag of marshmallows that contains a mix of pink and white colours. Of the 120 in the bag, 51 are pink, which makes you unhappy because you prefer the taste of pink marshmallows.
Time to write a letter of complaint to the company manufacturing the marshmallows?
The thing we work so hard to get our statistics students to believe is that there’s this crazy little thing called chance, and it’s something we’d like them to consider for situations where random sampling (or something like that) is involved.
For example, let’s assume the manufacturing process overall puts equal proportions of pink and white marshmallows in each jumbo bag. This is not a perfect process and there will be variation, so we wouldn’t expect exactly half pink and half white for any one jumbo bag. But how much variation could we expect? We could get students to flip coins, with each flip representing a marshmallow, and heads representing white and tails representing pink. We can then collate the results for 120 marshmallows/flips – maybe the first time we get 55 pink – and discuss the need to do this process again to build up a collection of results. Often we move to a computer-based tool to get more results, faster. Then we compare what we observed – 51 pink – to what we have simulated.
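Here’s a minimal sketch of that coin-flipping process in Python, for anyone who wants to see the mechanics. It is only an illustration and not the app discussed in this post; the function name `simulate_pink_counts`, the 10,000 repetitions and the seed are assumptions made for this example.

```python
import random

def simulate_pink_counts(n_marshmallows=120, p_pink=0.5, n_bags=10_000, seed=1):
    """Simulate the number of pink marshmallows in many jumbo bags.

    Each marshmallow is one 'coin flip' that comes up pink with
    probability p_pink (the 50% model assumed in the text).
    """
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_pink for _ in range(n_marshmallows))
        for _ in range(n_bags)
    ]

counts = simulate_pink_counts()
observed = 51  # pink marshmallows in the bag we actually bought

# Proportion of simulated bags with 51 or fewer pink marshmallows
tail = sum(c <= observed for c in counts) / len(counts)

print(f"Simulated bags ranged from {min(counts)} to {max(counts)} pink")
print(f"Proportion of simulated bags with {observed} or fewer pink: {tail:.3f}")
```

Under the 50% model, a bag with 51 or fewer pink marshmallows turns up in roughly 6% of simulated bags, so the observed bag sits towards the low end of what chance alone produces without being wildly out of line.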
In particular, you can show that models other than 50% (for the proportion of pink marshmallows) can also generate data (simulated proportions) consistent with the observed proportion. So, not being able to reject the model used for the test (50% pink) doesn’t mean the 50% model is the one true thing. There are others. Like I told my class – just because my husband and I are compatible (and I didn’t reject him), doesn’t mean I couldn’t find another husband similarly compatible.
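One rough way to see this idea in code is to run the same simulation under several different models and check which ones produce results that include the observed 51 pink. This is a sketch under stated assumptions: the helper `middle_95`, the use of the middle 95% of simulated counts as the "consistent" range, and the 5% steps between models are choices made for this example, not how the app works.

```python
import random

def middle_95(n, p, n_sims=2000, seed=1):
    """Approximate range holding the middle 95% of simulated pink counts."""
    rng = random.Random(seed)
    counts = sorted(
        sum(rng.random() < p for _ in range(n)) for _ in range(n_sims)
    )
    return counts[int(0.025 * n_sims)], counts[int(0.975 * n_sims) - 1]

observed, n = 51, 120
compatible = []

# Try models from 30% pink to 60% pink in steps of 5%
for pct in range(30, 61, 5):
    low, high = middle_95(n, pct / 100, seed=pct)
    fits = low <= observed <= high
    if fits:
        compatible.append(pct)
    print(f"Model {pct}% pink: middle 95% of counts {low} to {high}, "
          f"observed {observed} {'fits' if fits else 'does not fit'}")

print("Models consistent with the observed bag:", compatible)
```

The 50% model is in the list, but so are several others, which is the point: failing to reject one model does not single it out as the truth.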
Note: The app is in terms of percentages, because that aligns to our approach with NZ high school students when using and interpreting survey/poll results. However, I first use counts for any introductory activities before moving to percentages, as demonstrated with this marshmallow example. The app rounds percentages to the closest 1% to keep the focus on key concepts rather than focusing on (misleading) notions of precision. I didn’t design it to be a tool for conducting formal tests or constructing confidence intervals, more to support the reasoning that goes with those approaches.
This post provides the notes and resources for a workshop I ran for the Auckland Mathematical Association (AMA) on using data and simulation to teach probability modelling (specifically AS91585/AS91586). This post also includes notes about a workshop I ran for the AMA Statistics Teachers’ Day 2016 about my research into this area.
Using data in different ways
The workshop began by looking at three different questions from the AS91585 2015 paper. All three questions involved data, but how this data was used with a probability model was different for each question.
For the first question (A), we have data on a particular shipment of cars: we know the proportion of cars with the petrol cap on the left-hand side of the car and the percentage of cars that are silver. We are then told that one of the cars is selected at random, which means that we do not need to go beyond this data to solve the problem. In this situation, the “truth” is the same as the “model”. Therefore, we are finding the probability.
For the second question (B), we have data on 10 cars getting petrol: we know the proportion of cars with petrol caps on the left-hand side of the car. However, we are asked to go beyond this data and generalise about all cars in NZ, in terms of their likelihood of having petrol caps on the left-hand side of the cars. This requires developing a model for the situation. In this situation, the “truth” is not necessarily the same as the “model”, and we need to take into account the nature of the data (amount and representativeness) and consider assumptions for the model (the conditions, the model applies IF…..). Therefore, when we use this model we are finding an estimate for the probability.
For the third question (C), we have data on 20 cars being sold: we know how many of these cars have 0 for the last digit of the odometer reading (six). What we don’t know is if observing six cars with odometer readings that end in 0 is unusual (and possibly indicative of something dodgy). This requires developing a model to test the observed data (proportion), basing this model on an assumption that the last digit of an odometer reading should just be explained by chance alone (equally likely for each digit). Therefore, when we use this model, we generate data from the model (through simulation) and use this simulated data to estimate the chance of observing 6 (or more) cars out of 20 with odometer readings that end in 0. If this “tail proportion” is small (less than 5%), we conclude that chance was not acting alone.
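The simulation for question (C) can be sketched in a few lines of Python. This is an illustration only, assuming each last digit from 0 to 9 is equally likely; the function name `count_zero_endings`, the 10,000 repetitions and the seed are assumptions for this example.

```python
import random

def count_zero_endings(n_cars, rng):
    """Simulate last odometer digits for n_cars and count how many are 0."""
    return sum(rng.randrange(10) == 0 for _ in range(n_cars))

rng = random.Random(2015)
n_sims = 10_000
observed = 6  # cars out of 20 with a reading ending in 0

# Generate data from the 'chance alone' model
simulated = [count_zero_endings(20, rng) for _ in range(n_sims)]

# Tail proportion: how often the model gives 6 or more cars out of 20
tail_proportion = sum(c >= observed for c in simulated) / n_sims

print(f"Tail proportion for {observed} or more: {tail_proportion:.3f}")
print("Less than 5%?", tail_proportion < 0.05)
```

With this model the tail proportion comes out at around 1%, well under the 5% cut-off, so the conclusion would be that chance was not acting alone.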
There are a lot of ideas to get your head around! Sitting in there are ideas around what probability models are and what simulations are (see the slides for more about this) and, as I discovered during my research last year with teachers and probability distribution modelling, these ideas may need a little more care when we define and use them with students. The main reason I think we need to be careful using data when teaching probability modelling is that it matters whether you are using data from a real situation, where you do not know the true probability, or whether you are using data that you have generated from a model through simulation. Each type of data tells you something different and is used in a different way in the modelling process. In my research, this led to the development of the statistical modelling framework shown below:
All models are wrong but some are more wrong than others: Informally testing the fit of a probability distribution model
At the end of 2016, I presented a workshop at the AMA Statistics Teachers’ Day based on my research into probability distribution modelling (AS91586). This 2016 workshop also went into more detail about the framework for statistical modelling I’m developing. The video for this workshop is available here on Census At School NZ.
We have a clear learning progression for how “to make a call” when making comparisons, but how do we make a call about whether a probability distribution model is a good model? As we place a greater emphasis on the use of real data in our statistical investigations, we need to build on sampling variation ideas and use these within our teaching of probability in ways that allow for key concepts to be linked but not confused. Last year I undertook research into teachers’ knowledge of probability distribution modelling. At this workshop, I shared what I learned from this research, and also shared a new free online tool and activities I developed that allow students to informally test the fit of probability distribution models.
During the workshop, I showed a live traffic camera from Wellington (http://wixcam.citylink.co.nz/nph-webcam.cgi/terrace-north), which was the context for a question developed and used (the starter question AKA counting cars). Before the workshop, I recorded five minutes of the traffic and then set up a special html file that pauses the video every five seconds. This was so teachers at the workshop (and students) could count the number of cars passing different points on the motorway (marked with different coloured lines) every five seconds. To use this html file, you need to download both of these files into the same folder – traffic.html and traffic.mp4. I’ve only tested my files using the Chrome browser 🙂
If you don’t want to count the cars yourself, you can head straight to the modelling tool I developed as part of my research: http://learning.statistics-is-awesome.org/modelling-tool/. In the dropdown box under “The situation” there are options for the different coloured points/lines on the motorway. The idea behind getting teachers and students to actually count the cars was to try to develop a greater awareness of the complexity of the situation being modelled, to reinforce the idea that “all models are wrong” – that they are approximations of reality but not the truth. Also, I wanted to encourage some deeper thinking about limitations of models. For example, in this situation, looking at five second periods, there is an upper limit on how many cars you can count due to speed restrictions and following distances. We also need to get students to think more about the model in terms of sample space (the set of possible outcomes) and the shape of the distribution (which is linked to the probabilities of each of these outcomes), not just the conditions for applying the probability distribution 🙂
In terms of the modelling tool, I developed a set of teaching notes early last year, which you can access in the Google drive below. This includes some videos I made demonstrating the tool in action 🙂 I also started developing a virtual world (stickland http://learning.statistics-is-awesome.org/stickland-modelling/) but this is still a work in progress. Once you have collected data on either the birds or the stick people, you can copy and paste it into the modelling tool. There will be more variables to collect data on in the future for a wider range of possible probability distributions (including situations where none is applicable).
At the start of the year I started using a service called unroll me with my gmail account. It allows you to wrap up regular or subscription emails into one daily email digest. It takes a number of months to set up the service to capture all your regular or subscription emails, but I have found it helpful in reducing the clutter in my email, so it has been worth the minimal effort.
I noticed – as you do when you’re a stats teacher – that the number of emails that are rolled up per day varies. I wondered if there was anything going on – any patterns, trends etc. – so went back over the last couple of months and recorded how many emails were wrapped up per day.
So here’s a little challenge for your students 🙂
Using the data on the number of my emails wrapped per day for the last few months, can they predict how many of my emails will be wrapped up over the next four days (Tuesday, Wednesday, Thursday and Friday)?
Sorry there have been no posts for a while. I have a whole stash of draft posts nearly ready to be published, but work, study, wedding planning and life in general have got in the way 🙂
One of the few posts I have made this year was about statistical modelling so I thought I’d quickly share something related to this – an article about how an Australian couple used statistical modelling to predict how many guests will turn up to their wedding.
This post provides the notes for a workshop I ran recently for the Auckland Mathematical Association (AMA) on developing statistics lessons that will engage students and promote statistical thinking. The workshop involved looking at some examples of the peer-reviewed lesson plans available from STEW (STatistics Education Web https://www.amstat.org/education/stew/) and discussing how to adapt these to the New Zealand Curriculum. We also reviewed a well-designed statistical model eliciting activity in order to identify features of high quality statistics lessons. We didn’t quite get to apply these features to co-construct a new statistical modelling activity based around the question “Can you determine someone’s gender based on their writing?” but I will include some ideas for this in part two of this post.
Show me the resources!
Designing effective statistics learning activities (including online tasks) requires the use of contexts and situations that will engage students. Great ideas for activities are all around us, and can vary in inception from interesting news articles we read to everyday events or conversations that surprise us. However, it is not just about having a great hook for a learning task; we also need to pay attention to the statistical thinking we are promoting through the learning activity. How do you turn these awesome ideas into effective and meaningful lessons?
It can be difficult if you are new to teaching statistics to be able to separate the good from the bad in terms of assessing how well other people’s resources allow for the building and integrating of both statistical and contextual knowledge. Often we are motivated by time pressures and ease of use when we select teaching resources. Ideally teaching resources that are shared would include notes to help other teachers about the kinds of questions to ask students, possible misconceptions, the kinds of things we want to see students doing and how to help students with misunderstandings. These notes of course take time to write up……. hence why I have not shared very much yet on this site!
Fortunately, there are places where you can find really good teaching resources which you can use to inform and guide your teaching. You can then use these well-designed lessons as the basis for your own activity, modifying and adjusting where necessary to meet the needs of your students. It’s easier to do this when the resources explain the thinking for teachers and for students, so you can see which aspects of the activity are “non-negotiable” and which aspects can be changed. However, you should be prepared to do some thinking yourself – I haven’t yet found a teaching resource that I can use in the classroom exactly how it is when I’ve found it. I see this as a positive thing though (having to think!), because through critically reviewing resources I have developed a greater confidence for teaching statistics.
In particular, I have learned a lot from the lesson plans developed and shared by others through Census at School (NZ) and through workshops or conferences. For example, these resources developed by Joanne Woodward were written a number of years ago, and while there may be some areas of the curriculum where the focus has changed, the teacher instructions that accompany the student worksheets still contain excellent advice (particularly in terms of structure and teaching strategies). These resources are so good, you’ll find some lessons on STEW that are based on her lesson plans (e.g. Bear hugs).
Let’s STEW on this ….
Another great place to find teaching resources is STEW (Statistics Education Web). This is a site set up and maintained by the American Statistical Association (ASA) and each lesson plan is peer-reviewed by an editorial board. You’ll find similar planning templates are used for each of the lessons and a huge variety of contexts and ideas covered across the plans. These lesson plans are based on the US curriculum/standards so there are some differences in content and approaches, and the levels used may not match exactly to NZ curriculum levels or NCEA achievement standards. For example, this lesson plan about text messaging covers great ideas about bivariate data and fitting linear regression models, but unlike what we do in NZ, extends the statistical knowledge covered into sample-to-population inference for the regression coefficients and the correlation coefficient, which is not required for AS91581. This lesson about M&Ms is pretty much good to go and focuses on developing understanding of sampling variability for proportions. The lesson could be extended to lead towards ideas of margin of error for AS91584 and also the use of bootstrapping to construct confidence intervals.
Below is my quick summary of the differences you’ll find between the NZ curriculum and the US curriculum/standards. The key areas I’ve focused on are inference and modelling ideas and how these are developed across the NZ curriculum, contrasted with aspects of the US curriculum that overlap with these ideas. There are other things that we cover in NZ that the US doesn’t, and vice versa, but I’m not going to address these differences now 🙂
Not required for NZ curriculum
Sample to population inference
Based on Informal Inferential Reasoning (IIR), culminating in the use of bootstrapping to construct confidence intervals (formal) and the rule of thumb for proportions 1/root(n)
S-to-P inference for linear regression models (i.e. we use point estimates only for slope, intercept, correlation coefficient)
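For readers who haven’t met the 1/root(n) rule of thumb for proportions, here is a small Python sketch of what it gives for a few sample sizes. The function names and the sample sizes are assumptions chosen for illustration. For comparison, the sketch also shows the usual approximate 95% margin of error for a proportion of 0.5, which is 1.96 × root(0.5 × 0.5 / n), or about 0.98/root(n).

```python
import math

def rule_of_thumb_moe(n):
    """Rule-of-thumb margin of error for a proportion: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)

def normal_approx_moe(n, p=0.5):
    """Approximate 95% margin of error using the normal approximation."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1000):
    print(f"n = {n}: rule of thumb {rule_of_thumb_moe(n):.1%}, "
          f"normal approximation {normal_approx_moe(n):.1%}")
```

The two agree closely when the proportion is near 50%, which is why the rule of thumb works as a quick, slightly conservative guide for poll percentages in the middle of the range.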
You might be familiar with this idea (the shuffle function on an iPod or other music player), but not perhaps with this exact activity. It is an example of a MEA (model eliciting activity) and was designed by Joan Garfield and Laura Ziegler as part of a CATALST project (http://www.tc.umn.edu/~catalst/). I chose this lesson plan to review for the workshop because I am not an expert in MEAs but I really liked how clearly the modelling approach was presented for this particular activity. In particular, the key design features of this lesson that could be used as the basis for designing a new statistical modelling activity are shown in the table below:
The activity begins with a claim that songs are not randomly generated using this Shuffle function.
Students are given a set of 25 randomly generated playlists to use as a basis to describe characteristics of a random sample, in this case, a randomly generated playlist.
After students come up with their ideas of what characteristics to look for, they are given a set of five additional playlists (also randomly generated) on which to test their rules.
Once they feel confident that their rules can be used to determine if a set of songs has NOT been randomly generated, they are then given three disputed playlists, which students are asked to judge based on their rules.
Students work in groups to examine the data, come up with rules, and finally, write a report about their findings and whether or not they believe the three disputed playlists were not randomly generated.
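To get a feel for what randomly generated playlists can look like, here is a rough Python sketch that generates playlists and checks one possible characteristic: the longest streak of consecutive songs by the same artist. Everything specific here is hypothetical and not taken from the actual activity: the library of 8 artists with 10 songs each, the playlist length of 20, and the choice of "longest artist run" as the rule.

```python
import random

# Hypothetical music library: 8 artists with 10 songs each (an assumption,
# not the library used in the actual activity).
library = [(f"Artist {a}", f"Song {s}") for a in "ABCDEFGH" for s in range(1, 11)]

def random_playlist(rng, length=20):
    """Pick songs at random without repeats, like a fair shuffle."""
    return rng.sample(library, length)

def longest_artist_run(playlist):
    """Length of the longest streak of consecutive songs by one artist."""
    longest = current = 1
    for (prev_artist, _), (artist, _) in zip(playlist, playlist[1:]):
        current = current + 1 if artist == prev_artist else 1
        longest = max(longest, current)
    return longest

rng = random.Random(25)
runs = [longest_artist_run(random_playlist(rng)) for _ in range(1000)]

# How often does a random playlist play one artist back to back?
share_with_repeat = sum(r >= 2 for r in runs) / len(runs)
print(f"Playlists with back-to-back songs by one artist: {share_with_repeat:.0%}")
```

With this made-up library, most random playlists contain at least one pair of back-to-back songs by the same artist, which is the kind of result that can surprise students who expect "random" to mean "evenly spread out".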
I have used other parts of what has been provided in this particular lesson plan to present the essentials of the lesson on a one page layout: statistical_modelling_design. My goal with producing a one page summary (or planning template) was to try to draw out the key design features to help teachers plan their own statistical modelling activities. Since we were focusing on how to adapt STEW lessons to the NZ curriculum, I added a section at the bottom of this summary about how to adjust to the NZ curriculum (just a few ideas). We’ll look at using the planning template in part two of this post – online dating profiles – and see if we can design a similar statistical modelling activity. If you haven’t yet attempted this quiz (20 online dating profiles, where you have to decide if the writer is male or female) then hop on over and try it now 🙂