Visualising bootstrap confidence intervals and randomisation tests with VIT Online

Simulation-based inference is taught as part of the New Zealand curriculum for Statistics at school level, specifically the randomisation test and bootstrap confidence intervals. Some of the reasons for promoting and using simulation-based inference for testing and for constructing confidence intervals are that:

  • students are working with data (rather than abstracting to theoretical sampling distributions)
  • students can see the re-randomisation/re-sampling process as it happens
  • the “numbers” that are used (e.g. tail proportion or limits for confidence interval) are linked to this process.

If we work with the output only, for example the final histogram/dot plot of re-sampled/bootstrap differences, in my opinion, we might as well just use a graphics calculator to get the values for the confidence interval 🙂

In our intro stats course, we use the suite of VIT (Visual Inference Tools) designed and developed by Chris Wild to construct bootstrap confidence intervals and perform randomisation tests. Below is an example of the randomisation test “in action” using VIT:

Last year, VIT was made available as a web-based app thanks to ongoing work by Ben Halsted! So, in this short post I’ll show how to use VIT Online with Google sheets – my two favourite tools for teaching simulation-based inference 🙂

1. Create a rectangular data set using a Google sheet. If you’re stuck for data, you can make a copy of this Google sheet which contains giraffe height estimates (see this Facebook post for context – read the comments!)

2. Under File –> Publish to web, choose the following settings (this will temporarily make your Google sheet “public” – just “unpublish” once you have the data in VIT Online)

Be careful to select “Sheet1”, or whichever sheet contains your data, not “Entire document”. Then, select “Comma-separated values (.csv)” as the file type. Directly below is the link to your published data, which you need to copy for step 3.

3. Head to VIT online –>  https://www.stat.auckland.ac.nz/~wild/VITonline/index.html. Choose “Randomisation test” and copy the link from step 2 into the first text box. Then press the “Data from URL” button.

4. At this point, your data is in VIT online, so you can go back and unpublish your Google sheet by going back to File –> Publish to web, and pressing the button that says “Stop publishing”.

The same steps work to get data from a Google spreadsheet into VIT online for the other modules (bootstrapping etc.).
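By the way, the published link just serves plain CSV text, so any tool that can read CSV can use it. Here is a minimal Python sketch of what that data looks like once parsed – the rows below are made up for illustration, though the column names match the giraffe sheet:

```python
import csv
import io

# Hypothetical CSV text, shaped like what "Publish to web" serves for the
# giraffe sheet (these particular rows are invented for illustration).
published_csv = """Prompt,Height estimate in metres
No anchor,4.5
High anchor,7.2
Low anchor,3.1
"""

# Each row becomes one case, with columns as variables.
rows = list(csv.DictReader(io.StringIO(published_csv)))
print(len(rows))          # 3 data rows
print(rows[0]["Prompt"])  # first case's value for the Prompt variable
```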

[Actually, the steps are pretty similar for getting data from a Google spreadsheet into iNZight lite. Copy the published sheet link from step 2 in the appropriately named “paste/enter URL” text box under the File –> Import dataset menu option.]

In terms of how to use VIT online to conduct the randomisation test, I’ll leave you with some videos by Chris Wild to take a look at (scroll down). Before I do, just a couple of differences between the VIT Chris uses and VIT Online and a couple of hints for using VIT Online with students.

You will need to hold down Ctrl to select more than one variable before pressing the “Analyse” button, e.g. to select both the “Prompt” and “Height estimate in metres” variables in the giraffe data set.

Also, to define the statistic to be tested, in VIT Online you need to press the button that says “Precalculate Display” rather than “Record my choices” as shown in the videos.

Lastly, a really cool thing about VIT Online is that once you have copied over the URL for your published Google sheet, as long as you keep your Google sheet published, you can grab the URL from VIT Online to share with students, e.g. https://www.stat.auckland.ac.nz/~wild/VITonline/randomisationTest/RVar.html?file=https://docs.google.com/spreadsheets/d/e/2PACX-1vTcaGSrAbGSntbrUoifNv8g048KJwEnBI–Rmmxqu1N0rb0VRUHoUkIeT-8xo3O9eqTUqZIML_EH523/pub?gid=0&single=true&output=csv&var=%20Prompt,c&var=Height%20estimate%20in%20metres,n. Sure, it’s not the nicest looking URL in the world, so use a URL shortener like bit.ly, goo.gl, tiny.cc etc. if students will be typing it into their devices.

Note: VIT Online is not optimised to work on small screen devices, due to the nature of the visualisations. For example, it’s important that students can see all three panels at the same time during the process, and can see what is happening!

Now, here are those videos I promised 🙂

Game of data

This post is the second in a series of posts where I’m going to share some strategies for getting real data to use for statistical investigations that require sample to population inference. As I write them, you will be able to find them all on this page.

What’s your favourite board game?

I read an article posted on fivethirtyeight about the worst board games ever invented, and it got me thinking about the board games I like to play. The Game of Life has a low average rating on the online database of games referred to in this article, but I remember kind of enjoying playing it as a kid. boardgamegeek.com features user-submitted information about hundreds of thousands of games (not just board games) and is constantly being updated. While there are some data sets out there that already feature data from this website (e.g. from Kaggle datasets), I am purposely demonstrating a non-programming approach to getting this data that maximises the participation of teachers and students in the data collection process.

To end up with data that can be used as part of a sample to population inference task:

  1. You need a clearly defined and nameable population (in this case, all board games listed on boardgamegeek.com)
  2. You need a sampling frame that is a very close match to your population.
  3. You need to select from your sampling frame using a random sampling method to obtain the members of your sample.
  4. You need to define and measure variables from each member of the sample/population so the resulting data is multivariate.

boardgamegeek.com actually provides a link that you can use to select one of the games on their site at random (https://boardgamegeek.com/boardgame/random), so using this “random” link (hopefully) takes care of (2) and (3). For (4), there are so many potential variables that could be defined and measured. To decide on what variables to measure, I spent some time exploring the content of the webpages for a few different games to get a feel for what might make for good variables. I decided to stick to variables that are measured directly for each game, rather than ones that were based on user polls, and went with these variables:
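If you ever did have the full list of games available as a sampling frame, simple random sampling is straightforward to sketch in code – the game IDs and the size of the frame below are made up for illustration:

```python
import random

# Hypothetical sampling frame: an ID for every game listed on the site.
# (The real frame would come from the website; this is just a stand-in.)
sampling_frame = list(range(1, 100_001))  # pretend there are 100,000 games

random.seed(2018)  # fixed seed so a class demo is reproducible

# Simple random sampling without replacement: 200 distinct games.
sample_ids = random.sample(sampling_frame, 200)

print(len(sample_ids))       # 200 games selected
print(len(set(sample_ids)))  # 200 -- no game selected twice
```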

  • Millennium the game was released (1000, 2000, all others)
  • Number of words in game title
  • Minimum number of players
  • Maximum number of players
  • Playing time in minutes (if a range was provided, the average of the limits was used)
  • Minimum age in years
  • Game type (strategy or war, family or children’s, other)
  • Game available in multiple languages (yes or no)
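The playing-time rule above (average the limits when a range is given) could be sketched as a little helper function – the function name and the input formats here are my own invention, not anything from boardgamegeek.com:

```python
def playing_time_minutes(raw):
    """Return playing time in minutes for one game.

    Accepts a single value like "45" or a range like "30-60"
    (hypothetical formats); for a range, the average of the
    limits is returned, matching the measurement rule above.
    """
    parts = [float(p) for p in raw.split("-")]
    return sum(parts) / len(parts)

print(playing_time_minutes("45"))     # 45.0 -- single value unchanged
print(playing_time_minutes("30-60"))  # 45.0 -- average of the limits
```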

Time to play!

I’ve set up a Google form with instructions for how you can help create a random sample of games from boardgamegeek.com at this link: https://goo.gl/forms/8yBqryGTzrZGhEVx2. As people play along, the sample data will be added here: https://docs.google.com/spreadsheets/d/e/2PACX-1vSzR_VSVzaaeWpCvYbAQCUewaM3Tad2zfTBO7AWuDgFFTj5Jaq2TBo6N-gQGCe5e5t_qKW7Knuq6-pr/pub?gid=552938859&single=true&output=csv. The URL of each game is included so that the data can be checked. Feel free to copy and adapt however you want, but do keep in mind the nature of the variables you use. In particular, be very careful about using any of the aggregate ratings measures (another great article by fivethirtyeight about movie ratings explains some of the reasons why).

Bonus round

I wrote a post recently – Just Google it – which featured real data distributions. boardgamegeek.com also provides simple graphs of the ratings for each game, so we can play a similar matching game. You could also try estimating the mean and standard deviation of the ratings from the graph, with the added game feature of reverse ordering!

Which games do you think match which ratings graphs?

  1. Monopoly
  2. The Lord of the Rings: The Card Game
  3. Risk
  4. Tic-tac-toe
A
B
C
D

I couldn’t find a game that had a clear bi-modal distribution for its ratings but I reckon there must be games out there that people either love or hate 🙂 Let me know if you find one! To get students familiar with boardgamegeek.com, you could ask them to first search for their favourite game and then explore what information and ratings have been provided for this on the site. Let the games begin 🙂

Just Google it

Here’s a really quick idea for a matching activity, totally building off Pip Arnold’s excellent work on shape.

At the bottom of this post are six “Popular times” graphs generated today by Google when searching for the following places of interest:

  1. Cafe
  2. Shopping mall
  3. Library
  4. Swimming pool
  5. Gym
  6. Supermarket

Can you match which graphs go with which places? 🙂

[you can find the answers at the bottom]

A

B

C

D

E

F
Click here to reveal the answers

Finding real data for real data stories

This post is the first in a series of posts where I’m going to share some strategies for getting real data for real data stories, specifically to use for statistical investigations that require sample to population inference. As I write them, you will be able to find them all on this page.

Key considerations for finding real data for sample to population inference tasks

It’s really important that I stress that the approaches I’ll discuss are not necessarily what I would typically use when finding data to explore. Generally, I’d let the data drive the analysis, not the analysis drive the data I try to find. These are specific examples so that the data that is obtained can be used sensibly to perform sample to population inference. It’s also really important to talk about why I’m stressing the above 🙂 In NZ we have specific standards that are designed to assess understanding of sample to population inference, using informal and formal methods that have been developed by exploring the behaviour of random samples from populations (AS91035, AS91264, AS91582). So, for the students’ learning about rules of thumb and confidence intervals to make sense, we need to provide students with clearly defined, named populations with data that are (or are able to be) randomly sampled from these populations. At high school level at least, these strict conditions are in place so that students can focus on one central question: What can and can’t I say about the population(s) based on the random sample data?

For all the examples I’ll cover in this series of posts, there are four key considerations/requirements:

  1. You need a clearly defined and nameable population. Ideally this should be as simple and clear as possible to help students out but to ensure (2) the “name” can end up being quite specific.
  2. You need a sampling frame that is a very close match to your population. This means you need a way to access every member of your population to measure stuff about them (variables). Sure, this is not the reality of what happens in the real world in terms of sampling, but remember what I said earlier about what was important 🙂
  3. You need to select from your sampling frame using a random sampling method to obtain the members of your sample. It is sufficient (and recommended) to stick to simple random sampling. In some cases, you may be able to make an assumption that what you have can be considered a random sample, but I’d prefer to avoid these kinds of situations where possible at high school level.
  4. You need to define and measure variables from each member of the sample/population. We want students working with multivariate data sets, with several options possible for numerical and categorical data (but don’t forget there is the option to create new variables from what was measured).

I’ll try to refer back to these four considerations/requirements when I discuss examples in the posts that will follow.

Just one very relevant NZ NCEA assessment-specific comment before we talk data. For AS91035 and AS91582, the standards state that students are to be provided with the sample multivariate data for the task – so all of (1), (2), (3) and (4) are done by the teacher. Similarly with AS91264, the requirement for the standard is that students select a random sample (3) from a provided population dataset – so (1), (2) and (4) are done by the teacher. This does not mean the students can’t do more in terms of the sampling/collecting processes, just that these are not requirements for the standards, and asking students to do more should not limit their ability to complete the task. I’ll try to give some ideas for how to manage any related issues in the examples.

Just one more point. I haven’t made this (5) in the previous section, but something to watch out for is the nature of your “cases”. Tables of data (which we refer to as datasets) that play nicely with statistical software like iNZight are ones where the data is organised so that each row is a case and each column is a variable. Typically at high school level, the datasets we use are ones where each case (e.g. each individual in the defined population) is measured directly to obtain different variables. Things can get a little tricky conceptually when some of the variables for a case are actually measured by grouping/aggregating related but different cases.

For example, if I take five movies from the internet movie database (imdb.com) that have “dog” in the title and another five with “cat” in the title, I could construct a mini dataset like the one below using information from the website:

For this dataset, each row is a different movie, so the cases are the movies. Each column provides a different variable for each movie. The variables Movie title, Year released, Movie length mins, Average rating, Number of ratings, Number photos and Genre were taken straight from the webpage for each movie. I created the variables Number words title, Number letters title, Average letters per word, Animal in title, Years since release and Millennium. [Something I won’t tackle in this post is what to do about the Genre variable to make this usable for analysis.]

The Average rating variable looks like a nice numerical variable to use, for example, to compare ratings of these movies with “dog” in the title and those with “cat”. The thing is, this particular variable has been measured by aggregating individuals’ ratings of the movie using a mean (the related but different cases here are the individuals who rated the movies). You can see why this may be an issue when you look at the variable Number of ratings, which again is an aggregate measure (a count) – some of these movies have received fewer than 200 ratings while others are in the hundreds of thousands. We also can’t see what the distribution of these individual ratings for each movie looks like, to decide whether the mean is telling us something useful about the ratings. [For some more really interesting discussion of using online movie ratings, check out this fivethirtyeight article.]

The variable Average letters per word has been measured directly from each case, using the characteristics of the movie title. There are still some potential issues with using the variable Average letters per word as a measure of, let’s say, complexity of words used in the movie title, since the mean is being used, but at least in this case students can see the movie title.

Another example of case awareness can be seen in the mini dataset below, using data on PhD candidates from the University of Auckland online directory:

For this dataset, each row is a different department, so the cases are the departments. Each column provides a different variable for each department. Gender was estimated based on the information provided in the directory, so the data may be inaccurate for this reason. The % of PhD candidates that are female looks like a nice numerical variable to use, for example, to compare gender rates between these departments from the Arts and Science faculties. Generally with numerical variables we would use the mean or median as a measure of central tendency. But this variable was measured by aggregating information about each PhD candidate in that department and presenting this measure as a percentage (the related but different cases here are the PhD candidates). Just think about it: does it really make sense to make a statement like “The mean % of PhD candidates that are female for these departments of the Arts faculty is 73%, whereas the mean % of PhD candidates that are female for these departments of the Science faculty is 44%”, especially when the number of PhD candidates varies so much between departments?

Looking at the individual percentages is interesting to see how they vary across departments, but combining them to get an overall measure for each faculty should involve calculating another percentage using the original counts of PhD candidates for each department (e.g. group by faculty). If I want to compare gender rates between the Arts and Science faculties for PhD candidates, I would calculate, for each faculty, the proportion of all PhD candidates across these departments that are female, e.g. 58% of the PhD candidates from these departments of the Arts faculty are female, and 53% of the PhD candidates from these departments of the Science faculty are female.
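To make this concrete, here is a quick sketch with invented counts showing how averaging department percentages can give a quite different answer from recalculating the percentage using the original counts:

```python
# Hypothetical counts of (female, total) PhD candidates per department --
# invented numbers, chosen so department sizes differ a lot.
arts = [(9, 10), (3, 4), (20, 50)]
science = [(5, 10), (40, 100)]

def naive_mean_pct(depts):
    # Mean of the department percentages: ignores department size.
    return sum(f / t for f, t in depts) / len(depts) * 100

def pooled_pct(depts):
    # Percentage recalculated from the original counts: weights by size.
    return sum(f for f, _ in depts) / sum(t for _, t in depts) * 100

print(round(naive_mean_pct(arts), 1))  # 68.3 -- small departments dominate
print(round(pooled_pct(arts), 1))      # 50.0 -- 32 of the 64 candidates
```

The two answers disagree because the big department (20 of 50) counts the same as a tiny one (3 of 4) in the naive mean.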

[If you’d like to read more about structuring data in the context of creating a dataset, then check out this excellent post by Rob Gould.]

Where to next?

This post was not supposed to deter you from finding and creating your own real datasets! But we do need to think carefully about the data that we provide to students, especially our high school students. Not all datasets are the same and while I’ve seen some really cool and interesting ideas out there for finding/collecting data for investigations, some of these ideas unintentionally produce data that makes it very difficult for students to engage with the core question: What can and can’t I say about the population(s) based on the random sample data? 

In the next post, I’ll discuss some examples of finding real data online. Until I find time to write this next post, check out these existing data finding posts:

Using awesome real data

Cat and whisker plots: sampling from the Quick, Draw! dataset

The power of pixels: Modelling with images

The power of pixels: Modelling with images

This post provides the notes for the plenary I gave for the Auckland Mathematical Association (AMA) about using images as a source of data for teaching statistical investigations.

You might be disappointed to find out that my talk (and this post) is not about the movie Pixels, as my husband initially thought it was. It’s probably a good thing I decided to focus on pixels in terms of data about a computer or digital image, as the box office data about Pixels the movie suggests that the movie didn’t perform so well 🙂 Instead, for this talk I presented some examples of using images as part of statistical investigations that (hopefully) demonstrated how different combinations of humans, digital technologies, and modelling can lead to some pretty interesting data. The abstract for the talk is below:

How are photos of cats different from photos of dogs? How could someone determine where you come from based on how you draw a circle? How could the human job of counting cars at an intersection be cheaply replaced by technology? I will share some examples of simple models that I and others have developed to answer these kinds of questions through statistical investigations involving the analysis of both static and dynamic images. We will also discuss how the process of creating these models utilises statistical, mathematical and computational thinking.

As I was using a photo of my cat Elliot to explain the different ways we can use images to collect data, a really funny thing happened (see the embedded tweet below).

Yes, an actual real #statscat appeared in the room! What are the chances of that? 🙂

Pixels are the squares of colour that make up computer or digital (raster) images. Each image has a certain number of pixels, e.g. an image that is 41 pixels wide and 15 pixels high contains 615 pixels, which is an obvious link to concepts of area. The 615 pixels are stored in an ordered list, so the computer knows how to display them, and each pixel contains information about colour. Using RGB colour values (other systems exist), each pixel records the amounts of red, green and blue on a scale of 0 to 255 inclusive. Getting at the information about the pixels requires some knowledge of digital technologies, so the use of images within statistical investigations can be a nice way to teach objectives from across the different curriculum learning areas.
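A quick sketch of this pixel arithmetic, treating an image as an ordered flat list of (R, G, B) triples – the grey colour and the pixel position are just examples:

```python
# A raster image is an ordered list of pixels, each an (R, G, B) triple
# with each value on a scale of 0 to 255 inclusive.
width, height = 41, 15
total_pixels = width * height
print(total_pixels)  # 615 -- pixel count is just area, width x height

# A tiny all-mid-grey "image" as a flat, ordered list of pixels:
grey = (128, 128, 128)
image = [grey] * total_pixels

# Because the list is ordered row by row, the pixel at column 3, row 2
# (0-indexed) can be recovered from the flat list:
col, row = 3, 2
pixel = image[row * width + col]
print(pixel)  # (128, 128, 128)
```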

Using images as a source of data can happen on at least three levels. Using the aforementioned photo of my cat Elliot, humans could extract data from the image by focusing on things they can see, for example, that the image is a black-and-white photo and not in colour, that there are two cats in the photo, and that Elliot does not appear to be smiling. Data that is also available about the image using digital tech includes variables such as the number of pixels, the file type and the file size. Data that can be generated using models related to this image could include identifying the most prominent shade of grey, the likelihood the photo will get more than 100 likes on Instagram, and what the photo is of (cat vs dog, for example, a popular machine learning task).

Static images

The first example used the data, in particular the photos, collected as part of the ongoing data collection project I have running about cats and dogs (the current set of pet data cards can be downloaded here). As humans, we can look at images, notice things that are different and these features can be used to create variables. For example, if you look at some of the photos submitted: some pets are outside while others are inside; some pets are looking at the camera while others are looking away from the camera; and some are “close ups” while others taken from a distance.

These potential variables are all initially categorical, but by using digital technologies, numerical variables are also possible. To create a measure of whether a photo is a “close up” shot of a pet, you can measure the area the pet takes up in the photo. This is where pixels are super helpful. I used paint.net, free image editing software, to show that if I trace around the dog in this photo using the lasso tool, the dog makes up about 61 000 pixels. If you compare this figure to the total number of pixels in the image (90 000), you can calculate the percentage of the photo the dog makes up.
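The percentage calculation itself is simple enough to sketch – the 300 × 300 photo size below is my own assumption, chosen only because it matches the 90 000 pixel total:

```python
dog_pixels = 61_000    # pixels selected with the lasso tool
total_pixels = 90_000  # pixels in the whole photo (e.g. 300 x 300)

# Percentage of the photo taken up by the dog -- a "close up" measure.
percentage = dog_pixels / total_pixels * 100
print(round(percentage, 1))  # 67.8 -- about two thirds of the photo is dog
```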

For the current set of pet data cards, each photo now has this percentage displayed. Based on this very small sample of six pets, it kind of looks like maybe cats typically make up a larger percentage of the photo than dogs, but I will leave this up to you to investigate using appropriate statistical modelling 🙂

For a pretty cool example of using static images, humans, digital technologies and models, you should take a look at how-old.net. As humans, we can look at photos of people and estimate their age and compare our estimates to people’s actual ages. What how-old.net has done is used machine learning to train a model to predict someone’s age based on the features of the photo submitted. I asked teachers at the talk to select which of the three photos they thought I looked the youngest in (most said B), which is the same photo that the how-old.net model predicted I looked the youngest in. A good teaching point about the model used by how-old.net is that it does get updated, as new data is used to refine its predictions.

You can also demonstrate how models can be evaluated by comparing what the model predicts to the actual value (if known). Fortunately I have a large number of siblings and so a handy (and frequently used) range of different aged people to test the how-old.net model. Students could use public figures, such as athletes, politicians, media personalities or celebrities, to compare each person’s actual age to what the model predicts (since it’s likely that both photos and ages are available on the internet).

There is also the possibility of setting up an activity around comparing humans vs models – for the same set of photos, are humans better at predicting ages than how-old.net? Students could be asked to consider how they could set up this kind of activity, what photos they could use, and how they would decide who was better – humans or models?

Drawings

The next example used the set of drawings Google has made available from their Quick, Draw! game and artificial intelligence experiment. I’ve already written a post about this data set, so have a read of that post if you haven’t already 🙂 In this talk, I asked teachers to draw a quick sketch of a cat and then asked them to tell me whether they drew just the face, or the body as well (most drew the face and body – I’m not sure if the appearance of an actual cat during the talk influenced this at all!) I also asked them to think about how many times they lifted their pen off the paper. I probably forgot to say this at the time, but for some things humans are pretty good at providing data, while for others, digital technologies are better. In the case of drawing, we would get more accurate data about how many strokes were made if we could measure this using a mouse, stylus or touchscreen rather than asking people to remember.

Using the random sampler tool that I have set up, which allows you to choose one of the objects players have been asked to draw for Quick, Draw!, I generated a random sample of 200 of the drawings made when asked to draw a cat. The data that can be used from each drawing is a combination of what humans and digital technologies can measure. The drawing itself (similar to the photos of pets in the first example) can be used to create different variables, for example whether the sketch is of the face only, or the face and body. Other variables are also provided, such as the timestamp and country code, both examples of data that is captured from players of the game without them necessarily realising (e.g. digital traces).

After manually reviewing all 200 drawings and recording data about the variables, I used iNZight VIT to construct bootstrap confidence intervals for the proportion of all cat drawings in the Quick, Draw! dataset that were faces only, and for the difference between the mean number of strokes for body drawings of cats and the mean number of strokes for face-only drawings. Interestingly, while the teachers at the talk mostly drew cats with bodies, most players of Quick, Draw! sketch only the faces. This could be due to the 20-second time limit enforced when playing the game. It makes sense that, on average, Quick, Draw! players use more strokes to draw cats with bodies than cats with just faces. I wished at the time that I had also recorded the other variables provided for each drawing, as it would have been good to explore whether the game correctly identified more of the face-only drawings of cats than the body drawings.

What is also really interesting is the artificial intelligence aspect of the game. The video below explains this pretty well, but basically the model that is used to guess what object is being drawn is trained on what previous players of the game have drawn.

From a maths teacher’s perspective, this is a good example of what can go wrong with technology and modelling. For example, players are asked to draw a square, and because the model is trained on how previous players drew the object, the technology looks for commonalities between the drawings: players who draw four roughly perpendicular lines all look similar from the machine’s perspective. What the technology cannot detect is that some players do not know what a square is, or think squares and rectangles are the same thing. So the data being used to train the model is biased. The consequence of this bias is that the model will now reinforce players’ misunderstanding that a rectangle is a square by “correctly” predicting they are drawing a square when they draw a rectangle! An interesting investigation I haven’t done yet would be to estimate what percentage of the drawings made for squares are actually rectangles 🙂 I would also suggest checking out some of the other “shape” objects to see similar examples, e.g. octagons.

Using a more complex form of the Google Quick, Draw! dataset, Thu-Huong Ha and Nikhil Sonnad analysed over 100 000 of the drawings made of circles to show how language and culture influence sketches. For example, they found that 86% of the circles drawn by players in the US were drawn counterclockwise, while 80% of the circles drawn by players in Japan were drawn clockwise. To me, this is really fascinating stuff, and a really cool example of how using images as a source of data can result in really meaningful investigations about the world.
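One standard way to classify a stroke as clockwise or counterclockwise – my own sketch, not necessarily the method those analysts used – is the sign of the shoelace (signed) area of the stroke’s points:

```python
def is_counterclockwise(points):
    """Classify a closed stroke by its signed (shoelace) area.

    points: list of (x, y) pairs in drawing order. A positive signed
    area means counterclockwise in standard maths coordinates (y up).
    Note: screen coordinates usually have y pointing down, which flips
    the interpretation.
    """
    signed_area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        signed_area += x1 * y2 - x2 * y1  # shoelace cross-product term
    return signed_area > 0

# A square traced counterclockwise (in maths coordinates):
square_ccw = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(is_counterclockwise(square_ccw))        # True
print(is_counterclockwise(square_ccw[::-1]))  # False -- traced clockwise
```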

Animation

The last example I used was about using videos as a source of data for probability distribution modelling activities. I’ve presented some workshops before where I used a video (traffic.mp4) from a live streaming traffic camera positioned above a section of the motorway in Wellington. Focusing on the lane of traffic closest to the front of the screen, I got teachers to count how many cars arrived at a fixed point in that lane every five seconds. This gave us a nice set of data which we could then use to test the suitability of a Poisson distribution as a model.
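A quick way to check how Poisson-like such counts are is to estimate λ with the sample mean and compare the observed proportions to the Poisson probabilities. The counts below are invented for illustration, not the workshop data:

```python
from math import exp, factorial
from collections import Counter

# Hypothetical counts of cars arriving in each 5-second interval:
counts = [1, 0, 2, 1, 1, 3, 0, 2, 1, 1, 0, 2, 1, 2, 1, 0, 1, 2, 3, 1]

lam = sum(counts) / len(counts)  # estimate lambda by the sample mean

def poisson_pmf(k, lam):
    # P(X = k) for X ~ Poisson(lam)
    return exp(-lam) * lam ** k / factorial(k)

# Compare observed proportions with the fitted Poisson probabilities:
observed = Counter(counts)
for k in range(max(counts) + 1):
    obs_prop = observed[k] / len(counts)
    print(k, round(obs_prop, 2), round(poisson_pmf(k, lam), 2))
```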

For this talk, I wanted to demonstrate how humans could be replaced (potentially) by digital technologies and models. Since the video is a collection of images shown quickly (around 50 frames per second), we can use pixels, or potentially just a single pixel, in the images to measure various attributes of the cars. About a year ago, I set myself the challenge of exploring whether it would be possible to glean information about car counts, car colours etc. and shared my progress with this personal project at the end of the talk.

So, yes, there is some pretty fancy video analysis software out there that I could use to extract the data I want, but I wanted to investigate whether I could use a combination of statistical, mathematical and computational thinking to create my own model to generate the data. As part of my PhD, I’m interested in finding out what activities could help introduce students to the modern art and science of learning from data. What is nice about this example is that the idea of how the model could count how many cars arrive every five seconds at a fixed point on the motorway is actually pretty simple, and so potentially a good entry point for students.

The basic idea behind the model is that when there are no cars at the point on the motorway, the pixel I am tracking is a certain colour. This colour becomes my reference colour for the model. Using the RGB colour system, for each frame/image in the traffic video, I can compare the current colour of the pixel, e.g. rgb(100, 250, 141), to the reference colour, e.g. rgb(162, 158, 162). As soon as the colour changes from the reference colour, I can infer that a car has arrived at the point on the motorway. And as soon as the colour changes back to the reference colour, I can infer that the car has left. While the car is moving past the point, I can also collect data on the colour of the pixel from each frame, and use this to determine the colour of the car.
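Here is a minimal sketch of this arrival-detection idea – the distance threshold and the “car” colours are my own assumptions for illustration, not values from the actual CODAP model:

```python
def count_arrivals(frames, reference, threshold=30):
    """Count cars arriving at one tracked pixel.

    frames: the (R, G, B) colour of the tracked pixel in each frame.
    reference: the pixel's colour when no car is present.
    A car "arrives" when the colour moves away from the reference by
    more than `threshold` (Euclidean distance in RGB space), and is
    counted once until the colour returns to the reference.
    """
    def distance(c1, c2):
        return sum((a - b) ** 2 for a, b in zip(c1, c2)) ** 0.5

    arrivals = 0
    car_present = False
    for colour in frames:
        changed = distance(colour, reference) > threshold
        if changed and not car_present:
            arrivals += 1      # colour just left the reference: a car arrived
        car_present = changed  # back at the reference means the car has left
    return arrivals

reference = (162, 158, 162)  # road colour when no car is present
# Simulated pixel colours over 14 frames: road, car 1, road, car 2, road.
frames = ([reference] * 3 + [(100, 250, 141)] * 4 + [reference] * 3
          + [(30, 30, 30)] * 2 + [reference] * 2)
print(count_arrivals(frames, reference))  # 2 -- two cars passed the pixel
```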

I’m still working on the model (in that I haven’t actually modified it since I first played around with the idea last year) and the video below shows where my project within CODAP (Common Online Data Analysis Platform) is currently at. When I get some time, I will share the link to this CODAP data interactive so you and your students can play around with choosing different pixels to track and changing other parameters of the model I’ve developed 🙂

You might notice by watching this video that the model needs some work. The colours being recorded for each car are not always that good (average colour is an interesting concept in itself, and I’ve learned a lot more about how to work with colour since I developed the model) and some cars end up being recorded twice or not at all. But now that I’ve developed an initial model to count the cars that arrive every five seconds, I can compare the data generated from the model to the data generated by humans to see how well my model performed.

You can see that, at the moment, the data look very different when comparing what the humans counted and what the digital tech + model counted. So maybe the job of traffic counter (my job during university!) is still safe – for now 🙂

Going crackers

I didn’t get time in the talk to show an example of a statistical investigation that used images (photos of animal crackers or biscuits) to create an informal prediction model. I’ll write about this in another post soon – watch this space!

Helping students to estimate mean and standard deviation

Estimating the mean and standard deviation of a discrete random variable is something we expect NZ students to be able to do by the time they finish Year 13 (Grade 12). The idea is that students estimate these properties of a distribution using visual features of a display (e.g. a dot plot) and, ideally, these measures are visually and conceptually attached to a real data distribution with a context and not treated entirely as mathematical concepts.

At the start of this year I went looking for an interactive dot plot to use when reviewing mean and standard deviation with my intro-level statistics students. Initially, I wanted something where I could drag dots around on a dot plot and show what happens to the mean, standard deviation etc. as I do this. Then I wanted something where you could drag dots on and off the dot plot, rather than having an initial starting dot plot, so students could build dot plots based on various situations. I came across a few examples of interactive-ish dot plots out there in Google-land but none quite did what I wanted (or kept the focus on what I wanted), so I decided to write my own. [Note: CODAP would have been my choice if I had just wanted to drag dots around. Extra note: CODAP is pretty awesome for many many reasons].

In my head as I developed the app was an activity I’ve used in the past to introduce standard deviation as a measure – Exploring statistical measures by estimating the ages of famous people – as well as a workshop by the awesome Christine Franklin. For NZ-based teachers (or teachers who want to come to beautiful New Zealand for our national mathematics teachers conference), Chris is one of the keynote speakers at the NZAMT 2017 conference and is running a workshop at this conference called Conceptualizing Variation from the Mean: Evolving from ‘Number of Steps’ to the ‘SAD’ to the ‘MAD’ to the ‘Standard Deviation’  which you should get along to if you can. Also in my head was the idea of the mean of a distribution being like the “balancing point”, and other activities I have used in the past based on this analogy and also see-saws! My teaching colleague Liza Bolton was also super helpful at listening to my ideas, suggesting awesome ones of her own, and testing the app throughout its various versions.

dots – an interactive dot plot

You can access dots at this address: learning.statistics-is-awesome.org/dots/ but you might want to keep reading to find out a little more about how it works 🙂 Below is a screenshot of the app, with some brief descriptions of how things are supposed to work. Current limitations for dots are that no more than 35 dots will be displayed, the axis is fixed between 0 and 34, and that dots can only be placed on whole numbers. I had played around with making these aspects of the app more flexible, but then decided not to pursue this as I’m not trying to re-create graphing/statistical software with this interactive.

Since I’ve got the It’s raining cats and dogs (hopefully) project running, I thought I’d use some of the data collected so far to show a few examples of how to use dots. [Note: The data collection phase of the cats and dogs data cards project is still running, so you can get your students involved]. Here are 15 randomly selected cats from the data cards created so far, with the age of each cat removed.

Once you get past how cute these cats are, what do you think the mean age of these cats is (in years)? Can you tell which cat is the oldest? How much variation do you think there is between the ages of these cats?

Dragging dots onto the dot plot

A dot plot can be created by dragging dots on to the plot (don’t forget to add a label for the axis like I did!)

 

Sending data to the dot plot

You can also add the data and the label to the URL so that the plot is ready to go. Use the structure shown below to do this, and then click on the link to see the ages of these cats on the interactive dot plot.

learning.statistics-is-awesome.org/dots/#data=7,1,12,16,4,2,11,8,4,9,5,2,3,1,17&label=ages_of_cats_in_years

Turns out China is the oldest cat in this sample.
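If you want to build several of these links (say, one per class dataset), the URL scheme above is easy to assemble programmatically. A minimal sketch using the cat ages:

```python
# Build a 'dots' link from a list of values, following the URL scheme shown above.
ages = [7, 1, 12, 16, 4, 2, 11, 8, 4, 9, 5, 2, 3, 1, 17]
label = "ages_of_cats_in_years"

url = ("learning.statistics-is-awesome.org/dots/#data="
       + ",".join(str(a) for a in ages)
       + "&label=" + label)
print(url)
```

Note the label uses underscores rather than spaces, matching the example link above.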

Exploring the balance point

You can click below the dots on the axis to indicate your estimate for the mean. You could do a couple of things after this. You could click the Mean button to show the mean, and check how this compares to your estimated mean. Or you could click the Balance test button to turn it on (green), and see how well the dots balance on the point you have estimated as the mean (or both, like I did).

 

Estimating standard deviation

Estimating standard deviation is hard. I try not to use “rules” that only work with Normally distributed-ish data (like taking the range and dividing by six) and aren’t based on what the standard deviation is a measure of. Visualising standard deviation is also a tricky thing. In the video below I’ve gone with two approaches: one uses a Chrome extension, Web Paint, to draw on the plot where I think the average distance of each dot from the mean is, and one uses the absolute deviations.
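To see why “average distance from the mean” is a reasonable anchor for estimates, it can help to compute the mean absolute deviation alongside the standard deviation. A quick sketch using the 15 cat ages from earlier (my own illustrative calculation, using the population version of the standard deviation):

```python
# Comparing the 'average distance from the mean' idea (mean absolute
# deviation) with the standard deviation, for the 15 cat ages above.
ages = [7, 1, 12, 16, 4, 2, 11, 8, 4, 9, 5, 2, 3, 1, 17]

n = len(ages)
mean = sum(ages) / n                                   # 6.8 years
mad = sum(abs(x - mean) for x in ages) / n             # mean absolute deviation
sd = (sum((x - mean) ** 2 for x in ages) / n) ** 0.5   # population SD

print(round(mean, 2), round(mad, 2), round(sd, 2))     # → 6.8 4.32 5.08
```

The standard deviation comes out a bit bigger than the mean absolute deviation (squaring gives the more extreme ages, like 16 and 17, extra weight), but both sit in the same ballpark, which is what makes the “average distance” visual a useful estimation strategy.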

 

Using “random distribution”

This is the option I have used the most when working with students individually. Yes, there is no context when using this option, but from my conversations with students about the mean and standard deviation, I’m not sure the lack of context stops it from being a concept-building activity. The short video below shows using the median as a starting point for the estimate of the mean, and then adjusting from there depending on other features of the distribution (e.g. shape). The video ends by dragging a dot around to see what happens to the different measures, since that was the starting point for developing dots 🙂

 

Other ideas for using dots?

Share them below the related Facebook post, on Twitter, or wherever – I’d be super keen to hear whether you find this interactive dot plot useful for teaching students how to estimate mean and standard deviation 🙂

PS no cats were harmed in the making of this GIF

Which one doesn’t belong …. for stats?

If you haven’t heard of the activity Which one doesn’t belong? (WODB), it involves showing students four “things” and asking them to describe/argue which one doesn’t belong. There are heaps of examples of Which one doesn’t belong? in action for math(s) on the web, Twitter, and even in a book. From what I’ve seen, for math(s) I think the activity is pretty cool. In terms of whether WODB works for stats, however, I’m not so sure. Perhaps for definitions, facts, static pieces of knowledge it could work (?), but in terms of making comparisons involving data and its various representations (including graphs/displays), I need more convincing. There’s something different between comparing properties of shapes (for example), which remain fixed, and comparing data about something/someone, which could vary.

For example, What cat doesn’t belong? for the four “stats cats” data cards shown below.

To make comparisons between the four cats means to reason with data, but if I am considering only the data provided in these four data cards then these comparisons are made without uncertainty. For example, I can say definitively, for these four cats, that:

  • Elliot is the only cat with a name that has three syllables,
  • Molly is the only female cat,
  • Joey is the only cat that is both an inside and outside cat,
  • Classic is the only cat that uses a cat door.

I could argue many different cases for which cat (or photo) does not belong. This is all cool, but it doesn’t feel like statistics to me. Statistics is all about using data to make decisions in the face of uncertainty, by appreciating different sources of variation and considering how to deal with these. In particular, inferential reasoning involves going beyond the data at hand, thinking about generalisability, considering the quality and quantity of data available, and appreciating/communicating the possibility of being wrong no matter how “right” the methodology.

So while I appreciate that WODB allows for “not just one correct answer” and the development of argumentation skills, I’d be happier if this kind of activity within statistics teaching led to the posing of statistical investigative questions (SIQ): WODB->SIQ. Why? We need more data, and more of an idea of where the data came from, to answer the really interesting questions that comparing these four cats might provoke us to consider. We need students to feel the uncertainty that comes from thinking and reasoning statistically, and to help students find ways to deal with this uncertainty. We also need students to care about the questions being asked of the data – my worry here is that otherwise the question students might ask when using WODB is Who cares which one doesn’t belong? 🙂

Questions that interest me when I look at these stats cats data cards are: I wonder …. How many syllables do cats’ names have? Do most cats have two-syllable names? Is Elliot (my cat!) an unusual name for this reason? Do I spend too much on cat food ($NZD30 per week)? Or maybe black cats are more expensive to feed? I won’t be able to get definitive answers to these questions, but by collecting more data and investigating these questions using statistical methods I can get a better understanding of what could be plausible answers.

PS Want some of these data cards? Head here –> It’s raining cats and dogs (hopefully)

Statistical reasoning with data cards (webinar)

UPDATE: The video of the webinar is now available here.

I’m super excited to be presenting the next ASA K-12 Statistics Education Webinar. The webinar is based on one of my sessions from last year’s Meeting Within a Meeting (MWM) and will be all about using data cards featuring NZ data/contexts. I’ll also be using the digital data cards featured in my post Initial adventures in Stickland if you’d like to see these in “teaching action”.

The webinar is scheduled for Thursday April 20 9:30am New Zealand Time (Wednesday April 19 at 5:30 pm Eastern Time, 2:30 pm Pacific), but if you can’t watch it live a video of the webinar will be made available after the live presentation 🙂

Here are all the details about the webinar:

Title: Statistical Reasoning with Data Cards

Presenter: Anna-Marie Fergusson, University of Auckland

Abstract: Using data cards in the teaching of statistics can be a powerful way to build students’ statistical reasoning. Important understandings related to working with multivariate data, posing statistical questions, recognizing sampling variation and thinking about models can be developed. The use of real-life data cards involves hands-on and visual-based activities. This talk will present material from the Meeting Within a Meeting (MWM) Statistics Workshop held at JSM Chicago (2016) which can be used in classrooms to support teaching within the Common Core State Standards for Mathematics. Key teaching and learning ideas that underpin the activities will also be discussed.

To RSVP to participate in the live webinar, please use the following link: https://goo.gl/forms/pQ5taydWwOZy2WOJ3

The ASA is offering this webinar without charge and only internet and telephone access are necessary to participate. This webinar series was developed as part of the follow-up activities to the Meeting Within a Meeting (MWM) Workshop for Math and Science teachers held in conjunction with the Joint Statistical Meetings (www.amstat.org/education/mwm). MWM will be held again in Baltimore, MD on August 1-2, 2017.  For those unavailable to participate in the live webinar, ASA will record this webinar and make it available after the live presentation. Previous webinar recordings are available at http://www.amstat.org/asa/education/K-12-Statistics-Education-Webinars.aspx.

Using data and simulation to teach probability modelling

This post provides the notes and resources for a workshop I ran for the Auckland Mathematical Association (AMA) on using data and simulation to teach probability modelling (specifically AS91585/AS91586). This post also includes notes about a workshop I ran for the AMA Statistics Teachers’ Day 2016 about my research into this area.

Using data in different ways

The workshop began by looking at three different questions from the AS91585 2015 paper. What was similar about all three questions was that they involved data; however, how this data was used with a probability model was different for each question.

For the first question (A), we have data on a particular shipment of cars: we know the proportion of cars with the petrol cap on the left-hand side of the car and the percentage of cars that are silver. We are then told that one of the cars is selected at random, which means that we do not need to go beyond this data to solve the problem. In this situation, the “truth” is the same as the “model”. Therefore, we are finding the probability.

For the second question (B), we have data on 10 cars getting petrol: we know the proportion of cars with petrol caps on the left-hand side of the car. However, we are asked to go beyond this data and generalise about all cars in NZ, in terms of their likelihood of having petrol caps on the left-hand side. This requires developing a model for the situation. In this situation, the “truth” is not necessarily the same as the “model”, and we need to take into account the nature of the data (amount and representativeness) and consider the assumptions for the model (the conditions; the model applies IF…..). Therefore, when we use this model we are finding an estimate for the probability.
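To get a feel for why 10 cars only gives an estimate, you can simulate how much the sample proportion bounces around from sample to sample. The sketch below assumes a hypothetical true proportion of 0.7; nothing here comes from the actual exam data:

```python
# Why n = 10 only gives an *estimate*: the sample proportion varies a lot
# across samples of 10 cars. The 'true' proportion 0.7 is hypothetical.
import random

random.seed(1)
true_p, n = 0.7, 10
phats = []
for _ in range(1000):
    sample = [random.random() < true_p for _ in range(n)]  # one sample of 10 cars
    phats.append(sum(sample) / n)                          # sample proportion

print(min(phats), max(phats))  # estimates from n = 10 spread widely around 0.7
```

Seeing estimates range from well below to well above the assumed true value is a quick way to motivate the “amount and representativeness of the data” caveat above.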

For the third question (C), we have data on 20 cars being sold: we know the proportion of cars that have 0 for the last digit of the odometer reading (six). What we don’t know is if observing six cars with odometer readings that end in 0 is unusual (and possibly indicative of something dodgy). This requires developing a model to test the observed data (proportion), basing this model on an assumption that the last digit of an odometer reading should just be explained by chance alone (equally likely for each digit). Therefore, when we use this model, we generate data from the model (through simulation) and use this simulated data to estimate the chance of observing 6 (or more) cars out of 20 with odometer readings that end in 0. If this “tail proportion” is small (less than 5%), we conclude that chance was not acting alone.
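The simulation for question C is easy to sketch in code. This is just an illustrative version of the “chance alone” model described above (ten equally likely digits, 20 cars, tail proportion for six or more):

```python
# Simulating the 'chance alone' model for question C: if each last digit
# of an odometer reading is equally likely (0-9), how often would 6 or
# more of 20 readings end in 0?
import random

random.seed(2)
trials = 10000
at_least_six = 0
for _ in range(trials):
    zeros = sum(1 for _ in range(20) if random.randint(0, 9) == 0)
    if zeros >= 6:
        at_least_six += 1

tail_proportion = at_least_six / trials
print(tail_proportion)  # around 0.01, well under the 5% cut-off
```

Since the tail proportion comes out at roughly 1%, under this model we would conclude that chance was not acting alone, which is the reasoning the question is after.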

There are a lot of ideas to get your head around! Sitting in there are ideas about what probability models are and what simulations are (see the slides for more about this) and, as I discovered during my research last year with teachers and probability distribution modelling, these ideas may need a little more care when defining and using them with students. The main reason I think we need to be careful using data when teaching probability modelling is that it matters whether you are using data from a real situation, where you do not know the true probability, or whether you are using data that you have generated from a model through simulation. Each type of data tells you something different and is used in a different way in the modelling process. In my research, this led to the development of the statistical modelling framework shown below:

All models are wrong but some are more wrong than others: Informally testing the fit of a probability distribution model

At the end of 2016, I presented a workshop at the AMA Statistics Teachers’ Day based on my research into probability distribution modelling (AS91586). This 2016 workshop also went into more detail about the framework for statistical modelling I’m developing. The video for this workshop is available here on Census At School NZ.

We have a clear learning progression for how “to make a call” when making comparisons, but how do we make a call about whether a probability distribution model is a good model? As we place a greater emphasis on the use of real data in our statistical investigations, we need to build on sampling variation ideas and use these within our teaching of probability in ways that allow for key concepts to be linked but not confused. Last year I undertook research into teachers’ knowledge of probability distribution modelling. At this workshop, I shared what I learned from this research, and also shared a new free online tool and activities I developed that allows students to informally test the fit of probability distribution models.

During the workshop, I showed a live traffic camera from Wellington (http://wixcam.citylink.co.nz/nph-webcam.cgi/terrace-north), which was the context for a question developed and used (the starter question AKA counting cars). Before the workshop, I recorded five minutes of the traffic and then set up a special html file that pauses the video every five seconds. This was so teachers at the workshop (and students) could count the number of cars passing different points on the motorway (marked with different coloured lines) every five seconds. To use this html file, you need to download both of these files into the same folder – traffic.html and traffic.mp4. I’ve only tested my files using the Chrome browser 🙂

If you don’t want to count the cars yourself, you can head straight to the modelling tool I developed as part of my research: http://learning.statistics-is-awesome.org/modelling-tool/. In the dropdown box under “The situation” there are options for the different coloured points/lines on the motorway. The idea behind getting teachers and students to actually count the cars was to try to develop a greater awareness of the complexity of the situation being modelled, to reinforce the idea that “all models are wrong” – that they are approximations of reality but not the truth. I also wanted to encourage some deeper thinking about the limitations of models. For example, in this situation, looking at five-second periods, there is an upper limit on how many cars you can count due to speed restrictions and following distances. We also need to get students to think more about the model in terms of the sample space (the set of possible outcomes) and the shape of the distribution (which is linked to the probabilities of each of these outcomes), not just the conditions for applying the probability distribution 🙂
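The “generate data from the model” step behind this kind of tool can be sketched as follows. The observed counts here are made up, and the Poisson sampler is a standard textbook method (Knuth’s algorithm), not the code behind the modelling tool:

```python
# Estimate a rate from (hypothetical) observed five-second car counts,
# then simulate counts from a Poisson model with that rate so the two
# distributions can be compared.
import math
import random

random.seed(3)

def poisson_sample(lam):
    """Draw one value from a Poisson(lam) distribution (Knuth's method)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while p > limit:
        k += 1
        p *= random.random()
    return k - 1

observed = [2, 1, 3, 0, 2, 4, 1, 2, 3, 2]  # made-up counts per 5 seconds
lam = sum(observed) / len(observed)         # estimated rate: 2.0 cars

simulated = [poisson_sample(lam) for _ in range(len(observed))]
print("observed: ", observed)
print("simulated:", simulated)
```

Re-running the simulation many times, rather than once, is what lets students see whether the observed distribution looks like a plausible draw from the model, which is the informal fit-testing idea the tool is built around.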

In terms of the modelling tool, I developed a set of teaching notes early last year, which you can access in the Google drive below. This includes some videos I made demonstrating the tool in action 🙂 I also started developing a virtual world (stickland http://learning.statistics-is-awesome.org/stickland-modelling/) but this is still a work in progress. Once you have collected data on either the birds or the stick people, you can copy and paste it into the modelling tool. There will be more variables to collect data on in the future for a wider range of possible probability distributions (including situations where none is applicable).

Slides from IASC-ARS/NZSA 2017 talk

https://goo.gl/dfA9MF

Resources for workshop (via Google Drive)

Developing learning and formative assessment tasks for evaluating statistically-based reports

This post provides the notes and resources for a workshop I ran for the Auckland Mathematical Association (AMA) on developing learning and formative assessment tasks for evaluating statistically-based reports (specifically AS91584).

Notes for workshop

The starter task for this workshop was based around a marketing leaflet I received in my letterbox for a local school back in 2014. I was instantly skeptical about the claims being made by the school and went straight to sources of public data to check them. As is often the case, this personal experience turned into an activity I used with my Scholarship Statistics students to help them develop their critical evaluation skills. The task, the public data I used, and my attempt at answers (from my past self in 2014) are provided at the bottom of this post. My overall conclusion was that most of the claims checked out until around 2011, but not so much for 2012 – 2013, leading me to speculate that the school had not updated their marketing leaflet. The starter task is all about claims and data, and not so much about statistical processes, study design, or inferential reasoning – all of which are required for students to engage with the evaluation of statistically-based reports. However, I used this task to set the focus of the workshop: the claims that are being made, whether they can be supported or not, and why.

The questions used for the external assessment tasks for AS91584 (available here) are designed to help scaffold students to critique the report in terms of the claims, statements or conclusions made within the report. Students need to draw on what has been described in the report and relevant contextual and statistical knowledge to write concise and clear discussion points that show statistical insight and answer the questions posed. This is hard for students. Students find it easy to write very creative, verbose and vague responses, but harder to write responses that are not based only on speculation or that are not rote learned. We see this difficulty with internally assessed tasks as well, so it’s not that surprising that students struggle to write concise, clear, and statistically insightful discussion points under exam pressure.

Teachers I have spoken to who have taught this standard (myself included) really enjoy teaching statistical reports to students. In reflections and conversations with teachers on how we could further improve the awesome teaching of statistical reports, a few ideas or suggestions emerged:

  • Perhaps we focus our teaching too much on content, keeping aspects such as margins of error and confidence intervals, observational studies vs experiments, and non-sampling errors too separate?
  • Perhaps we focus too much on “good answers” to questions about statistical reports, rather than “good questions” to ask of statistical reports?

Great ideas for teaching statistical reports can be sourced from Census at School NZ or from conversations with “statistical friends” (see the slides for more details). These include ideas such as: experiencing the study design first and then critiquing a statistical report that used a similar design, using matching cards to build confidence with different ideas, keeping a focus on the statistical inquiry cycle, teaching statistical reports through the whole year rather than in one block, and teaching statistical reports alongside other topics such as time series, bivariate analysis, and bootstrap confidence intervals. I quite like the idea of the “seven deadly sins” of statistical reports, but didn’t quite have enough time to develop what these could be before the workshop – feel free to let me know if you come up with a good set! [Update: Maybe these work or could be modified?]

When I taught statistical reports in 2013 (the first year of the new achievement standard/exam), I was gutted when I got my students’ results back at the start of 2014. I reflected on my teaching and preparation of students for the exam and realised I had been too casual about teaching students how to respond to questions. In particular, I had expected my “good” students would gain excellence (the highest grade – showing statistical insight) because they had gained excellences for the internally-assessed standards or were strong contenders for a Scholarship in Statistics. So, a bit later in 2014, when the assessment schedules came out, I looked carefully at what had been written as expected responses. To me, it seemed that a good discussion point had to address three questions: What? Why? How? Depending on the question being asked, the whats, whys and hows were a bit different, but at the time (only having one exam and schedule to go with!) it seemed to make sense. At least, in my teaching that year with students, I felt that using this simple structure allowed me to teach and mark discussion points more confidently. You can see more details for this “discussion point” structure in the slides.

The last part of the workshop involved providing teachers with one of three statistical reports (all around the theme of coffee, of course!) and asking them, in groups, to develop a formative assessment task. After identifying one or two key claims made in the report, they had to select three or four questions from previous years’ exams that would be relevant for questioning the report in front of them (relevant to the conclusions made in the report). We didn’t quite get this finished in the workshop – the goal was to create three formative assessment tasks that could be shared! However, perhaps some of the teachers who attended the workshop will go on to develop formative assessment tasks and email them to me to share at a later date. I do feel strongly that all teachers of statistics should feel confident to write their own formative or practice assessment tasks for whatever they are teaching – if you’re not sure about what understanding you are trying to assess and what questions to ask to assess that understanding, how can you feel confident about what to teach? I’m hoping to launch a project next term to help support statistics teachers to feel more confident with writing formative assessment tasks, so watch this space 🙂

Resources for workshop (via Google Drive)