You say data, I say data cards …

This long weekend (in Auckland anyway!), I spent some time updating the Quick! Draw! sampling tool (read more about it in the post Cat and whisker plots: sampling from the Quick, Draw! dataset). You may need to clear your browser cache/data to see the most recent version of the sampling tool.

One of the motivations for doing so was a visit to my favourite kind of store – a stationery store – where I saw (and bought!) this lovely gadget:

It’s a circle punch with a 2″/5 cm diameter. When I saw it, my first thought was “oh cool I can make dot-shaped data cards”, like a normal person right?

Using data cards to make physical plots is not a new idea – see censusatschool.org.nz/resource/growing-scatterplots/ by Pip Arnold for one example:

But I haven’t seen dot-shaped ones yet, so this led me to re-develop the Quick! Draw! sampling tool to be able to create some 🙂

I was also motivated to work some more on the tool after the fantastic Wendy Gibbs asked me at the NZAMT (New Zealand Association of Mathematics Teachers) writing camp if I could include variables related to the times involved with each drawing. I suspect she has read this super cool post by Jim Vallandingham (while you’re at his site, check out some of his other cool posts and visualisations) which came out after I first released the sampling tool and compares strokes and drawing/pause times for different words/concepts – including cats and dogs!

So, with the Quick! Draw! sampling tool, you can now get the following variables for each drawing in the sample:

The drawing and pause times are in seconds. The drawing time captures the time taken for each stroke from beginning to end and the pause time captures all the time between strokes. If you add these two times together, you will get the total time the person spent drawing the word/concept before either the 20 seconds was up, or Google tried to identify the word/concept. Below the word/concept drawn is whether the drawing was correctly recognised (true) or not (false).

I also added three ways to use the data cards once they have been generated using the sampling tool (scroll down to below the data cards). You can now:

  1. download a PDF version of the data cards, with circles the same size as the circle punch shown above (2″/5cm)
  2. download the CSV file for the sample data
  3. show the sample data as a HTML table (which makes it easy to copy and paste into a Google sheet for example)

In terms of options (2) and (3) above, I had resisted making the data this accessible in the previous version of the sampling tool. One reason is that I wanted the drawings themselves to be considered as data and, as humans would be involved in developing this kind of variable, there was a need to work with just a sample of all the millions of drawings. I still feel this way, so I encourage you to get students to develop at least one new variable for their sample data that is based on a feature of the drawing 🙂 For example, whether the drawing of a cat is the face only, or includes the body too.

There are other cool ways to expand the variables provided. Students could create a new variable by adding drawing_time and pause_time together. They could also create a variable which compares the number_strokes to the drawing_time e.g. average time per stroke. Students could also use the day_sketched variable to classify sketches as weekday or weekend drawings. Students should soon find that the hemisphere is not that useful for comparisons, so they could explore another country-related classification like continent. More advanced manipulations could involve working with the time stamps, which are given for all drawings using UTC time. This has consequences for the variable day_sketched, as many countries (and places within countries) will be behind or ahead of UTC time.
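As a rough illustration of these derived variables, here is a minimal Python sketch. The column names drawing_time, pause_time and number_strokes come from the sampling tool, but the rows, the timestamp column name and format, and the fixed UTC+13 offset (New Zealand daylight time) are all assumptions made up for the example.

```python
from datetime import datetime, timedelta, timezone

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Made-up rows for illustration only (not real Quick, Draw! data).
# The "timestamp" column name and its format are assumptions.
sample = [
    {"word": "cat", "number_strokes": 8, "drawing_time": 6.4,
     "pause_time": 5.6, "timestamp": "2017-03-04 22:30:00"},
    {"word": "dog", "number_strokes": 5, "drawing_time": 4.0,
     "pause_time": 3.5, "timestamp": "2017-03-06 09:15:00"},
]

# Assumed fixed offset for the person drawing (UTC+13).
LOCAL_OFFSET = timezone(timedelta(hours=13))


def add_variables(row, local_tz=LOCAL_OFFSET):
    """Return a copy of one drawing's row with extra derived variables."""
    new = dict(row)
    # Total time spent on the drawing, in seconds
    new["total_time"] = row["drawing_time"] + row["pause_time"]
    # Average drawing time per stroke, in seconds
    new["time_per_stroke"] = row["drawing_time"] / row["number_strokes"]
    # Day of the week in UTC versus in the assumed local time zone
    utc = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
    utc = utc.replace(tzinfo=timezone.utc)
    local = utc.astimezone(local_tz)
    new["day_utc"] = DAY_NAMES[utc.weekday()]
    new["day_local"] = DAY_NAMES[local.weekday()]
    new["weekend_local"] = local.weekday() >= 5  # Saturday or Sunday
    return new


expanded = [add_variables(row) for row in sample]
for row in expanded:
    print(row["word"], row["total_time"], row["day_utc"], row["day_local"])
```

In the first made-up row the drawing is a Saturday sketch in UTC but a Sunday sketch in local time, which is exactly the kind of shift students would need to think about when using day_sketched.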

If you’ve made it this far in the post…. why not play with a little R 🙂

I wonder which common household pet Quick! drawers tend to use the most strokes to draw? Cats, dogs, or fish?

Have a go at modifying the R code below, using the iNZightPlots package by Tom Elliott and my [very-much-in-its-initial-stages-of-development] iNZightR package, to see what we can learn from the data 🙂 If you’re feeling extra adventurous, why not try modifying the code to explore the relationship between number of strokes and drawing time!
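For anyone without R to hand, the same question can be roughed out in plain Python. This is only a sketch: the stroke counts are made up for illustration, so it says nothing about real cats, dogs or fish, and it does not use the iNZightPlots or iNZightR packages.

```python
from collections import defaultdict
from statistics import median

# Made-up (word, number_strokes) pairs, purely for illustration.
drawings = [
    ("cat", 9), ("cat", 7), ("cat", 11),
    ("dog", 6), ("dog", 8), ("dog", 5),
    ("fish", 3), ("fish", 4), ("fish", 2),
]


def median_strokes(rows):
    """Group stroke counts by word and return the median for each word."""
    groups = defaultdict(list)
    for word, strokes in rows:
        groups[word].append(strokes)
    return {word: median(values) for word, values in groups.items()}


summary = median_strokes(drawings)
for word, value in sorted(summary.items()):
    print(word, value)
```

With a real sample from the tool, the pairs would come from the word and number_strokes columns of the downloaded CSV file, and a plot of the three distributions would tell a much richer story than the medians alone.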

It’s raining cats and dogs (hopefully)

In April 2017, I presented an ASA K-12 statistics education webinar: Statistical reasoning with data cards (webinar). Towards the end of the webinar, I encouraged teachers to get students to make their own data cards about their cats. A few days later, I then thought that this could be something to get NZ teachers and students involved with. Imagine a huge collection of real data cards about dogs and cats? Real data that comes from NZ teachers and students? Like Census At School but for pets 🙂 I persuaded a few of my teacher friends to create data cards for their pets (dogs or cats) and to get their students involved, to see whether this project could work. Below is a small selection of the data cards that were initially created (beware of potential cuteness overload!)

The project then expanded to include more teachers and students across NZ, and even the US, and I’ve now decided to keep the data card generator (and collection) page open so that the set of data cards can grow over time. Please use the steps below to get students creating and sharing data cards about their pets.

Creating and sharing data cards about dogs and cats

Inevitably, there will be submissions made that are “fake”, silly or offensive (see below).

Data cards submitted to the project won’t automatically be added to any public sets of data cards, and will be checked first. Just like with any surveying process that is based on self-selection, is internet based and relies on humans to give honest and accurate answers, there is the potential for non-sampling errors. To help reduce the quantity of “fake” data cards, if you are keen to have your students involved with this project, it would be great if you could do the following:

1. Talk to your students about the project and explain that the data cards will be shared with other students. They will be sharing information about their pet and need to be OK with this (and don’t have to!). The data will be displayed with a picture of their pet, so participation is not strictly anonymous. All of this is important to discuss with students as we need to educate students about data privacy 🙂

2. When students submit their data, they are given the finished data card which they can save. Set up a system where students need to share the data card they have created with you e.g. by saving into a shared Google drive or Dropbox, or by emailing the data card to you. The advantage for you of setting up this system is that you get your class/school set of data cards to use however you want. The advantage for me is that this level of “watching” might discourage silly data cards being created.

3. Share this link with your students http://learning.statistics-is-awesome.org/dogsvscats/ and let the rain of cats and dogs begin!

Pet data cards

The data collection period for this set of data cards was 1 May 17 to 19 May 17.

The diagram below shows the data included on each data card:


Additional data that could be used from each data card includes:

  • Whether the pet photo was taken inside or outside
  • Whether the pet photo is rotated (and the angle of rotation)
  • The number of letters in the pet name
  • The number of syllables in the pet name

PDF of all data cards: click to download

 

Which one doesn’t belong …. for stats?

If you haven’t heard of the activity Which one doesn’t belong? (WODB), it involves showing students four “things” and asking them to describe/argue which one doesn’t belong. There are heaps of examples of Which one doesn’t belong? in action for math(s) on the web, Twitter, and even in a book. From what I’ve seen, for math(s) I think the activity is pretty cool. In terms of whether WODB works for stats, however, I’m not so sure. Perhaps for definitions, facts, static pieces of knowledge it could work (?), but in terms of making comparisons involving data and its various representations (including graphs/displays), I need more convincing. There’s something different between comparing properties of shapes (for example), which remain fixed, and comparing data about something/someone, which could vary.

For example, What cat doesn’t belong? for the four “stats cats” data cards shown below.

To make comparisons between the four cats means to reason with data, but if I am considering only the data provided in these four data cards then these comparisons are made without uncertainty. For example, I can say definitively, for these four cats, that:

  • Elliot is the only cat with a name that has three syllables,
  • Molly is the only female cat,
  • Joey is the only cat that is both an inside and outside cat,
  • Classic is the only cat that uses a cat door.

I could argue many different cases for which cat (or photo) does not belong. This is all cool, but doesn’t feel like statistics to me. Statistics is all about using data to make decisions in the face of uncertainty, by appreciating different sources of variation and considering how to deal with these. In particular, inferential reasoning involves going beyond the data at hand, thinking about generalisability, considering the quality and quantity of data available, and appreciating/communicating the possibility of being wrong no matter how “right” the methodology.

So while I appreciate that WODB allows for “not just one correct answer” and the development of argumentation skills, I’d be happier if this kind of activity within statistics teaching led to the posing of statistical investigative questions (SIQ): WODB->SIQ. Why? We need more data and more of an idea of where the data came from to really answer the really interesting questions that comparing these four cats might provoke us to consider. We need students to feel the uncertainty that comes from thinking and reasoning statistically and to help students find ways to deal with this uncertainty. We also need students to care about the questions being asked of the data – my worry here is that otherwise the question students might ask when using WODB is Who cares which one doesn’t belong? 🙂

Questions that interest me when looking at these stats cats data cards include: I wonder …. How many syllables do cats’ names have? Do most cats have two syllable names? Is Elliot (my cat!) an unusual name for this reason? Do I spend too much on cat food ($NZD30 per week)? Or maybe black cats are more expensive to feed? I won’t be able to get definitive answers to these questions, but by collecting more data and investigating these questions using statistical methods I can get a better understanding of what could be plausible answers.

PS Want some of these data cards? Head here –> It’s raining cats and dogs (hopefully)

Statistical reasoning with data cards (webinar)

UPDATE: The video of the webinar is now available here.

I’m super excited to be presenting the next ASA K-12 Statistics Education Webinar. The webinar is based on one of my sessions from last year’s Meeting Within a Meeting (MWM) and will be all about using data cards featuring NZ data/contexts. I’ll also be using the digital data cards featured in my post Initial adventures in Stickland if you’d like to see these in “teaching action”.

The webinar is scheduled for Thursday April 20 9:30am New Zealand Time (Wednesday April 19 at 5:30 pm Eastern Time, 2:30 pm Pacific), but if you can’t watch it live a video of the webinar will be made available after the live presentation 🙂

Here are all the details about the webinar:

Title: Statistical Reasoning with Data Cards

Presenter: Anna-Marie Fergusson, University of Auckland

Abstract: Using data cards in the teaching of statistics can be a powerful way to build students’ statistical reasoning. Important understandings related to working with multivariate data, posing statistical questions, recognizing sampling variation and thinking about models can be developed. The use of real-life data cards involves hands-on and visual-based activities. This talk will present material from the Meeting Within a Meeting (MWM) Statistics Workshop held at JSM Chicago (2016) which can be used in classrooms to support teaching within the Common Core State Standards for Mathematics. Key teaching and learning ideas that underpin the activities will also be discussed.

To RSVP to participate in the live webinar, please use the following link: https://goo.gl/forms/pQ5taydWwOZy2WOJ3

The ASA is offering this webinar without charge and only internet and telephone access are necessary to participate. This webinar series was developed as part of the follow-up activities to the Meeting Within a Meeting (MWM) Workshop for Math and Science teachers held in conjunction with the Joint Statistical Meetings (www.amstat.org/education/mwm). MWM will be held again in Baltimore, MD on August 1-2, 2017.  For those unavailable to participate in the live webinar, ASA will record this webinar and make it available after the live presentation. Previous webinar recordings are available at http://www.amstat.org/asa/education/K-12-Statistics-Education-Webinars.aspx.

Initial adventures in Stickland


This post provides the notes for a workshop I ran at the Otago Mathematics Association (OMA) Conference about using data challenges to encourage statistical thinking.

Until last week, I had never re-presented or adapted a workshop that I had developed in a previous year. So it was really interesting to take this workshop on data challenges, which I had presented at the AMA and CMA stats days last year, and work through it again with a new bunch of awesome teachers in Dunedin. I wrote notes about this workshop last year – Using data challenges to encourage statistical thinking – so this post will just share a few things I tweaked the second time around, including an activity we tried in Stickland 🙂

Some changes and additions

To show an example of a predictive model in action, we used one of a few online tools which attempt to predict your age using your name (based on US data) e.g. rhiever.github.io/name-age-calculator/index.html. I also demonstrated another online tool that attempts to predict your gender based on writing (hackerfactor.com/GenderGuesser.php) by using my abstract for this workshop (it did correctly predict, based on the writing being formal, that it was written by a female). For the actual data challenge itself using the celebrity data, I purposefully removed Dr Dre from the training data set to make it easier to explore the data without worrying about how to handle his extremely high earnings for 2014 (new link here).

Testing Stickland

Another thing I changed about the workshop this time around was that rather than use physical data cards (these Census at school stick people data cards), we tried out my new digital data cards in the virtual world of Stickland. I’ve already shared a little bit about the ideas behind Stickland – see the Welcome to stickland! post – so what follows is an example of how we used Stickland in the workshop. (Just a quick reminder that the data cards are real students from the NZ Census At School 2015 data, the names being the only variable that is not real).

data_card

The activity starts with the idea of wanting to predict whether a stick person chosen at random from Stickland uses Facebook or not. If you head to learning.statistics-is-awesome.org/stickland, the first thing you could do is select a sample of stick people and see what proportion of them use Facebook. I got the teachers in this workshop to select 20 stick people and then let them play with moving the data cards around in the grey screen below (click or touch the card to drag the card to somewhere else on the screen e.g. to sort the cards into Facebook users and non-Facebook users).

stick-layout

For the sample shown above, there are as many Facebook users as non-Facebook users, but of course this will vary from sample to sample. I then told the teachers that this particular stick person is a Snapchat user, and asked them if this changes their prediction of whether they are a Facebook user or not. One way to explore this is to create a two way table with the cards (see below) and then reason with this.

stick-sort

Most of the different samples showed a similar story to the sample above: Of the Snapchat users, most were Facebook users and of the non-Snapchat users, most were non-Facebook users. I then suggested (if we had time) we could also explore whether knowing the gender and age of the stick person would help us build a better model for predicting Facebook usage. At this stage (considering multiple variables/factors) I would want the students to move into software that allows them to explore the data more deeply (more about how that is possible is discussed in the Welcome to stickland! post). We didn’t do this in the workshop and the teachers had to leave Stickland perhaps before they wanted to 🙂
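The card sorting described above can also be sketched in a few lines of Python. The 20 cards below are made up to mirror the story (half are Facebook users overall), and the field names snapchat and facebook are placeholders rather than the actual Stickland variable names.

```python
from collections import Counter

# 20 made-up data cards (illustrative only, not Census At School data).
cards = (
    [{"snapchat": True, "facebook": True}] * 8
    + [{"snapchat": True, "facebook": False}] * 2
    + [{"snapchat": False, "facebook": True}] * 2
    + [{"snapchat": False, "facebook": False}] * 8
)


def two_way_table(cards):
    """Count cards in each (snapchat, facebook) cell of the two way table."""
    return Counter((card["snapchat"], card["facebook"]) for card in cards)


def proportion_facebook(table, snapchat):
    """Proportion of Facebook users among cards with the given Snapchat status."""
    users = table[(snapchat, True)]
    total = users + table[(snapchat, False)]
    return users / total if total else None


table = two_way_table(cards)
overall = sum(card["facebook"] for card in cards) / len(cards)
print("Overall:", overall)
print("Snapchat users:", proportion_facebook(table, True))
print("Non-Snapchat users:", proportion_facebook(table, False))
```

Knowing nothing else, the prediction for this made-up sample is a coin toss; once we condition on Snapchat use, the proportions move well away from a half, which is the reasoning the sorted cards are meant to prompt.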

Where to next?

Stickland is just in “proof of concept” form at the moment and will no doubt have lots of bugs and weird features. In the Welcome to stickland! post, I discuss the influence of others in developing these digital data cards, in particular Pip Arnold and her work with statistical investigations and data cards that stretches back to at least 2005 (if not earlier!). Feel free to have a play and to let me know what you think about the concept, but this is definitely a possible project for 2017 and not intended to be a fully featured product yet.

Welcome to stickland!


I’ve been working on a little side project for the last year or so. I thought this might be a good time to share this with you, particularly since I probably (with a very high probability) won’t be making any more posts for the rest of the year due to a few little things called a dissertation and a wedding 🙂

The idea was to create a digital learning environment for working with data cards, in an attempt to make stronger connections between data cards, data structures and data displays, and to make effective use of tablets/devices (particularly in large lecture groups like my current teaching situation). This first digital environment is based on the C@S stick people data cards I created last year, but could involve any population/data etc, since everything is created dynamically. The idea to use stick people (figures) for the data cards was based on material Rob Gould presented at the NZAMT conference in 2015 regarding the Introduction to Data Science (IDS) course the Mobilize team created for high school students.

In stickland, the members of its population (the C@S stick people) ride by on skateboards. The number displayed on each stick person is their unique three digit ID number. The environment is set up so that the stick people arrive at this stretch of road in stickland in a random order and at random times. Students could check this out by watching the stick people skate on by and recording their ID numbers. They should see no pattern to the numbers and be convinced that they cannot predict what ID number the next stick person will have (well, I guess if you watched for long enough you would be able to predict the last ID number……)

To select stick people to find out more about them, students click on the stick person as they skate past. Some of the stick people are faster than others (more about that next year!) so it’s not always easy to catch them. This means that it will take different times for students to collect the same number of data cards. As the stick people are selected, a stack of data cards starts to be built on the top right hand side of the data card screen below.

At this point we’re in a similar position to where we would be if we had given students a set of data cards each, or if we had asked them to select a random sample of data cards from a population bag. One of the really awesome things about data cards is the physical nature of them – students can move them around, sort them, line them up, etc. So in this digital environment, students can drag the stick people data card around by tapping their heads and dragging their finger.

I love getting students to sort the data cards by a categorical variable (e.g. Facebook user) and then by another categorical variable (e.g. Snapchat user) to build ideas of two-way tables and conditioning.

stick-people-two-way

You can also get students to make graphs out of the data cards (see one of Pip Arnold’s excellent resources along these lines here on Census At School NZ). In this digital environment, students can make the cards bigger or smaller, and can move into “dot” mode as they move into graphical representations by encoding the data.

To help students build understanding of what are essential features of their graphs, there is a drawing tool so they can add in additional information like axes, labels, numbers etc.  I can see a whole lot of potential here, particularly with students exploring different ways to organise and display data.

To help build understanding of the relationship between units, variables and data structures (specifically rectangular data sets), an interactive spreadsheet builds below the data card screen as the cards are collected. When a student selects a data card, this stick person’s row of data is highlighted in the spreadsheet, and vice versa. To check each student can match the data shown on the data card to the spreadsheet, data is missing from the spreadsheet (shown by grey boxes).

Students will need to find the relevant stick person, read the card for the appropriate variable, and enter this data to make a complete data set. At the moment, I’ve set this feature so that there is missing data for 10 different stick people (one of each variable on the data card) and that the data can not be visualised using software (iNZight lite) until the missing data has been found.
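To make the missing data idea concrete, here is a minimal Python sketch of one way it could work: one value is blanked for each variable, each in a different row, and the data set only counts as complete once every blank has been filled in. The rows, variable names and function names are all hypothetical, and this is not the code behind Stickland.

```python
import random

# Hypothetical mini data set; the variable names are placeholders.
people = [
    {"id": 101, "age": 13, "facebook": "yes", "snapchat": "yes"},
    {"id": 102, "age": 15, "facebook": "no", "snapchat": "yes"},
    {"id": 103, "age": 12, "facebook": "no", "snapchat": "no"},
    {"id": 104, "age": 16, "facebook": "yes", "snapchat": "no"},
    {"id": 105, "age": 14, "facebook": "yes", "snapchat": "yes"},
]


def hide_one_per_variable(rows, variables, seed=None):
    """Blank one value per variable, each in a different row.

    Returns a copy of the rows (the originals are untouched) and a dict
    of the hidden answers keyed by (row index, variable).
    """
    if len(rows) < len(variables):
        raise ValueError("need at least as many rows as variables")
    rng = random.Random(seed)
    chosen_rows = rng.sample(range(len(rows)), len(variables))
    sheet = [dict(row) for row in rows]
    answers = {}
    for row_index, variable in zip(chosen_rows, variables):
        answers[(row_index, variable)] = sheet[row_index][variable]
        sheet[row_index][variable] = None  # the "grey box"
    return sheet, answers


def is_complete(rows):
    """True when no value in the spreadsheet is missing."""
    return all(value is not None for row in rows for value in row.values())


sheet, answers = hide_one_per_variable(
    people, ["age", "facebook", "snapchat"], seed=1
)
print(is_complete(sheet))  # False until the blanks are filled in
```

Gating the move into software on is_complete is one simple way to make sure every student has matched at least a few cards to their rows before exploring the data.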

The final link is to explore the data using software like iNZight lite, which has been designed by Chris Wild to help students “get into data deeper and faster” (PS I’m not sure if that is an exact quote!). The data cards are not automatically linked to the data in iNZight lite, so if more data cards are collected, the iNZight button will need to be pressed again to update. I’m excited about getting students to explore relationships and build informal predictive models (after trying this out with the data cards earlier), and then checking these models out by easily selecting more stick people (see more about this kind of activity in my post about data challenges).

So what do you think? 🙂

Statistics flowers (data cards)


Inspired by Fisher’s Iris data, this sample of flowers was created through simulation from a carefully designed model. From a student’s perspective, these flowers represent a random sample of flowers from a much bigger population of statistics flowers. The idea is that students get all of the 300 cards and need to measure different features of the flowers and determine other variables to create their sample data.

Designed variables are: type of statistics flower (tictastics, stistactis, or castistist), petal colour (red, orange, blue, green), number of petals, petal length, petal width and stigma diameter. The diagram below shows how the measurements should be taken by students:

flower_power_labels

I have made the sample size 300 to allow for categorical and distributional exploration e.g. What proportion of all statistics flowers have a black stigma? Does stigma colour appear to be linked to petal colour for statistics flowers? How could the number of petals for statistics flowers be distributed? But I appreciate that it would take a long time for students to measure 300 different flowers and record the necessary data! Perhaps students could look at the flowers visually first, sort them by type of flower and see if they can detect any features that appear to differ (e.g. colour, petal length, etc.). Students could then measure some of the flowers and chuck this data into a graph for an initial view before being given access to the digital sample to do some more exploring. Remember these data cards represent a sample, and the true population parameters, for example the mean petal length of all statistics flowers, are unknown to you and the students. It is not intended that these cards are used for “population bags”.

Here is the sample data set as a CSV file: flower_power

Here are the data cards as a PDF: flower_power
You will need to print these one to a page if you want the measurements in the CSV file to match!

 

Census at school stick people (data cards)


This population of stick people was created using data from the Census at School 2015 database. For the data cards, rather than put/indicate gender on the card I have used a fictional name, taken from the names of children entered in the 2015 Auckland kids marathon. The relevant questions from the Census at School 2015 survey are Q1, Q2, Q17, Q27 cellphone, facebook, snapchat, Q31 TV, and Q32 reading (the questions can be found here). The diagram below shows what each part of the data card represents:

data_card

For some great teaching notes for using data cards, check out Pip Arnold’s resources on Census at School; here are a couple: ID cards | Using data cards. I also used these data cards in a workshop on data challenges which you can read more about here.

Here is the population data set as a CSV file for teacher reference: CAS2015_edited

Here are the data cards as a PDF: CAS_2015_data_cards (without gender) CAS_2015_data_cards (with gender)

Here is the virtual environment to use the data cards: learning.statistics-is-awesome.org/stickland/

And here is a little more about virtual Stickland: Welcome to stickland! and Initial adventures in Stickland