You say data, I say data cards …

This long weekend (in Auckland anyway!), I spent some time updating the Quick, Draw! sampling tool (you can read more about it in Cat and whisker plots: sampling from the Quick, Draw! dataset). You may need to clear your browser cache/data to see the most recent version of the sampling tool.

One of the motivations for doing so was a visit to my favourite kind of store – a stationery store – where I saw (and bought!) this lovely gadget:

It’s a circle punch with a 2″/5 cm diameter. When I saw it, my first thought was “Oh cool, I can make dot-shaped data cards”, like a normal person, right?

Using data cards to make physical plots is not a new idea – see censusatschool.org.nz/resource/growing-scatterplots/ by Pip Arnold for one example:

But I haven’t seen dot-shaped ones yet, so this led me to redevelop the Quick, Draw! sampling tool so that it can create some 🙂

I was also motivated to work some more on the tool after the fantastic Wendy Gibbs asked me at the NZAMT (New Zealand Association of Mathematics Teachers) writing camp if I could include variables related to the times involved with each drawing. I suspect she had read this super cool post by Jim Vallandingham, which came out after I first released the sampling tool and compares strokes and drawing/pause times for different words/concepts, including cats and dogs! (While you’re at his site, check out some of his other cool posts and visualisations.)

So, with the Quick, Draw! sampling tool you can now get the following variables for each drawing in the sample:

The drawing and pause times are in seconds. The drawing time captures the time taken for each stroke from beginning to end (summed over all strokes), and the pause time captures all the time between strokes. If you add these two times together, you will get the total time the person spent drawing the word/concept before either the 20 seconds were up or Google tried to identify the word/concept. On each data card, below the word/concept drawn, is whether the drawing was correctly recognised (true) or not (false).

I also added three ways to use the data cards once they have been generated using the sampling tool (scroll down to below the data cards). You can now:

  1. download a PDF version of the data cards, with circles the same size as the circle punch shown above (2″/5 cm)
  2. download the CSV file for the sample data
  3. show the sample data as an HTML table (which makes it easy to copy and paste into a Google Sheet, for example)

In terms of options (2) and (3) above, I had resisted making the data this accessible in the previous version of the sampling tool. One of the reasons was that I wanted the drawings themselves to be considered as data, and as a human would be involved in developing any variable from a drawing, there was a need to work with just a sample of the millions of drawings. I still feel this way, so I encourage you to get students to develop at least one new variable for their sample data that is based on a feature of the drawing 🙂 For example, whether a drawing of a cat shows only the face or includes the body too.

There are other cool ways to expand on the variables provided. Students could create a new variable by adding drawing_time and pause_time together. They could also create a variable that compares number_strokes to drawing_time, e.g. average drawing time per stroke. Students could also use the day_sketched variable to classify sketches as weekday or weekend drawings. Students should soon find that hemisphere is not that useful for comparisons, so they could explore another country-related classification like continent. More advanced manipulations could involve working with the timestamps, which are given in UTC for all drawings. This has consequences for the variable day_sketched, as many countries (and places within countries) are ahead of or behind UTC, so the local day a drawing was made can differ from its UTC day.
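To make these derived variables concrete, here is a minimal Python sketch for a single row of sample data. It assumes the CSV columns are named drawing_time, pause_time, number_strokes and day_sketched as above, plus a hypothetical timestamp column in the format "YYYY-MM-DD HH:MM:SS" (UTC); check the headers and format of the CSV you download before relying on either. The local-day conversion uses a fixed UTC offset, which ignores daylight saving.

```python
from datetime import datetime, timedelta, timezone

# One hypothetical row of sample data. The "timestamp" column name and its
# format are assumptions; drawing_time and pause_time are in seconds.
row = {
    "drawing_time": 6.4,
    "pause_time": 3.1,
    "number_strokes": 8,
    "day_sketched": "Friday",
    "timestamp": "2017-03-03 23:30:00",
}


def total_time(r):
    """Total time spent on the drawing: stroke time plus time between strokes."""
    return r["drawing_time"] + r["pause_time"]


def time_per_stroke(r):
    """Average drawing time per stroke, in seconds."""
    return r["drawing_time"] / r["number_strokes"]


def weekday_or_weekend(day):
    """Classify a day name such as 'Saturday' as 'weekend' or 'weekday'."""
    return "weekend" if day in ("Saturday", "Sunday") else "weekday"


def local_day(utc_text, offset_hours):
    """Day of the week at a place offset_hours ahead of UTC (negative = behind).

    A fixed offset ignores daylight saving, so this is a simplification.
    """
    utc = datetime.strptime(utc_text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    local = utc.astimezone(timezone(timedelta(hours=offset_hours)))
    return local.strftime("%A")


print(total_time(row))                          # 9.5
print(time_per_stroke(row))                     # 0.8
print(weekday_or_weekend(row["day_sketched"]))  # weekday
# New Zealand was on UTC+13 (daylight saving) in early March 2017.
print(local_day(row["timestamp"], 13))          # Saturday
```

If day_sketched is based on the UTC timestamp, a drawing made around lunchtime on a Saturday in New Zealand would be recorded as a Friday, and so classified as a weekday drawing.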

If you’ve made it this far in the post… why not play with a little R 🙂

I wonder which common household pet people playing Quick, Draw! tend to use the most strokes to draw: cats, dogs, or fish?

Have a go at modifying the R code below, using the iNZightPlots package by Tom Elliott and my [very-much-in-its-initial-stages-of-development] iNZightR package, to see what we can learn from the data 🙂 If you’re feeling extra adventurous, why not try modifying the code to explore the relationship between number of strokes and drawing time!
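If you would rather try the pet question in Python, the sketch below groups number_strokes by word and reports the sample size, mean and median for each group. It assumes the downloaded CSV has columns named word and number_strokes (check the headers in your file); the inline rows are invented purely so the code runs on its own. Medians are worth looking at alongside means, because stroke counts can be pulled up by a few very detailed drawings.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, median


def strokes_by_word(csv_file):
    """Group number_strokes by word; return {word: (n, mean, median)}."""
    groups = defaultdict(list)
    for r in csv.DictReader(csv_file):
        groups[r["word"]].append(int(r["number_strokes"]))
    return {w: (len(v), mean(v), median(v)) for w, v in groups.items()}


# Made-up rows for illustration only (not real Quick, Draw! data).
# For a real sample, pass open("your_sample.csv", newline="") instead.
example = io.StringIO(
    "word,number_strokes\n"
    "cat,9\ncat,12\ncat,7\n"
    "dog,11\ndog,14\n"
    "fish,4\nfish,6\nfish,5\n"
)

summary = strokes_by_word(example)
# List the words from highest to lowest median number of strokes.
for word, (n, avg, mid) in sorted(summary.items(), key=lambda kv: kv[1][2], reverse=True):
    print(f"{word}: n={n}, mean={avg:.1f}, median={mid}")
```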

It’s raining cats and dogs (hopefully)

In April 2017, I presented an ASA K-12 statistics education webinar: Statistical reasoning with data cards (webinar). Towards the end of the webinar, I encouraged teachers to get students to make their own data cards about their cats. A few days later, I thought that this could be something to get NZ teachers and students involved with. Imagine a huge collection of real data cards about dogs and cats! Real data that comes from NZ teachers and students! Like Census at School but for pets 🙂 I persuaded a few of my teacher friends to create data cards for their pets (dogs or cats) and to get their students involved, to see whether this project could work. Below is a small selection of the data cards that were initially created (beware of potential cuteness overload!)

The project then expanded to include more teachers and students across NZ, and even the US, and I’ve now decided to keep the data card generator (and collection) page open so that the set of data cards can grow over time. Please use the steps below to get students creating and sharing data cards about their pets.

Creating and sharing data cards about dogs and cats

Inevitably, some submissions will be “fake”, silly or offensive (see below).

Data cards submitted to the project won’t automatically be added to any public sets of data cards, and will be checked first. As with any survey process that is based on self-selection, is internet-based and relies on people to give honest and accurate answers, there is the potential for non-sampling errors. To help reduce the quantity of “fake” data cards, if you are keen to have your students involved with this project it would be great if you could do the following:

1. Talk to your students about the project and explain that the data cards will be shared with other students. They will be sharing information about their pet and need to be OK with this (and they don’t have to take part!). The data will be displayed with a picture of their pet, so participation is not strictly anonymous. All of this is important to discuss with students, as we need to educate students about data privacy 🙂

2. When students submit their data, they are given the finished data card, which they can save. Set up a system where students need to share the data card they have created with you, e.g. by saving it into a shared Google Drive or Dropbox folder, or by emailing it to you. The advantage for you of setting up this system is that you get your class/school set of data cards to use however you want. The advantage for me is that this level of “watching” might discourage silly data cards being created.

3. Share this link with your students http://learning.statistics-is-awesome.org/dogsvscats/ and let the rain of cats and dogs begin!

Pet data cards

The data collection period for this set of data cards was 1 May 2017 to 19 May 2017.

The diagram below shows the data included on each data card:


Additional data that could be used from each data card includes:

  • Whether the pet photo was taken inside or outside
  • Whether the pet photo is rotated (and the angle of rotation)
  • The number of letters in the pet name
  • The number of syllables in the pet name
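Counting letters is easy to automate; syllables are harder. Here is a rough Python sketch: letter_count ignores spaces and punctuation, and rough_syllables counts groups of vowels (treating y as a vowel) with a small adjustment for a silent final e. It is a simple heuristic for illustration, not a linguistic rule, and it will get some names wrong (Chloe, for instance, comes out as one syllable), so students should check the counts by saying the names aloud.

```python
import re

VOWEL_GROUP = re.compile(r"[aeiouy]+")


def letter_count(name):
    """Number of letters in a pet name, ignoring spaces, digits and punctuation."""
    return sum(ch.isalpha() for ch in name)


def rough_syllables(name):
    """Rough syllable count: vowel groups per word, adjusting for a silent final e."""
    total = 0
    for word in re.findall(r"[a-z]+", name.lower()):
        count = len(VOWEL_GROUP.findall(word))
        # A final "e" after a consonant is usually silent ("Jake"), but not after
        # a vowel ("Charlie") or in a consonant + "le" ending ("Bubble").
        ends_in_consonant_le = len(word) > 2 and word.endswith("le") and word[-3] not in "aeiouy"
        if count > 1 and word.endswith("e") and word[-2] not in "aeiouy" and not ends_in_consonant_le:
            count -= 1
        total += max(count, 1)  # every word has at least one syllable
    return total


for pet in ["Max", "Bella", "Jake", "Charlie", "Bubble", "Sir Purrsalot"]:
    print(pet, letter_count(pet), rough_syllables(pet))
```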

PDF of all data cards: click to download

Statistics flowers (data cards)

Inspired by Fisher’s Iris data, this sample of flowers was created through simulation from a carefully designed model. From a student’s perspective, these flowers represent a random sample of flowers from a much bigger population of statistics flowers. The idea is that students get all 300 cards and need to measure different features of the flowers, and determine other variables, to create their sample data.

Designed variables are: type of statistics flower (tictastics, stistactis, or castistist), petal colour (red, orange, blue, green), number of petals, petal length, petal width and stigma diameter. The diagram below shows how the measurements should be taken by students:

I have made the sample size 300 to allow for categorical and distributional exploration, for example: What proportion of all statistics flowers have a black stigma? Does stigma colour appear to be linked to petal colour for statistics flowers? How could the number of petals for statistics flowers be distributed? But I appreciate that it would take a long time for students to measure 300 different flowers and record the necessary data! Perhaps students could look at the flowers visually first, sort them by type of flower and see if they can detect any features that appear to differ (e.g. colour, petal length, etc.). Students could then measure some of the flowers and chuck this data into a graph for an initial view before being given access to the digital sample to do some more exploring. Remember that these data cards represent a sample, and the true population parameters (for example, the mean petal length of all statistics flowers) are unknown to you and the students. These cards are not intended to be used for “population bags”.
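Once students have recorded their sample data, the first two questions above can be explored with a proportion and a two-way table. The Python sketch below assumes hypothetical column names stigma_colour and petal_colour (use whatever headers your class records), and the few rows in it are made up only so that it runs. A sample proportion is an estimate of the unknown population proportion, not the population value itself.

```python
from collections import Counter


def proportion(rows, column, value):
    """Proportion of rows where column equals value (a sample statistic)."""
    return sum(r[column] == value for r in rows) / len(rows)


def two_way_table(rows, row_var, col_var):
    """Counts for every combination of two categorical variables."""
    counts = Counter((r[row_var], r[col_var]) for r in rows)
    row_levels = sorted({r[row_var] for r in rows})
    col_levels = sorted({r[col_var] for r in rows})
    return {a: {b: counts[(a, b)] for b in col_levels} for a in row_levels}


# Made-up rows for illustration only; real values come from the cards.
flowers = [
    {"petal_colour": "red", "stigma_colour": "black"},
    {"petal_colour": "red", "stigma_colour": "black"},
    {"petal_colour": "blue", "stigma_colour": "other"},
    {"petal_colour": "green", "stigma_colour": "black"},
]

print(proportion(flowers, "stigma_colour", "black"))
for stigma, counts in two_way_table(flowers, "stigma_colour", "petal_colour").items():
    print(stigma, counts)
```

To judge whether stigma colour appears linked to petal colour, compare proportions within each petal colour rather than raw counts, since the petal colour groups may differ in size.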

Here is the sample data set as a CSV file: flower_power

Here are the data cards as a PDF: flower_power
You will need to print these one to a page if you want your measurements to match those in the CSV file!

Census at school stick people (data cards)

This population of stick people was created using data from the Census at School 2015 database. For the data cards, rather than indicating gender on each card, I have used a fictional name taken from the names of children entered in the 2015 Auckland kids marathon. The relevant questions from the Census at School 2015 survey are Q1, Q2, Q17, Q27 (cellphone, facebook, snapchat), Q31 (TV) and Q32 (reading); the questions can be found here. The diagram below shows what each part of the data card represents:

For some great teaching notes on using data cards, check out Pip Arnold’s resources on Census at School; here are a couple: ID cards | Using data cards. I also used these data cards in a workshop on data challenges, which you can read more about here.

Here is the population data set as a CSV file for teacher reference: CAS2015_edited

Here are the data cards as a PDF: CAS_2015_data_cards (without gender) | CAS_2015_data_cards (with gender)

Here is the virtual environment to use the data cards: learning.statistics-is-awesome.org/stickland/

And here is a little more about virtual Stickland: Welcome to stickland! and Initial adventures in Stickland