A simple app that only does three things

Here’s a scenario. You buy a jumbo bag of marshmallows that contains a mix of pink and white colours. Of the 120 in the bag, 51 are pink, which makes you unhappy because you prefer the taste of pink marshmallows.

Time to write a letter of complaint to the company manufacturing the marshmallows?

The thing we work so hard to get our statistics students to believe is that there’s this crazy little thing called chance, and it’s something we’d like them to consider for situations where random sampling (or something like that) is involved.

For example, let’s assume the manufacturing process overall puts equal proportions of pink and white marshmallows in each jumbo bag. This is not a perfect process; there will be variation, so we wouldn’t expect exactly half pink and half white for any one jumbo bag. But how much variation could we expect? We could get students to flip coins, with each flip representing a marshmallow, heads representing white and tails representing pink. We can then collate the results for 120 marshmallows/flips – maybe the first time we get 55 pink – and discuss the need to repeat this process to build up a collection of results. Often we move to a computer-based tool to get more results, faster. Then we compare what we observed – 51 pink – to what we have simulated.
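For anyone who wants to try the coin-flip simulation in code, here is a minimal Python sketch. It is not how the modelling tool works internally; the function names, the 10,000 simulated bags and the fixed seed are all assumptions made for illustration. It simulates bags of 120 marshmallows under the 50% pink model and reports how often a simulated bag is at least as far from 60 pink as the observed 51.

```python
import random

def simulate_pink_counts(n_marshmallows=120, p_pink=0.5, n_bags=10_000, seed=1):
    """Simulate the number of pink marshmallows in each of n_bags bags.

    Each marshmallow is one 'coin flip' that comes up pink with probability p_pink.
    The defaults (10,000 bags, seed=1) are illustrative choices, not from the post.
    """
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_pink for _ in range(n_marshmallows))
        for _ in range(n_bags)
    ]

def tail_proportion(counts, observed, expected):
    """Proportion of simulated counts at least as far from `expected`
    as `observed` is, in either direction (a two-tailed comparison)."""
    distance = abs(observed - expected)
    return sum(abs(c - expected) >= distance for c in counts) / len(counts)

counts = simulate_pink_counts()
proportion = tail_proportion(counts, observed=51, expected=60)
print(f"Simulated bags at least as unusual as 51 pink: {proportion:.3f}")
```

If that proportion is not small, a bag with 51 pink is the kind of result chance alone could plausibly produce under the 50% model.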


Created using my learning.statistics-is-awesome.org/modelling-tool, yes it should be two-tailed, no my tool doesn’t allow this 🙁

I use these kinds of activities with my students, but I wanted something more so I made a very simple app earlier this year. You can find it here: learning.statistics-is-awesome.org/threethings/. You can only do three things with it (in terms of user interactions) but in terms of learning, you can do way more than three things. Have a play!

In particular, you can show that models other than 50% (for the proportion of pink marshmallows) can also generate data (simulated proportions) consistent with the observed proportion. So, not being able to reject the model used for the test (50% pink) doesn’t mean the 50% model is the one true thing. There are others. Like I told my class – just because my husband and I are compatible (and I didn’t reject him), doesn’t mean I couldn’t find another husband similarly compatible.
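The idea that several models are compatible with the same observation can also be sketched in Python. This is only an illustration and not the app's method: the "middle 95% of simulated counts" rule, the candidate percentages, the 2,000 simulated bags per model and the function name `middle_interval` are all assumptions. The observed value is the 51 pink out of 120 from the marshmallow example.

```python
import random

def middle_interval(p_pink, n=120, n_bags=2000, coverage=0.95, seed=1):
    """Return the (low, high) pink counts bounding the middle `coverage`
    share of simulated bags under a model with proportion p_pink."""
    rng = random.Random(seed)
    counts = sorted(
        sum(rng.random() < p_pink for _ in range(n)) for _ in range(n_bags)
    )
    cut = int(n_bags * (1 - coverage) / 2)  # bags trimmed from each tail
    return counts[cut], counts[-cut - 1]

observed = 51  # pink marshmallows in the bag of 120

compatible = []
for percent in range(30, 61, 5):  # candidate models: 30%, 35%, ..., 60% pink
    low, high = middle_interval(percent / 100)
    if low <= observed <= high:
        compatible.append(percent)

print("Models (% pink) consistent with 51 pink out of 120:", compatible)
```

More than one percentage ends up in the list, which is the point: not rejecting the 50% model does not make it the only model that fits.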

Note: The app is in terms of percentages, because that aligns to our approach with NZ high school students when using and interpreting survey/poll results. However, I first use counts for any introductory activities before moving to percentages, as demonstrated with this marshmallow example. The app rounds percentages to the closest 1% to keep the focus on key concepts rather than focusing on (misleading) notions of precision. I didn’t design it to be a tool for conducting formal tests or constructing confidence intervals, more to support the reasoning that goes with those approaches.

Magical powers? Nah just statistics!

This post is about a prediction model I’m working on that aims to predict the gender of a teacher based on a few questions.

On World Statistics Day, I posted about a survey for high school teachers. Thanks to all the teachers who completed the survey! There is not quite enough data to share yet but I have used the data to make a start on a prediction model.

The plan is that, after asking you some questions, I can use your answers to predict your gender. Share with your teacher friends and let them be impressed with statistics too!

Once I get more data, I will “share the secrets” including the data so you can use this context for some exploratory data analysis of your own with your students.

Ready to be impressed (hopefully)?

Head here to try out the teacher gender predictor!

Predicting gender from dating profiles


This post features a quiz asking you to predict the gender of someone based on what they write about themselves on a dating website.

How does the quiz work?

You will be given 20 different “About me” descriptions taken from public profiles displayed on a dating website. For each description, you will need to select whether you think this description is from a male or female profile. Note: Your choices will be recorded and this data may be used for a future post. No personal information about you will be recorded.

Get started!


How long does it take a student to submit a swear word into a text analysis tool?

Update on the predictive text challenge

I haven’t heard anything from anyone with any problems, and there seems to be a bit of traffic to the challenge page, so hopefully this is going well. I’ll allow checking of the first list of reserved words tomorrow. Students should put in what they predict the readability score will be for each word. These predicted scores will be checked against the actual readability scores and students will be given an overall result e.g. 85%. Oh, and just because you’re a teacher too you’ll get this idea for an investigative question/problem……. How long does it take a student to submit a swear word into a text analysis tool?


Related “reading themed” statistical investigation ideas

Check out http://josephrocca.com/randomsentence/ where you can generate “random” sentences from books that are no longer under U.S.A. copyright restrictions – so books generally published before the early 20th century. You could compare the process for random sampling sentences from digital books to processes for random sampling sentences from physical books (so much here with different sampling methods). You could give students an actual physical book and challenge them to estimate the total word count (check using the digital version!), or get students to devise a way to compare the “readability” of two books, or….?

So what was so surprising?

Recap: I got 10 dominoes from a supermarket recently and was surprised to find that all 10 were different (there are 50 different dominoes to collect). Ok, so on the face of it this may look like a familiar (and not super awesome) starter. Collecting cereal cards, ice block sticks, seed packets…….. But I was surprised to see this because I was thinking that a random process like this would mean I should expect to see at least one double up e.g. like seeing runs of heads when you flip a coin. When I thought about it more, I realised I wasn’t taking into account there were 50 dominoes – this makes a difference.
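To see how much difference the 50 makes, here is a small Python sketch of the chance that all 10 dominoes are different. It assumes each of the 50 dominoes is equally likely and that each one handed out is independent of the others; the post does not say whether the supermarket's process works that way, and the function name `prob_all_different` is made up for this example.

```python
from math import prod

def prob_all_different(n_collected=10, n_types=50):
    """Chance that n_collected items are all different when each is drawn
    independently and uniformly from n_types possibilities (an assumption).

    The first item is always new; the second is new with chance 49/50,
    the third with chance 48/50, and so on.
    """
    return prod((n_types - i) / n_types for i in range(n_collected))

print(f"Chance all 10 of 50 are different: {prob_all_different():.3f}")
```

Under those assumptions the chance comes out at roughly 38%, so getting 10 different dominoes is not that surprising after all.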


More about SOLO

SOLO stands for the Structure of the Observed Learning Outcomes. It’s a model/taxonomy for defining different levels of understanding or thinking and was developed by J. Biggs and K. Collis in 1982. I’ve been using SOLO in my teaching of statistics since around 2006 and think it’s awesome. It fits so well with building conceptual understandings of statistics rather than just procedural ones. I use SOLO in (at least) two ways: (1) to structure good questions for students to use when working with data, questions to make them think at different levels and (2) to plan my teaching of a topic e.g. what are the key ideas (not skills)?

The prices increased from Jan to Feb and then decreased from Mar to May and then increased again…..

I think I like this answer on Quora re how to explain over-fitting of models. Some of the language is a bit off – I think if you swap the word “hypothesis” for “model” and replace “experiments” with “observations” it reads better. But I like the idea of how to explain to students that a model is not about getting a perfect fit to the observed data and that simpler can be better (e.g. go for the minimum number of trend lines possible that tell the general story of what is happening……).