Magical powers? Nah just statistics!

gender
This post is about a prediction model I’m working on that aims to predict the gender of a teacher based on a few questions.

On World Statistics Day, I posted about a survey for high school teachers. Thanks to all the teachers who completed the survey! There is not quite enough data to share yet but I have used the data to make a start on a prediction model.

The plan is that, after asking you some questions, I can use your answers to predict your gender. Share with your teacher friends and let them be impressed with statistics too!

Once I get more data, I will “share the secrets” including the data so you can use this context for some exploratory data analysis of your own with your students.

Ready to be impressed (hopefully)?

Head here to try out the teacher gender predictor!

Predicting gender from dating profiles

dating

This post features a quiz asking you to predict the gender of someone based on what they write about themselves on a dating website.

How does the quiz work?

You will be given 20 different “About me” descriptions taken from public profiles displayed on a dating website. For each description, you will need to select whether you think this description is from a male or female profile. Note: Your choices will be recorded and this data may be used for a future post. No personal information about you will be recorded.

Get started!


How long does it take a student to submit a swear word into a text analysis tool?

mid-week stats teaching inspiration

predictive text

Update on the predictive text challenge

I haven’t heard from anyone having problems, and there seems to be a bit of traffic to the challenge page, so hopefully this is going well. I’ll allow checking of the first list of reserved words tomorrow. Students should put in what they predict the readability score will be for each word. These predicted scores will be checked against the actual readability scores and students will be given an overall result, e.g. 85%. Oh, and because you’re a teacher too, you’ll appreciate this idea for an investigative question/problem: How long does it take a student to submit a swear word into a text analysis tool?

books

Related “reading themed” statistical investigation ideas

Check out http://josephrocca.com/randomsentence/ where you can generate “random” sentences from books that are no longer under U.S.A. copyright restrictions – so books generally published before the early 20th century. You could compare the process for random sampling sentences from digital books to processes for random sampling sentences from physical books (so much here with different sampling methods). You could give students an actual physical book and challenge them to estimate the total word count (check using the digital version!), or get students to devise a way to compare the “readability” of two books, or….?
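One way the word count challenge could play out is to count the words on a handful of randomly chosen pages and scale up. Here is a minimal sketch of that idea, assuming a simple random sample of pages; the function name `estimate_total_words` and the page counts are made up for illustration and are not from a real book.

```python
import random


def estimate_total_words(words_per_page, sample_size, seed=0):
    """Estimate a book's total word count from a simple random sample of pages.

    words_per_page stands in for the physical book: one word count per page.
    In class, students would only count the words on the sampled pages.
    """
    rng = random.Random(seed)
    sampled_counts = rng.sample(words_per_page, sample_size)
    # Scale the sample total up by (number of pages) / (pages sampled).
    return len(words_per_page) * sum(sampled_counts) / sample_size


# Hypothetical 200-page book with made-up page counts (assumption, not real data).
book_rng = random.Random(42)
hypothetical_book = [book_rng.randint(250, 350) for _ in range(200)]

estimate = estimate_total_words(hypothetical_book, sample_size=10)
actual = sum(hypothetical_book)  # the "check using the digital version" step
print(round(estimate), actual)
```

Students could rerun this with different seeds or sample sizes to see how much the estimate moves around, which leads nicely into comparing sampling methods.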

dominoes

So what was so surprising?

Recap: I got 10 dominoes from a supermarket recently and was surprised to find that all 10 were different (there are 50 different dominoes to collect). OK, so on the face of it this may look like a familiar (and not super awesome) starter: collecting cereal cards, ice block sticks, seed packets... But I was surprised because I was thinking that a random process like this should produce at least one double-up, in the same way you see runs of heads when you flip a coin. When I thought about it more, I realised I wasn’t taking into account that there were 50 dominoes to collect, and this makes a difference.
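To see how much difference the 50 makes, here is a minimal sketch of the calculation. It assumes each domino handed out is equally likely to be any of the 50 designs, independently of the others; that is my assumption, not something the supermarket has confirmed. The function name `prob_all_different` is just for illustration.

```python
from fractions import Fraction


def prob_all_different(n_types, n_collected):
    """Chance of no double-ups when each item is equally likely to be any type."""
    probability = Fraction(1)
    for already_seen in range(n_collected):
        # The next item must avoid every type already collected.
        probability *= Fraction(n_types - already_seen, n_types)
    return probability


print(float(prob_all_different(50, 10)))  # roughly 0.38
print(float(prob_all_different(10, 10)))  # roughly 0.00036
```

Under that assumption, getting 10 different dominoes out of 50 designs happens a bit more than a third of the time, so it is not that surprising after all. With only 10 designs to collect, the same result would be very rare.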

SOLO

More about SOLO

SOLO stands for the Structure of the Observed Learning Outcome. It’s a model/taxonomy for defining different levels of understanding or thinking and was developed by J. Biggs and K. Collis in 1982. I’ve been using SOLO in my teaching of statistics since around 2006 and think it’s awesome. It fits so well with building conceptual understandings of statistics rather than just procedural ones. I use SOLO in (at least) two ways: (1) to structure good questions for students to use when working with data, questions that make them think at different levels, and (2) to plan my teaching of a topic, e.g. what are the key ideas (not skills)?

cookie

The prices increased from Jan to Feb and then decreased from Mar to May and then increased again…..

I think I like this answer on Quora about how to explain over-fitting of models. Some of the language is a bit off: I think it reads better if you swap the word “hypothesis” for “model” and replace “experiments” with “observations”. But I like the idea of how to explain to students that a model is not about getting a perfect fit to the observed data and that simpler can be better (e.g. go for the minimum number of trend lines that tell the general story of what is happening).
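Here is a rough sketch of that idea in code, using simulated data only. It assumes prices follow a straight-line trend plus random noise (the trend, noise level and function names are all made up for illustration). A single trend line is compared with a "connect the dots" model that joins every observed point, so it fits the observed months perfectly. Both models are then checked against months they were not shown.

```python
import random


def fit_line(xs, ys):
    """Least-squares straight line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def connect_the_dots(xs, ys, x):
    """Predict at x by joining neighbouring observed points with straight segments."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x is outside the observed range")


def mse(predictions, actuals):
    """Mean squared error between predictions and actual values."""
    return sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(actuals)


def simulate(trials=2000, seed=1):
    """Average errors for both models on observed and held-back months."""
    rng = random.Random(seed)
    months = list(range(1, 14))
    seen = months[0::2]    # months 1, 3, ..., 13 are shown to the models
    unseen = months[1::2]  # months 2, 4, ..., 12 are held back
    totals = {"line_seen": 0.0, "line_unseen": 0.0,
              "dots_seen": 0.0, "dots_unseen": 0.0}
    for _ in range(trials):
        # Made-up prices: straight-line trend plus random noise (assumption).
        prices = {m: 100 + 2 * m + rng.gauss(0, 5) for m in months}
        seen_y = [prices[m] for m in seen]
        unseen_y = [prices[m] for m in unseen]
        intercept, slope = fit_line(seen, seen_y)
        totals["line_seen"] += mse([intercept + slope * m for m in seen], seen_y)
        totals["line_unseen"] += mse([intercept + slope * m for m in unseen], unseen_y)
        totals["dots_seen"] += mse([connect_the_dots(seen, seen_y, m) for m in seen], seen_y)
        totals["dots_unseen"] += mse([connect_the_dots(seen, seen_y, m) for m in unseen], unseen_y)
    return {name: total / trials for name, total in totals.items()}


results = simulate()
for name, value in results.items():
    print(name, round(value, 2))
```

The connect-the-dots model has essentially zero error on the months it was shown, because it passes through every point, but on average it does worse than the single trend line on the held-back months. Chasing every up and down means fitting the noise, not the general story.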