predictive text


This post discusses teaching ideas for the “predictive text challenge” that I’ve set up to support maths week 2015.

Can your students predict the text readability score for any word?

So it’s “maths week” here in New Zealand next week, and yesterday I had the idea that it would be cool to set up an on-line statistics challenge for students to attempt during the week. After considering some far more complicated ideas based on investigations I’ve done before with books (I’ll do a post about those activities later), I decided that a much “simpler” challenge would involve single words.

You can check it out here:

The basic ideas are:

  • Some words are easier/harder to read than others
  • The readability score used for this challenge was made up by me (there are others out there)
  • The score classifies words as “low”, “medium” and “high” in terms of difficulty to read
  • The score I’ve devised uses various properties of each word, including its colour (black or red) and its font (normal or italics) to assign one of these scores – although students can just investigate a model for the word only without worrying about colour or font

How do students investigate the problem and form a model?

Get students to enter words and observe the readability score for each one. After trying out words of different lengths, they should get the idea that the readability score might have something to do with the length of the word. Encourage students to record data by writing down each word with its readability score alongside. Depending on what your students are using (pen and paper vs device), they could use a spreadsheet function like len() to count the letters in each word, or just count them manually.
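If your students are on devices and comfortable with a little Python, the recording step could look something like the sketch below. The words and scores are made up for illustration (they are not real output from the challenge page), and the function name is just my own choice.

```python
# Hypothetical observations typed in by a student: (word, score from the page).
# These scores are invented for illustration, not real output from the challenge.
observations = [
    ("cat", "low"),
    ("window", "medium"),
    ("photograph", "high"),
]


def add_lengths(observations):
    """Return (word, number of letters, score) rows, like adding a len() column."""
    return [(word, len(word), score) for word, score in observations]


rows = add_lengths(observations)
for word, n_letters, score in rows:
    # Print a simple aligned table: word, letter count, score.
    print(f"{word:<12}{n_letters:>3}  {score}")
```

This does the same job as a spreadsheet with a word column, a len() column and a score column.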

I’ve designed the readability score so that there is a complex interaction of factors, and it’s not intended that students will be able to predict the readability score with 100% accuracy for any word. So, among words with the same readability score, there will be variation in the number of letters. If students are not seeing this, they probably have not tried out enough different words. Encourage your students to use graphs like dot plots to see if they can get a basic model for the readability score based on the number of letters in the word.
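For students without graphing software, a rough text version of a dot plot is enough to see the variation. This is only a sketch: the function name is my own and the data in the usage line are hypothetical.

```python
from collections import Counter, defaultdict


def text_dot_plot(rows):
    """Build a text dot plot of letter counts, one block per readability score.

    rows is a list of (number_of_letters, score) pairs.
    """
    counts = defaultdict(Counter)
    for n_letters, score in rows:
        counts[score][n_letters] += 1

    lines = []
    for score in ("low", "medium", "high"):
        lines.append(score)
        for n in sorted(counts[score]):
            # One "o" per word observed with this many letters.
            lines.append(f"  {n:>2} | " + "o" * counts[score][n])
    return lines


# Hypothetical data, invented for illustration only.
example = [(3, "low"), (3, "low"), (4, "low"), (6, "medium"), (10, "high")]
print("\n".join(text_dot_plot(example)))
```

If the dots for two different scores overlap at the same number of letters, that is a sign that length alone does not decide the score.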


Hint: The data above (20 words pulled from a random word generator website) suggest words with 4 letters or fewer will be scored as “low”, 5 – 9 letters “medium” and 10 letters or more “high”. This might be a good starting point for developing a model, but with only one word with 9 letters and only one word with 10 letters, are you really convinced? Words of some lengths will need to be deliberately sought out (quota sampling!) because of the natural distribution of word lengths in the English language, and you should get students to draw graphs of just the number of letters in the words they have checked, to visualise and describe that distribution.
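The starting-point model from the hint can be written down as a small function. This is a sketch of the length-only rule suggested by the 20 sample words, not the actual score behind the challenge, and the function names are my own. The second function counts how many words of each length have been checked, which helps spot lengths that need more words.

```python
from collections import Counter


def predict_from_length(word):
    """Length-only starting model from the hint (NOT the real scoring rule)."""
    n = len(word)
    if n <= 4:
        return "low"
    if n <= 9:
        return "medium"
    return "high"


def length_counts(words):
    """Count how many checked words there are of each length."""
    return Counter(len(word) for word in words)


# Hypothetical words a student might have checked so far.
checked = ["cat", "dog", "window", "photograph"]
print({word: predict_from_length(word) for word in checked})
print(sorted(length_counts(checked).items()))
```

Lengths near the cut-offs (4 and 5, 9 and 10) are the ones worth targeting with extra words.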

Students could collaborate on collecting data by using something like a Google spreadsheet, as with more data they will be able to get a better sense of what else is being taken into account to get the readability score. If they are stuck for words, there are random word generators out there on the world wide web for inspiration 🙂 Encourage your students to think about what else makes words easier or harder to read. I’ve designed the readability score so students are motivated to create new measures that may be a mix of qualitative and quantitative – putting them in the role of “data detective” (see the PPDAC poster). In particular, look at words with the same number of letters that are given different readability scores – get students to discuss what makes them different, and then get them to test out their rules.
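With a shared data set, finding words of the same length that received different scores can be automated. A minimal sketch, assuming the class data is a list of (word, score) pairs; the words and scores shown are hypothetical.

```python
from collections import defaultdict


def mixed_score_lengths(observations):
    """Return {length: {score: [words]}} for lengths that received more than one score."""
    by_length = defaultdict(lambda: defaultdict(list))
    for word, score in observations:
        by_length[len(word)][score].append(word)
    # Keep only the lengths where at least two different scores appear.
    return {
        n: dict(groups)
        for n, groups in by_length.items()
        if len(groups) > 1
    }


# Hypothetical class data, invented for illustration only.
class_data = [("jazz", "medium"), ("tree", "low"), ("cat", "low")]
print(mixed_score_lengths(class_data))
```

Each group this returns is a ready-made discussion prompt: what is different about these words apart from their length?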

From this, it’s really up to your students (and you) how much further to take the modelling process. They could investigate what happens when you change the colour (this is easier than changing the font, but they could do that too if they are up for the challenge). This is easier if you use software like iNZight, which also has an on-line application called iNZight lite. This allows students to drag/add third or fourth variables onto their graphs to separate the data into related groups and explore relationships.
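Without iNZight, the same idea of splitting the data by a third variable can be sketched in a few lines. This assumes each record is a (word, colour, score) triple; the records below are hypothetical and say nothing about how colour really affects the score.

```python
from collections import defaultdict


def lengths_by_group(records):
    """Group letter counts by (colour, score) so the groups can be compared."""
    groups = defaultdict(list)
    for word, colour, score in records:
        groups[(colour, score)].append(len(word))
    return {key: sorted(values) for key, values in groups.items()}


# Hypothetical records, invented for illustration only.
records = [
    ("cat", "black", "low"),
    ("tree", "black", "low"),
    ("tree", "red", "medium"),
    ("window", "red", "medium"),
]
for key, lengths in lengths_by_group(records).items():
    print(key, lengths)
```

Comparing the lists for black and red within the same score shows whether the length cut-offs seem to shift with colour.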

I wouldn’t make a big deal about formal notation for the model(s) – I think writing down some “rules” for how to take a word and predict its readability score is sufficient, or even better, drawing a diagram or picture (or even a tree). The main “takeaway” from the challenge is hopefully that there is more than one “thing” that the readability score is based on. I’d be impressed if any student could “crack the code” that I’ve used to determine the readability score for each word, but let me know if your students want to know how close they were 🙂
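A set of student “rules” drawn as a tree translates directly into nested if statements. The sketch below is an example of what a student model might look like: the vowel rule and its 0.25 cut-off are invented for illustration and are not the code behind the challenge.

```python
def predict_with_rules(word):
    """A hypothetical student rule tree: length first, then share of vowels."""
    n = len(word)
    vowels = sum(ch in "aeiou" for ch in word.lower())

    if n <= 4:
        return "low"
    if n >= 10:
        return "high"
    # Middle-length words: an invented second rule, purely as an example.
    # Words with very few vowels are guessed to be harder to read.
    if vowels / n < 0.25:
        return "high"
    return "medium"


for word in ["cat", "window", "rhythm", "photograph"]:
    print(word, predict_with_rules(word))
```

Each branch is one rule that students can test against new words and then keep, change or throw away.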

How will the challenge run?

Monday to Tuesday

Students test their own words through the on-line text analyser, record data from these tests, and try to develop a way to predict the readability score. They should make some notes/diagrams for their model and use these to predict readability scores for the first set of reserved words.

Wednesday to Thursday

Students have access to check the first set of reserved words. They enter the readability score they would predict for each reserved word, and see how this compares to the actual readability score. This should help them refine their prediction model.
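Students who have kept their predictions in a table can summarise how well their model did. A minimal sketch, assuming predictions and actual scores are stored as dictionaries keyed by word; the words and scores here are hypothetical, not the real reserved words.

```python
def compare(predicted, actual):
    """Return (proportion correct, {word: (predicted, actual)} for the misses)."""
    mismatches = {
        word: (predicted.get(word), score)
        for word, score in actual.items()
        if predicted.get(word) != score
    }
    accuracy = (len(actual) - len(mismatches)) / len(actual)
    return accuracy, mismatches


# Hypothetical predictions and results, invented for illustration only.
predicted = {"cat": "low", "window": "medium", "rhythm": "medium"}
actual = {"cat": "low", "window": "medium", "rhythm": "high"}

accuracy, mismatches = compare(predicted, actual)
print(f"{accuracy:.0%} correct")
print(mismatches)
```

The mismatches are the useful part: they are the words that the current rules cannot explain yet.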


Friday

Students have access to check the second set of reserved words. They enter the readability score they would predict for each reserved word, and see how this compares to the actual readability score. They should only do this once (i.e. this should be their final evaluation of the model).

Keen to give it a go?

Here’s the place to send students:

Please let me know if there are any issues in the comments below 🙂

NOTE: The challenge is over but I have left the page up in case you still want to try it out.

Want to watch a cool visualisation for this kind of modelling?

Anna teaches introductory-level statistics at the University of Auckland. She enjoys facilitating workshops to support professional development of statistics teachers and thinks teaching statistics (and mathematics) is awesome. Anna is also undertaking a PhD in statistics and data science education.
