Sunday, 27 August 2023

AI and Language Learning

Red flag. Since I wrote this, I have discovered that Google Bard's sexism is also accompanied by colonialism and racial stereotypes. Of course this is not intentional, but it is a result of the material it has been trained on. And it has no moderation or filter. I asked it to write the story of an American child who goes to Guatemala. It immediately made stereotyped, negative, colonial, prejudiced assumptions. The conclusion of my post was that the best use of AI is to let learners chat with it. This is now off the table. Here is the start of the story it wrote, based entirely on affirming stereotypes:

[Screenshot: the opening of Bard's story about an American child in Guatemala]

And if, teaching in the UK, that doesn't strike you as problematic, just imagine you are teaching Spanish in the USA, to a class of pupils many of whom are from a Hispanic background. That's the stereotype of Latin America they live with. And it's based on a false comparison with the States as a rich, clean, comfortable, safe, entitled, white country.

Here's the post as I wrote it. But this isn't funny anymore.


Yesterday I saw an article advocating the use of Google Bard for language learning. Google Bard is one of the popular Artificial Intelligence bots currently causing a storm with their amazing ability to replicate human speech. I had already tried Bing and ChatGPT, so I thought I would test the capabilities of Bard and see if it really is any use for teaching languages.

My first question was simple. I asked it, in Spanish, to explain the rules for using question marks.

[Screenshot: Bard's explanation of question marks in Spanish]

It gave me perfectly good examples, in Spanish, of questions written with ¿___?. But its explanation made no mention of the upside-down opening question mark. It just said to put a question mark at the end. So it failed my first test.

I had this test ready, as I'd already used it on ChatGPT (which also failed it). The reason is that these AI chatbots seem to have been developed to work in English. They can then translate into other languages, but their working is in English. It's important to realise this, because it impinges on their ability to work as language learning tools. It's also alarming that they don't detect when their examples don't match their explanations. More examples of this to come!

I wondered: if it's working in English, then what will happen if I ask it for words that rhyme? If I ask it for words that rhyme with pescado, will it translate into English, give me a list of words that rhyme with fish - dish, wish - and translate them back into Spanish? It did this:

[Screenshot: Bard's list of Spanish words that "rhyme" with pescado]

So, it is capable of working in Spanish. It's taken a definition of "rhyme" that means containing the same vowels. So lavo sort of rhymes with pescado. But I can't help noticing expedientes on that list. Which doesn't rhyme at all.
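
That loose notion of rhyme - matching vowels rather than matching endings - is easy to simulate. Here's a rough sketch in Python of the sort of rule it seems to be applying (my guess, not anything Bard actually runs): keep the last two vowels of each word, which makes lavo "rhyme" with pescado but should still rule out expedientes.

    import re

    SPANISH_VOWELS = "aeiouáéíóú"

    def last_vowels(word, n=2):
        """Return the last n vowels of a word, e.g. 'pescado' -> 'ao'."""
        vowels = re.findall(f"[{SPANISH_VOWELS}]", word.lower())
        return "".join(vowels[-n:])

    def assonant(a, b):
        """Loose 'same vowels' rhyme: true if the final vowels match."""
        return last_vowels(a) == last_vowels(b)

    for word in ["lavo", "helado", "expedientes"]:
        print(word, assonant("pescado", word))   # True, True, False

Real Spanish rhyme runs from the stressed vowel to the end of the word and treats diphthongs as units, so this is only an approximation. But even this crude check is more consistent than Bard: it at least wouldn't have offered expedientes.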

I did another test of its Anglophone bias by asking it, in Spanish, who the President is. I thought it might offer AMLO or Sánchez as possibilities, because I had asked in Spanish. But it automatically assumed I meant Joe Biden. I've tried the same cultural bias on ChatGPT, with questions about Spanish food or French music. It does tend to come up with answers known to English speakers, and even stereotypes.

Back to language-based questions. This is one that Dr Rachel Hawkes had alerted me to, when she was genuinely using ChatGPT to "help" create language learning resources. She asked it for a list of French nationality adjectives ending in "-ain". It was unable to do this, producing a list including the marvellous "espagnolian". So I tried it with Google Bard:

[Screenshot: Bard's attempt at French nationality adjectives ending in -ain]

It didn't have the wonderful and undesired inventiveness of ChatGPT. But it was still no use at all. Joe Dale had a further extended conversation with Bard on this question, in which it acknowledged that it had done badly, and then in a series of attempts went from bad to worse.

Unlike some of my questions, this wasn't a deliberate trap or tricky test. It started from a genuine question Rachel asked to try to save some time. And it's not clear why it failed. It seems that AI isn't very good at taking language apart. So gaps in texts, parts of words, or focus on endings can all trip it up. Unfortunately these are exactly the sorts of things we want to concentrate on in language teaching.

I tried to use it to create a text where it removed the words for his and her and replaced them with son/sa/ses, for the learner to choose the correct one. I explained this carefully in English, to avoid it simply doing what Microsoft Word would do and replacing the letter string son even when it was in the middle of a word. Even so, it did this:

[Screenshot: Bard's son/sa/ses gap-fill text and answers]

Not only has it done what I was worried about, replacing son inside the word sont and sa inside the word responsable, but in giving the "answers", it has also tried to do it with the words garçon and attentionnée. We can never know exactly why this happened. AI works by "learning" from huge amounts of language. And somewhere in that learning it knows that çon is equivalent to son. And it thinks that tion is the same too. So it's struggling with phonics! That's cute, because so are our learners!
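
For comparison, here is a minimal Python sketch of the behaviour I was asking for: a word-boundary replacement that gaps son/sa/ses only when they stand as whole words. The example sentence is my own; the point is that this is the baseline any find-and-replace has to clear.

    import re

    text = "Sa mère dit que son frère et ses amis sont responsables."

    # Naive replacement - the Microsoft Word failure mode - mangles
    # 'sont' (son + t) and 'responsables' (respon + sa + bles).
    naive = text.replace("son", "[son/sa/ses]").replace("sa", "[son/sa/ses]")

    # Boundary-aware replacement: \b matches only at word edges, so
    # whole words are gapped and letter strings inside words are left alone.
    careful = re.sub(r"\b(son|sa|ses)\b", "[son/sa/ses]", text,
                     flags=re.IGNORECASE)

    print(naive)    # gaps appear inside 'sont' and 'responsables'
    print(careful)  # only the three possessives are gapped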

I didn't pick it up on the sexism of its examples. But I had previously challenged ChatGPT on a very similar piece of writing it created. It got very snarky and self-defensive. Of course, its bias is a reflection of the material it has been trained on.

While we are talking about using it to create resources for the new GCSE, here's what happened when I asked it to write me a story using only words taken from the 2000 most frequently used words in French.

[Screenshot: Bard's French story, supposedly using only the 2000 most frequent words]

I was surprised at the mistake with est peur. I've done similar things with ChatGPT, and it shares all the flaws that we're finding here in Bard, but it doesn't tend to make mistakes in its French. Both, however, have the annoying tendency to happily tell you they have done something when they haven't. I am pretty sure s'enfuir and un serpent are not in the top 2000 words in French. But then again, ChatGPT's story had sword, treasure and dragon. If only it would say, "Sorry, I don't know what the 2000 most common words in French are", then it would be fine. Bing AI also has this tendency to oblige and make things up when it doesn't know. Which is inexcusable, because Bing does have access to the internet and acts as a search engine.
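
This is exactly the check Bard never makes on its own output, and it's trivial to do mechanically. A sketch, assuming you have the frequency list as a plain text file with one word per line (the file name is hypothetical):

    import re

    with open("french_top_2000.txt", encoding="utf-8") as f:
        allowed = {line.strip().lower() for line in f if line.strip()}

    story = "Le petit garçon a peur du serpent et veut s'enfuir."

    # Crude tokeniser: letters, apostrophes and hyphens count as word characters.
    tokens = re.findall(r"[a-zàâçéèêëîïôûùüÿœ'-]+", story.lower())

    off_list = sorted({t for t in tokens if t not in allowed})
    print("Not in the top 2000:", off_list)

A proper version would lemmatise (enfuir versus s'enfuir) and handle elision, but even this crude check would have flagged serpent before the bot cheerfully claimed compliance.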

So far, Bard had immediately failed all my requests based on language, whether they were tests I designed to see if I could catch it out, or genuine requests of the sort you might make when creating language teaching resources. You can see more on this Twitter thread. It also failed other straightforward requests, like a list of masculine countries in French.

So instead of trying to see if it would work for creating resources for a teacher, I decided to see how it would work for a learner asking for explanations of language points. This is one of the uses directly mentioned by articles advocating the use of Google Bard for language learners.

Where shall we start? I may as well say at this point that it went on to fail at explaining every single grammar point I asked it about.

Here it is saying that c, z and s are all pronounced as th in Spain:

[Screenshot: Bard claiming c, z and s are all pronounced th in Spain]

Now it tells us that mieux and meilleur are pronounced the same. (It made the same claim for bien and bon.)

[Screenshot: Bard claiming mieux and meilleur are pronounced the same]

And now it explains why you need a grave accent on the word for at in French:

[Screenshot: Bard's explanation of the grave accent on à]

Again, you can see more hilarious examples on the Twitter thread. It got in a muddle with tout/toute, and with black and white cows being zebras. Oh, go on, I'll give you that one here:

[Screenshot: Bard on black and white cows and zebras]

I was surprised at that one, because the difference between some cows which are all black and white (des vaches noir et blanc) and a mixed bunch of black cows and white cows (des vaches noires et blanches) is the go-to example used in explaining the rules for adjectival agreement. I think that ChatGPT may be slightly better here. But both ChatGPT and Bard suffer from the general problem of the examples they give just not matching what they are explaining.

So AI isn't good at looking at parts of words, and it isn't good at explaining grammar with coherent examples. What else do people suggest using it for? It's suggested that it is good at giving feedback to learners. I know already that ChatGPT is terrible at this. Just as with grammar explanations, the examples it picks out from pupils' work just don't match the point it is trying to correct. Let's see how Bard does. Are you feeling hopeful? Sorry:

[Screenshot: Bard's feedback on a learner's sentence]

It has ignored the actual mistake, which is using the article with a job, and the question of gender. Instead it has made up something called "noun-verb" agreement, saying the verb est should be feminine.

Alarm bells should be ringing. We should not be using AI for grammatical explanation. AI has no knowledge or intelligence. It is simply a very, very impressive predictive text tool. It has been trained on patterns of human speech. It can parrot, and sound as if it knows what it is talking about. But it has no knowledge. Any information it gives is a lucky coincidence, the result of putting words together in a way it has spotted humans do. And it seems that the humanity it reflects is anglophone and sexist. The fact that it comes even close to giving vaguely reliable-sounding information is a comment on how predictable we all are!

What should AI be good at? Well, language, I suppose. But have a look at the Twitter thread where I posted all the questions I asked it. It failed every single one. It's inexplicably bad at language, or maybe languages. This article from The Times may be behind a paywall, but its conclusion is that "Letting AI teach is like letting Casualty actors run A&E." AI is just mimicking. Nothing else.

And here's what was my final paragraph, which now has to be withdrawn, because interacting with Google Bard is not safe for our learners.

Except there is one way you could use AI for language learning where the potential is huge: chat to it. The clue is in the name ChatGPT. In your prompt, explain that you want it to be a conversational partner for learning the language. Tell it your level. Ask it to answer but also to ask questions. And have a conversation with it.

When you start to use Google Bard, it does warn you that it is experimental and may be inappropriate. But the articles suggesting you use it for language learning don't say that. And they should.

Tuesday, 22 August 2023

Coming up on the horizon already: The new GCSE

How much of a roadblock is the new GCSE going to be? There are several things to look forward to over the next few years: the NCLE hubs working to share good practice, the possibility of a new government, and hints of a National Language Strategy being formulated. Will this be a period of renewal and excitement? Or will it all be insignificant pretty little daisies growing around the edges of a hulking great boulder: the new GCSE?

We know what is in the new GCSE: grammar, vocabulary, role plays, pictures, translation, dictation, reading aloud. Will it be similar enough for us to easily move to teaching the new exam? Or does it require fundamental change?

The exam boards have indicated that the areas of content will be similar to the current topics. Many of the tasks and question types will look familiar. It would be reassuring to think that minimal adjustment is needed in planning and teaching. And yet, this exam was supposed to be a lever in bringing about change in how we teach. So is it dangerous to assume we can carry on as we were?

At the East of England Association for Language Learning meeting in June 2023, Rachel Hawkes warned us to be careful. Only 50% of the current GCSE vocabulary list is on the new GCSE list, so there is a lot we could cut out. Perhaps more importantly, 50% of the vocabulary on the new list wasn't on the old list. So we do have to teach words that we haven't been teaching before.
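
Those two 50% figures are worth pausing on, because they are percentages of two different lists: half of the old list survives, and, separately, half of the new list is new. A toy calculation (with hypothetical file names) makes the distinction clear:

    def load(path):
        with open(path, encoding="utf-8") as f:
            return {line.strip().lower() for line in f if line.strip()}

    old = load("gcse_old_vocab.txt")   # hypothetical file names
    new = load("gcse_new_vocab.txt")

    shared = old & new
    print(f"{len(shared) / len(old):.0%} of the old list is still needed")
    print(f"{len(new - old) / len(new):.0%} of the new list is brand new")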

The vocabulary list is central to the new GCSE. The idea is that with limited time for learning, the content to be learned should be clearly defined. And the GCSE panel specified that the most logical vocabulary to learn is the words which are used most frequently. This way, from KS3 (or even KS2), we can cut out words which are not going to appear in the GCSE. All those lists of pets, foods, sports, places in town, pencil case items, family members. We don't need to teach so many nouns.

And the words we do choose to teach can be revisited regularly. When they are introduced, and how often we come across them again (and again), can be preplanned. Texts can be built out of the words and out of the carefully sequenced grammar. And this is what NCELP did. Their materials are a marvel of carefully sequenced and revisited language. A far cry from so many textbooks, with long lists of words met only once and grammar points ticked off on a grid.

This is the task facing exam boards and publishers. To do it properly, they have to take this approach: start with the defined content (grammar and vocabulary), sequence it, and then build texts out of it. That means starting from the vocabulary list, planning the occasions when each word is to be met, and then writing texts using those words.

Sounds easy. But it is immensely difficult. The exam boards have already come a cropper, with words which are not on the list creeping into the Sample Assessment Materials. I think, like a vegetarian exchange student staying with a French family, that one of the offending items was chicken!

This summer I have turned down writing work from companies wanting to tweak and update resources for the new GCSE. Because doing it properly will not be tweaking. Doing it properly means starting from the vocab list and planning what to create. Like cooking based on what is in the fridge, not on what you and your guests would like to eat. 

It means you have to hold at arm's length any actual texts of interesting or true information. Because the words you need will not be the words you have at your disposal. So you have to start to create an alternative reality built out of the words you do have. Reminds me of this sketch rewriting the Sesame Street song, being forced in frustration to change one word at a time until you end up with "Stormy Nights... can you tell me how to get to Yellowstone Park." Because ultimately content, culture and meaning are secondary to meeting and practising the language.

If this is how professional published material will work - starting from the word list - then I think teachers will work differently. When we write or re-use texts, we will write the text first. Then we might use the multilingual profiler to check which of the words we have used are not on the list. And then we can give a gloss of those words in English, so pupils don't have to worry about them and we don't have to throw away our text.
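
As a sketch of that pragmatic workflow (the file name and the gloss dictionary below are stand-ins; in practice the profiler does the checking step for you): keep the text as written, find the off-list words, and print a gloss box rather than rewriting.

    import re

    def load(path):
        with open(path, encoding="utf-8") as f:
            return {line.strip().lower() for line in f if line.strip()}

    gcse_list = load("gcse_french_vocab.txt")   # hypothetical file
    glosses = {"poulet": "chicken", "végétarien": "vegetarian"}   # built by hand

    text = "Au restaurant, Paul ne mange pas de poulet car il est végétarien."
    tokens = re.findall(r"[a-zàâçéèêëîïôûùüÿœ'-]+", text.lower())

    print("Glossary:")
    for word in sorted({t for t in tokens if t not in gcse_list}):
        print(f"  {word} = {glosses.get(word, '???')}")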

So we won't be doing it "properly" like the publishers will have to. We will be doing it pragmatically. Taking texts and checking them, tweaking them where we can.

Will we be cutting back on vocabulary taught in KS3? It sounds like a great idea. But what words will be cut? We've already mentioned chicken. If not many foods or animals are on the list, then what do we do? Teach the core ones and let individual pupils know the ones they personally want to ask for? Because at GCSE they can use "chicken" in the speaking and writing exams. But it won't be in the listening or reading exams, and the speaking and writing tasks will be devised so as not to require any chicken.

This model of teaching the core high-frequency vocabulary and handing out personal-preference words to individual pupils sounds like a lot of work. Teaching individual pupils individual words. But it also puts a stop to communicative tasks in the classroom. If the only animals all pupils know are dog and fish - and you teach cat, bird and snake to individual pupils but not to all - then how do they do a survey about what pets they have, when they don't know what the other pupil is saying?

Here's a question. What are the Restaurant Role Plays going to look like at GCSE if words like chicken aren't on the list?

So again, I don't think I am going to be doing it "properly". I will not be removing words from KS3 which are not on the GCSE list. I don't think I'll be removing them from GCSE either. When it comes to revision, homework and exam preparation, I can be much clearer in telling the pupils which words they have to know. But I don't think that is the same as long-term language learning.

At KS3, I will be sticking with our approach of simultaneously building pupils' accumulation of language and developing what they can do with it. So that, like snow, rather than having an even coating (prone to melting away), they have a snowball that more and more language can stick to.

And I think that if that's what works at KS3, then it's what works at KS4 too.

It would be lovely to think that rather than a massive boulder we crash into or have to find a way around, the new GCSE is a bit of a speed bump that slows us down and makes us pay attention, but doesn't actually make us change our plans completely.