Sunday 27 August 2023

AI and Language Learning

Red flag. Since I wrote this, I have discovered that Google Bard's sexism is also accompanied by colonialism and racial stereotypes. Of course this is not intentional, but it is a result of the material it has been trained on. And it has no moderation or filter. I asked it to write the story of an American child who goes to Guatemala. It immediately made stereotyped, negative, colonial, prejudiced assumptions. The conclusion of my post was that the best use of AI is to let learners chat with it. This is now off the table. Here is the start of the story it wrote, based entirely on affirming stereotypes:

And if, teaching in the UK, that doesn't strike you as problematic, just imagine you are teaching Spanish in the USA, to a class of pupils many of whom are from a Hispanic background. That's the stereotype of Latin America they live with. And it is based on a false comparison with the States as a rich, clean, comfortable, safe, entitled, white country.

Here's the post as I wrote it. But this isn't funny anymore.


Yesterday I saw an article advocating the use of Google Bard for language learning. Google Bard is one of the popular Artificial Intelligence bots currently causing a storm with their amazing ability to replicate human speech. I have already tried to use Bing and ChatGPT, so I thought I would test the capabilities of Bard and see if it really is any use for teaching languages.

My first question was simple. I asked it, in Spanish, to explain the rules for using question marks.


It gave me perfectly good examples, in Spanish, of questions using the Spanish ¿___?. But its explanation made no mention of upside-down question marks. It just said to put a question mark at the end. So it failed my first test.

I had this test ready, as I'd already used it on ChatGPT (which also failed it). The reason is that these AI chat bots have been developed to work in English. They can then translate into other languages, but their working is in English. It's important to realise this, and that it does impinge on their ability to work as language learning tools. It's also alarming that they don't detect when their examples don't match their explanation. More examples of this to come!

I wondered, if it's working in English then what will happen if I ask it for words that rhyme? If I ask it for words that rhyme with pescado, will it translate into English and give me a list of words that rhyme with fish - dish, wish - and translate them back into Spanish? It did this:

So, it is capable of working in Spanish. It's taken a definition of "rhyme" that means containing the same vowels. So lavo sort of rhymes with pescado. But I can't help noticing expedientes on that list. Which doesn't rhyme at all.

I did another test of its Anglophone bias by asking it in Spanish who the President is. I thought it could offer maybe AMLO or Sánchez as possibilities, because I had asked in Spanish. But it automatically assumed I meant Joe Biden. I've tested for this cultural bias in ChatGPT with questions about Spanish food or French music. It does tend to come up with answers known to English speakers and even stereotypes.

Back to language based questions. This is one that Dr. Rachel Hawkes had alerted me to, when she was genuinely using ChatGPT to "help" create language learning resources. She asked it for a list of French nationality adjectives ending in "-ain". It was unable to do this, producing a list including the marvelous "espagnolian". So I tried it with Google Bard:



It didn't have the wonderful and undesired inventiveness of ChatGPT. But it still was no use at all. Joe Dale had a further extended conversation with Bard on this question where it acknowledged that it had done badly, and then in a series of attempts went from bad to worse.

Unlike some of my questions, this wasn't a deliberate trap or tricky test. It started from a genuine question Rachel asked to try to save some time. And it's not clear why it failed. It seems that AI isn't very good at taking language apart. So gaps in texts, parts of words, or focus on endings can all trip it up. Unfortunately these are exactly the sorts of things we want to concentrate on in language teaching.

I tried to use it to create a text where it removed the words for his and her and replaced them with son/sa/ses for the learner to choose the correct one. I explained this carefully in English, to avoid it simply doing what Microsoft Word would do and replacing the letter string son even if it was in the middle of a word. Even so, it did this:






Not only has it done what I was worried about, and replaced son inside the word sont, and sa in the word responsable, but in giving the "answers", it's also tried to do it with the words garçon and attentionnée. We can't ever understand why this happened. AI works by "learning" from huge amounts of language. And somewhere in that learning it knows that çon is equivalent to son. And it thinks that tion is the same too. So it's struggling with phonics! That's cute, because so are our learners!
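For comparison, here is a minimal sketch of how an ordinary script can do this kind of gap-fill without touching sont, responsable or garçon. It assumes Python's built-in re module; the function name and the sample sentence are my own illustrations, not anything Bard produced. The key is the word boundary marker \b, which only lets son, sa and ses match when they stand alone as whole words.

```python
import re

# Match son, sa or ses only as whole words. The \b markers stop "son"
# matching inside "sont" or "garçon", and "sa" inside "responsable".
POSSESSIVE = re.compile(r"\b(son|sa|ses)\b", re.IGNORECASE)


def make_gap_fill(text):
    """Return the gapped text and the list of removed words, in order."""
    answers = []

    def blank(match):
        # Record the original word so it can go on the answer sheet.
        answers.append(match.group(0))
        return "son/sa/ses"

    gapped = POSSESSIVE.sub(blank, text)
    return gapped, answers


# Illustrative sentence (an assumption, not taken from the original text).
sample = "Le garçon et sa sœur sont responsables. Il aime son chien et ses chats."
gapped, answers = make_gap_fill(sample)
print(gapped)
print(answers)
```

This is only pattern matching, so it has limits of its own: it would also blank out son when it is the noun meaning sound, for example. But it will not mistake part of a word for a whole word.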

I didn't pick it up on the sexism of its examples. But I had previously challenged ChatGPT on a very similar piece of writing it created. It got very snarky and self-defensive. Of course, its bias is a reflection of the material it has been trained on.

While we are talking about using it to create resources for the new GCSE, here's what happened when I asked it to write me a story using only words taken from the 2000 most frequently used words in French.



I was surprised at the mistake with est peur. I've done similar things with ChatGPT and it shares all the flaws that we're finding here in Bard, but it doesn't tend to make mistakes in its French. But both have the annoying tendency to happily tell you they have done something, when they haven't. I am pretty sure s'enfuir and un serpent are not in the top 2000 words in French. But then again, ChatGPT's story had sword, treasure and dragon. If only it just said, "Sorry, I don't know what the 2000 most common words in French are" then it would be fine. Bing AI also has this tendency to oblige and make things up when it doesn't know. Which is inexcusable because Bing does have access to the internet and acts as a search engine.
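Checking a story against a word list is something a short script can do reliably, provided you actually have the list. Here is a minimal sketch, assuming Python; the tiny allowed set below is a hypothetical stand-in for a real frequency list, and the function name is my own.

```python
import re


def words_outside_list(text, allowed):
    """Return the words in text that are not in allowed, in order of first appearance."""
    # Split the text into runs of letters, lowercased, keeping accented characters.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    outside = []
    for token in tokens:
        if token not in allowed and token not in outside:
            outside.append(token)
    return outside


# Hypothetical stand-in for a real frequency list (an assumption for illustration).
allowed = {"le", "chat", "a", "peur", "et", "il", "un"}
story = "Le chat a peur et il voit un serpent."
print(words_outside_list(story, allowed))
```

This compares the words exactly as they appear in the text, so a verb form like voit is flagged unless that exact form is on the list. Matching it to voir would need an extra step, and a word like s'enfuit is split at the apostrophe.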

So far, Bard immediately failed all my requests based on language, whether they were tests I designed to see if I could catch it out, or genuine requests of the sort you might make for creating language teaching resources. You can see more on this Twitter thread. It also failed other straightforward requests like a list of masculine countries in French.

So instead of trying to see if it would work for creating resources for a teacher, I decided to see how it would work for a learner asking for explanations of language points. This is one of the uses directly mentioned by articles advocating the use of Google Bard for language learners.

Where shall we start? I may as well say at this point that it goes on to fail at explaining every single grammar point I asked it about.

Here it is saying that c, z and s are all pronounced as th in Spain:


Now it tells us that mieux and meilleur are pronounced the same. (It made the same claim for bien and bon.)



And now it explains why you need a grave accent on the word for at in French:



Again, you can see more hilarious examples on the Twitter thread. It got in a muddle with tout/toute and with black and white cows being zebras. Oh, go on, I'll give you that one here:



I was surprised at that one, because the difference between some cows which are all black and white (des vaches noir et blanc) and a mixed bunch of black cows and white cows (des vaches noires et blanches) is the go-to example used in explaining the rules for adjectival agreement. I think that ChatGPT may be slightly better here. But both ChatGPT and Bard suffer from the general problem of the examples they give just not matching what they are explaining.

So AI isn't good at looking at parts of words, and it isn't good at explaining grammar with coherent examples. What else do people suggest using it for? It's suggested that it is good at giving feedback to learners. I know already that ChatGPT is terrible at this. Just as with grammar explanations, the examples it picks out from pupils' work just don't match the point it was trying to correct. Let's see how Bard does. Are you feeling hopeful? Sorry:



It has ignored the actual mistake in using the article with a job, and the question of gender. And instead it's made up something called "noun-verb" agreement, saying the verb est should be feminine.

Alarm bells should be ringing. We should not be using AI for grammatical explanation. AI has no knowledge or intelligence. It is simply a very, very impressive predictive text tool. It has been trained on patterns of human speech. It can parrot and sound as if it knows what it is talking about. But it has no knowledge. Any information it gives is a lucky coincidence, the result of putting words together in a way it has spotted humans do. And it seems that the humanity it reflects is anglophone and sexist. The fact that it comes even close to giving vaguely reliable-sounding information is a comment on how predictable we all are!

What should AI be good at? Well, language, I suppose. But have a look at the Twitter thread where I posted all the questions I asked it. It failed every single one. It's inexplicably bad at language, or maybe languages. This article from The Times may be behind a paywall, but its conclusion is that "Letting AI teach is like letting Casualty actors run A&E." AI is just mimicking. Nothing else.

And here's what was my final paragraph. Which now has to be withdrawn. Because interacting with Google Bard is not safe for our learners.

Except there is one way you could use AI for language learning where the potential is huge: chat to it. The clue is in the name ChatGPT. In your prompt, explain that you want it to be a conversational partner for learning the language. Tell it your level. Ask it to answer but also to ask questions. And have a conversation with it.

When you start to use Google Bard, it does warn you that it is experimental and may be inappropriate. But all the articles suggesting you use it for language learning don't say that. And they should.
