Tag Archives: language

Google’s new NMT speaks its own language

This post was contributed by Alan Mosca, a PhD student in Birkbeck’s Department of Computer Science and Information Systems. Alan tweets at @nitbix

A Google research group has announced a breakthrough that could have a deep impact on the field of automated translation of documents and web pages.

In the recently released article “Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation”, they show how their Neural Machine Translation (NMT) system is able to translate between pairs of languages for which the system has never seen any examples.

In practice, this means that Google’s system is able to translate directly between two languages without adopting the “trick” of interlingual translation. (Interlingual translation is a technique commonly adopted in machine translation: a common intermediate language is used to bridge two languages for which no corpora are available. For French and German, for example, the translation would be French -> English -> German, and vice versa, using English as the bridging language.) This occurs through a common deep learning method called Long Short-Term Memory (LSTM), through which a machine can learn how to translate between, say, English and French, and between English and German, by processing examples of translations.
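To make the bridging idea concrete, here is a minimal sketch of translation through an intermediate language. The word tables and the function names (`translate`, `pivot_translate`) are invented for illustration; real systems learn from large corpora rather than looking words up in a dictionary.

```python
# Toy word-level "translation tables" (hypothetical data, for illustration only).
FR_TO_EN = {"bonjour": "hello", "monde": "world"}
EN_TO_DE = {"hello": "hallo", "world": "welt"}


def translate(words, table):
    """Translate word by word, leaving unknown words unchanged."""
    return [table.get(word, word) for word in words]


def pivot_translate(words, source_to_pivot, pivot_to_target):
    """Bridge two languages by passing through an intermediate language."""
    pivot_words = translate(words, source_to_pivot)
    return translate(pivot_words, pivot_to_target)


# French -> English -> German, with English as the bridging language.
print(pivot_translate(["bonjour", "monde"], FR_TO_EN, EN_TO_DE))
```

Any error made in the first step is carried into the second, which is one reason a system that avoids the bridge is attractive.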

The exciting development is that all of this is achieved in a single model, which is able to operate on multiple language pairs. It even appears to have led the model to develop its own “internal representation” of concepts, which is completely independent of the specific languages it learns to translate. The examples in the paper are not limited to European languages, either – the system is able to translate between Japanese and Korean without seeing a single example that joins the two languages. An example of how this works is shown in Fig. 1.

Fig.1: Example zero-shot translation after training on an intermediate language

All of this, of course, is done inside a deep learning model: an LSTM. The multi-lingual translation is achievable in the single model by adding a token for the destination language in the input. For example, if one wanted to translate “Hello, my name is Bob” to Spanish, the input would be “<2es> Hello, my name is Bob”.
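The input preparation described above can be sketched in a few lines. The function name `add_target_token` is hypothetical; only the `<2es>` token format comes from the example in the text.

```python
def add_target_token(sentence: str, target_lang: str) -> str:
    """Prefix a sentence with a token naming the language to translate into."""
    return f"<2{target_lang}> {sentence}"


# Request a Spanish translation; the source language is never stated.
print(add_target_token("Hello, my name is Bob", "es"))
```

Because the destination language is part of the input, the same model can serve every language pair.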

A further exciting observation made by researchers from Google Brain is that the system does not need to be told what language the input is in, disambiguating the difficult cases on its own. Take the word “burro” for instance: it means “butter” in Italian but “donkey” in Spanish. Even for words that have the same spelling but different meanings in different languages, the system is usually able to discriminate based on context.

The model learns an “encoder” LSTM and a “decoder” LSTM; it has a similar appearance to multi-layer auto-encoders. The centre contains an attention model, and the layer just before the attention is the one that outputs the “common encoding”: a semantic representation of the input that is language-independent.
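As a rough illustration of what an attention step does, the sketch below weights a list of encoder outputs by their similarity to a decoder query and averages them into a single context vector. It uses plain dot-product scoring purely for brevity; this is an assumption, not necessarily the scoring function used in Google’s system, and the vectors are made up.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoder_states):
    """Return attention weights and the weighted average of encoder states."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Three made-up encoder outputs (one per input word) and one decoder query.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend([2.0, 0.0], encoder_states)
print(weights, context)
```

The states most similar to the query receive the largest weights, so the decoder “looks at” the most relevant input words at each step.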

Being Google, as well as testing on the benchmark datasets in machine translation, they used their own internal dataset, which is probably very large and certainly very private. The code is very private too, but the researchers have given us an insight into the kind of infrastructure they needed: 100 (presumably state-of-the-art) GPUs, trained for over 3 weeks. The results are impressive, beating state-of-the-art ad-hoc models in a few cases. For a single model developed for multiple languages, Google’s NMT system provides a great advantage, and we should expect ever better translations from Google Translate as a consequence.

From ‘Go back to China’ to ‘Where are you really from?’: Nationality and ethnicity talk in everyday interactions

This article was contributed by Professor Zhu Hua of Birkbeck’s Department of Applied Linguistics and Communication

In his open letter published in the New York Times on 9 October, Michael Luo, who was born and grew up in the US, told of his encounter with a woman who yelled at him and his family, ‘Go back to China!’, on the Upper East Side of Manhattan when they came out of a church service. Puzzled by the event, his 7-year-old daughter asked, ‘Why did she say, “Go back to China”? We’re not from China.’

What Michael Luo experienced is ‘perpetual foreigner syndrome’, a problem facing many transnational individuals in everyday interactions, especially those who may look or sound different from the local majority. Back in 2002, Frank Wu, the first Asian American law professor at Howard University Law School, wrote specifically on how perpetual foreigner syndrome is instantiated through recurrent and seemingly innocent questions (which, admittedly, are much milder than what was hurled at Michael Luo):

‘Where are you from?’ is a question I like answering. ‘Where are you really from?’ is a question I really hate answering… For Asian Americans, the questions frequently come paired like that…. More than anything else that unites us, everyone with an Asian face who lives in America is afflicted by the perpetual foreigner syndrome. We are figuratively and even literally returned to Asia and ejected from America. (Wu 2002)

His point about what these questions can do strikes a chord with me. Having lived and worked in China and Britain and travelled to many parts of the world, I find questions like ‘where are you from?’ really difficult to answer. I never seem to get it right and always end up with the feeling that the self I present in my attempted answers is not real – it is fragmented sometimes, and rehearsed at others. If I say that I am from London, I know that the next question will be ‘where are you really from?’. I have to look apologetic and confess that I ‘originally’ came from China more than 20 years ago and have lived in Britain longer than I had been in China. If I take the short-cut and tell people that I am from China, the next comment I am likely to hear is a compliment: ‘but your English is so good!’. For a long time, I thought that this was just me, an applied linguist who is over-interpreting language use in everyday interactions, until I read Rosina Lippi-Green’s work on language, ideology and discrimination (1997/2012) and began to make connections between my observations on these instances of discourse in daily encounters and existing studies, including one of the strands of my work on Interculturality.

I refer to this kind of discourse that evokes or orients to one’s ethnicity or nationality either explicitly or implicitly as Nationality and Ethnicity Talk (NET). It includes questions or comments which, frequently occurring in small talk, aim to establish, ascribe, challenge, deny or resist one’s ethnicity or nationality. The questions and comments range from direct ones (e.g. ‘Where are your people coming from?’, ‘When are you going back?’, ‘Is it as hot as this where you are from?’, ‘What is it like back home?’) to more subtle ones (e.g. ‘Your English is so good!’). There is nothing inherently wrong with questions like ‘where are you from?’. The question can be genuine – people would like to find out more about China, Japan or Korea or any other culture, or they are simply interested in you as a person. But problems occur when those asking such questions appear to look for a certain answer and seem confused or disappointed when they hear an unexpected one, and when those on the receiving end have been asked the same questions 101 times. And of course, in Michael Luo’s case, it made him and his daughter feel like ‘foreigners’ in their own country.

Despite growing acceptance of racial equality in post-industrialised societies, NET of the above kind reflects people’s hidden and flawed folk theories of race, reproduces and reifies cultural essentialism, and can result in exclusion and marginalisation of certain social groups. Jane Hill (2008) coined the concept of ‘folk theory of race’ to describe everyday assumptions that people have about race and ethnicity. Because the folk models or theories are often taken for granted, people tend to use them to ‘interpret the world without a second thought’. Folk theory of race can be in operation subtly and, on some occasions, it is almost invisible to those who apply it and/or those at the receiving end of it. Markus & Moya (2010) have unpicked the powerful, hidden, and flawed assumptions about the nature and meanings of race and ethnicity beneath the eight common conversations about race amongst American people. These include: ‘We’re beyond race.’ ‘Racial diversity is killing us.’ ‘Everyone’s a little bit racist.’ ‘That’s just identity politics.’ ‘Variety is the spice of life.’ ‘It’s a Black thing—you wouldn’t understand.’ ‘I’m___ and I’m proud.’ and ‘Race is in our DNA’. They argue that ‘these eight conversations give us the illusion of understanding, but they are narrowly based on limited, flawed, and of course, unstated assumptions … Also like stereotypes, these conversations are pervasive, they are difficult to change and they have powerful consequences for our actions.’

In my recently published article co-authored with Li Wei, we examine the significance of questions such as ‘where are you really from?’ in everyday conversational interactions. We discuss what constitutes NET, how it works through symbolic and indexical cues and strategic emphasis, and why it matters in the wider context of identity, race, intercultural contact and power relations. The discussion draws on social media data including YouTube videos and a blog with the title of ‘It may not be racist, but it’s a question I’m tired of hearing’ by Ariane Sherine in the Guardian’s opinion column, Comment is Free. We argue that the question ‘where are you really from?’ does not per se contest immigrants’ entitlement. However, what makes a difference to whether one perceives oneself as an ‘outsider’, as Michael Luo did, is the tangled history, memory and expectation imbued and fuelled by power inequality.

There have been reports of an increase in the number of racial insults directed at people who look and sound different since the EU Referendum. It is important that we pay closer attention to linguistic xenophobia, but it is equally important to be mindful of the significance of the more subtle ways of Othering as exemplified in NET.

Further reading:

Hill, Jane H. 2008. The everyday language of white racism. Malden, MA: Wiley-Blackwell.
Lippi-Green, Rosina. 1997/2012. English with an Accent. Language, Ideology, and Discrimination in the United States. London: Routledge.
Markus, Hazel Rose & Paula Moya (eds.). 2010. Doing Race: 21 Essays for the 21st Century. New York: W.W. Norton & Company.
Wu, Frank H. 2002. Where are you really from? Asian Americans and the Perpetual Foreigner Syndrome. Civil Rights Journal Winter 2002. 16-22.
Zhu Hua & Li Wei. 2016. “Where are you really from?”: Nationality and Ethnicity Talk (NET) in everyday interactions. In Zhu Hua & Claire Kramsch (eds.), Symbolic power and conversational inequality in intercultural communication, a special issue of Applied Linguistics Review 7(4), 449-470.

Why do we feel different when switching languages?

Call me Madame

This post was contributed by Professor Penelope Gardner-Chloros, from Birkbeck’s Department of Applied Linguistics and Communication.

A few days ago, I phoned to arrange a repair to my washing machine. Having got through to the relevant person – a young woman – who could arrange the appointment, I was asked, as question number one, whether I was Miss or Mrs. This question is of course a standard one in this country, where ‘Ms’ has failed to catch on, unlike the position in the United States. As someone who teaches Language and Gender, I am aware that the way you address someone not only reflects the prevalent social structures, but also shapes and perpetuates them. Classifying women from the outset by their marital status is an instance of ‘everyday sexism’, as a certain massively successful web forum is called. Honestly, why should I have to disclose whether I am married or not to someone I have never met and will never meet, just in order to arrange a washing machine repair?

So I gave my standard reply: ‘Ms’. The reply to that was that this was not a title that would allow the relevant form to be completed. Since my (then teenage) son once filled in his title from a drop-down menu as ‘the Right Reverend Monsignor’, it was not clear to me why this form could not offer this third option. Irritated, I said “In that case please use ‘Professor’”. I don’t like using my ‘rank’ outside academia, but desperate measures were needed. Once again, computer said no. This was an academic title, and so no use on the form. My Chinese horoscope says I am a tree, and trees do not budge. For a few moments it appeared that the washing machine would just have to keep leaking.

To break the deadlock, I launched into my normal little lecture given in such circumstances, about how there was no need for anyone to know my marital status, how this would not be required if I were a man in such a context, and how this was, as another teenager once said, SO unfair. I added that the person taking my details also being a woman, she ought to understand the need for equal treatment.

“Well yes”, she replied, getting tired of this difficult customer, “but it’s been like that ever since ever, so it’s a bit late to change it now”. I pointed out that in other countries, such as France and Germany, they had managed to make the change, and that now in France for official purposes everyone was “Madame” and in Germany everyone was “Frau”, the terms for “Miss” having been abandoned in both countries. I should also have pointed out that the company she was working for, Siemens, was German. On hearing this, her tone changed from one of mild irritation to an interested purr: “Oooh”, she said. “I’d rather like to be called ‘Madame’!”

A mini-triumph for the tree?

Language, identity and a political hot potato

This post was contributed by Professor Penelope Gardner-Chloros, of Birkbeck’s Department of Applied Linguistics and Communication.

Blogs do not usually start with quotations from the Bible, but this one epitomizes the link between language, identity and danger that I want to discuss:

Gilead then cut Ephraim off from the fords of the Jordan, and whenever Ephraimite fugitives said, ‘Let me cross,’ the men of Gilead would ask, ‘Are you an Ephraimite?’ If he said, ‘No,’ they then said, ‘Very well, say “Shibboleth” (שבלת).’ If anyone said, “Sibboleth” (סבלת), because he could not pronounce it, then they would seize him and kill him by the fords of the Jordan. Forty-two thousand Ephraimites were killed on this occasion.

—Judges 12:5–6, NJB

Applied Linguistics, as distinct from more theoretical branches of the discipline, addresses real-life problems in which language plays an important role. As a linguist, I can attest that no day goes by without such an issue coming up in the press. I have several box files full of cuttings of articles which I use in my teaching, on topics ranging from the apparently inoffensive, such as the revival of ‘dead’ languages like Manx or Cornish, to the highly political, such as the relationship between the use of minority languages, like Catalan, and political separatism.

All of us are identified – and often judged – on the basis of our language, dialect or accent. The gruesome ‘beheading’ videos recently released by members of Isis were doubly chilling for Londoners, because the executioners in black hoods had unmistakable London accents, and sounded like the young people you hear on the bus on the way to work.

I want to talk about another recent example which has been in the news, and which shows how language interacts with some very practical issues with serious human consequences.

No-one in the UK needs to be told that the issue of immigration has been in the news on a daily basis for the last few weeks or months. Oddly, this is not because of any drastic change in the situation, but principally because the issue has become a populist rallying cry for politicians who, though they might not put it that way, wish to convince British people that they are going to ringfence the country’s wealth for those same British people – and prevent undefined side-effects like ‘overcrowding’ in the process. This ‘promise’ has proved so popular that politicians from all the main parties have jumped on the bandwagon. Despite clear evidence that immigrants in fact contribute positively to the economy, nothing wins votes like telling people they will get a bigger slice of the cake.

The detailed arguments, and the distinctions between different categories of immigrants, become obscured in this rhetorical assault. This is not the place to rehearse them, but there is one category of (potential) immigrants who deserve some special attention: asylum seekers, i.e. people who come to this country because they claim to be in direct danger, or subject to persecution, in their country of origin. Basic humanity dictates that such cases are dealt with quickly and fairly, and that such people are distinguished from, say, economic migrants, who come for a better life but who are not actually in fear of their life.

But how do we decide whether the asylum seeker is to be believed or not? An important aspect of the decision involves finding out where they actually come from. Do they come from South Somalia, for example, in which case it is beyond question that they should be granted protection, or North Somalia, in which case protection is not automatic? Since such people rarely have any documentation when they arrive, linguistic agencies are employed by the Government to judge their region of origin on the basis of an analysis of their speech. So far, you may think, all well and good – this could be an effective method.

Unfortunately it has now emerged that these agencies employ people who are not qualified to take these specialized decisions, and who in some cases are totally bogus. Amongst other problems, they ignore Lesson 1 in Sociolinguistics, which is that national and regional frontiers rarely coincide neatly with languages; Lesson 2, that in circumstances where they are being judged, people will speak the way they think they are expected to speak, not the way they speak naturally – so in this case, they may try to use a standard or a national language instead of their own dialect; and Lesson 3, which is that each individual has more than one way of speaking, in fact a ‘repertoire’ which may include different registers, different dialects, and different forms of mixed or intermediate varieties. In such a delicate matter, with life or death consequences hanging on the decision, a very high degree of linguistic expertise is required to do this job properly, and several other factors need to be taken into account apart from the linguistic analysis. Imagine the outcry if we employed food inspectors with bogus or insufficient qualifications to vet the food which we import.

If you would like to read more about these issues, the links below may be of interest. And next time you yourself are in some way judged by the way you speak, think of those who are being sent back to war zones, or to face FGM or worse, because a bogus or under-qualified linguistic ‘expert’ decided they did not come from the region they claimed.

Institute of Race Relations, “Language testing of asylum claims: a flawed approach”
The Independent, “Hundreds of asylum seekers ‘wrongly deported’ on drug smuggler’s evidence”

What the **** is linguistics?

This post was contributed by Professor Penelope Gardner-Chloros, from Birkbeck’s Department of Applied Linguistics and Communication.

Excuse the *rude* title of this blog – I shall have more to say about why it is rude – even though it actually only contains a few stars – in subsequent blogs.

To begin, though:

If, like me, you are a teacher of linguistics, there are two questions which people will invariably ask you:

1. What IS linguistics?
and
2. What languages do you speak?

Those are the questions I want to write about today.

In answer to the first, linguistics is the study of language, no more, no less. Since language is (almost) as fundamental to the human race as breathing, it is probably quite important to know something about it. Linguistics is actually a whole collection of subjects, from the highly scientific – like whereabouts in the brain is the language faculty located? – to the philosophical – like why does a sentence mean what it means? – to the strictly structural – like what is the difference between a noun and a verb, and does the difference exist in all languages? – to the sociological and politically relevant – e.g., in what way are women linguistically at a disadvantage in our society, and in others, compared with men?

The answer to the second question is, strictly speaking, irrelevant. Even if I only speak one language, my ‘mother tongue’, I am perfectly able to study the various issues mentioned above. Chomsky, considered the originator of modern linguistics, thought that the ideal way to study language was by analysing the productions of ‘the ideal speaker in a homogeneous community’. Studying such an ideal speaker would allow us to uncover principles of language which underlie all languages, the basic principles of the human language faculty.

However, more recently, more and more linguists have started to realize that what is most ‘universal’ about language is actually its diversity. The fact that people speak different languages, and within languages different dialects, and speak in different ways depending on their circumstances, their topic, their interlocutor, etc. is by far the most striking fact about human language, more so than the fact that all languages have something like a verb/noun distinction.

In coming blogs I will discuss the relevance of diversity and variation in language through examples mainly taken from the news and current affairs. I will try to show that linguistic questions concern us all, and hopefully convince you that there are rational and irrational ways of finding the answers to linguistic questions. Just as the fact that we all breathe does not make us experts in respiratory medicine, so the fact that we all communicate through language does not qualify us to pronounce on linguistics – though if you read the letter pages of any major newspaper it is stunning how many self-appointed experts on language/grammar/spelling/usage there seem to be!

Birkbeck Perspectives

Birkbeck experts and students share their opinions on a diverse range of thought-provoking topics.
