Scientists are studying your Twitter slang to help AI
- A group of mathematicians from the University of Vermont used Twitter to examine how young people intentionally stretch out words in text for digital communication.
- Analyzing the language in roughly 100 billion tweets generated over eight years, the team developed two measurements to assess patterns in the tweets: balance and stretch.
- The words people stretch are not arbitrary but rather have patterned distributions such as what part of the word is stretched or how much it stretches out.
What? whaat. WHAT? Whaaaattt?
While all of the above are expressions of confusion, you understand them to mean slightly different things. That’s because you imagine how each word sounds, as signaled by the repetition of, or emphasis on, certain letters. The meaning packed into our vernacular, slang, and deliberately misspelled words is how we lace our digital communication with human emotion.
That, as it happens, has proved to be one of the major challenges for language-processing artificial intelligence. But scientists are trying, and they’re studying our Twitter lingo to bring computers up to speed on how humans really communicate.
Over the last two decades, social media has provided scientists with a trove of free information about human behavior and language. A group of mathematicians from the University of Vermont used Twitter to examine how young people intentionally stretch out words in text for digital communication. They created a method to quantify the semantic nuances between a word and its stretched versions, like “right” vs. “riiiiiight,” with the aim of teaching future AI algorithms human digital colloquialisms.
“Written communication has recently begun encoding new forms of expression, including the emotional emphasis delivered by stretching words out,” said Chris Danforth, professor of Mathematics & Statistics in the Vermont Complex Systems Center and member of the research team behind the study.
In their study, published last week in the journal PLOS One, the team analyzed the language in roughly 100 billion tweets generated from 2008 to 2016. They developed two measurements to assess patterns in the tweets: balance and stretch. For example, “hahahaha” would be considered a stretched word high on balance, while a term like “wtffffff” has stretch but little balance, as only one letter, “f,” contributes to the stretchiness. The effect is to put emphasis on the word abbreviated by the letter “f.”
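To make the two ideas concrete, here is a rough Python sketch. It is not the researchers' method: the function name `stretch_profile` and both formulas are assumptions made purely for illustration. It treats "stretch" as the number of extra repeated characters in a word and "balance" as how evenly those extras are spread across the word's letter runs, measured with a normalized entropy. It only looks at single repeated letters, so it would not capture repeated units like the "ha" in "hahahaha," and it would wrongly count ordinary double letters (as in "hello") as stretch.

```python
import math
from itertools import groupby


def stretch_profile(word):
    """Return (stretch, balance) for a word. Illustrative definitions only."""
    # Split the word into runs of identical letters: "wtfff" -> w:1, t:1, f:3.
    runs = [(ch, len(list(group))) for ch, group in groupby(word.lower())]
    # Count every repeat beyond the first letter of a run as "extra".
    extras = [length - 1 for _, length in runs]
    stretch = sum(extras)
    # With no extras, or only one run, there is nothing to balance.
    if stretch == 0 or len(runs) < 2:
        return stretch, 0.0
    # Share of the total stretch contributed by each stretched run.
    shares = [e / stretch for e in extras if e > 0]
    entropy = -sum(p * math.log(p) for p in shares)
    # Normalize so 0 means one letter carries all the stretch
    # and 1 means every run is stretched equally.
    balance = entropy / math.log(len(runs))
    return stretch, balance


for example in ["what", "wtffffff", "gooooooaaaalll"]:
    print(example, stretch_profile(example))
```

Under these assumed definitions, a word like "wtffffff" scores zero on balance because a single letter carries all the stretch, while "gooooooaaaalll" scores higher because three different letters are drawn out.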
“With so much communication happening electronically these days, we’re all trying to find ways to convey emotion through text. Emojis are helping, but the visual effect of 30 consecutive vowels in a curse word turns a bland profanity into a form of art,” Danforth said.
Interestingly, the use of elongated words was found across languages. For example, “kkkkkkk” signifies laughter in Brazilian Portuguese while “wkwkwkwkwkwk” expresses it in Indonesian, according to the researchers.
Ultimately, this project could help artificial intelligence algorithms understand critical intrinsic meanings contained in the idiosyncratic variations in our communicative text or other linguistic symbols, such as punctuation and emojis.
Dictionary definitions hardly reflect the way that we actually communicate with one another digitally. What the researchers found, though, is that the words people stretch out aren’t arbitrary. Rather, they have patterned distributions such as what part of the word is stretched or how much it stretches out. Colloquial digital language is, after all, a system of symbols and for it to transfer meaning we must all be “in” on the patterns.
This research suggests that a better understanding of the stretched words used on social media could open more doors to helping AI make sense of our slang. The team also developed tools and methods that could be useful in future studies, such as investigations of intentional mistypings and misspellings.
What benefits come from AI algorithms better understanding our digital lingo? For one, it’s possible that new tools could be applied to improve natural language processing, search engines, and spam filters.
“We were able to comprehensively collect and count stretched words like ‘gooooooaaaalll’ and ‘hahahaha’,” the researchers said in a press release, “and map them across the two dimensions of overall stretchiness and balance of stretch, while developing new tools that will also aid in their continued linguistic study, and in other areas, such as language processing, augmenting dictionaries, improving search engines, analyzing the construction of sequences, and more.”