Emojis: Kids may love their simplicity, but programmers will loathe their complexities.
Last month, the Instagram photo-sharing service started recognizing emojis in its hashtag searches, making the company the first major social networking service to offer this capability. A user could affix a sprightly emoji to a photo hashtag so the snap could be found by other users searching for that emoji. The Internet rejoiced.
Now, one of the Instagram engineers responsible for this technical feat has shared the company's approach in a blog item posted Wednesday that should be perused by any developer looking to outfit a social Internet service or consumer app with similar emoji goodness. Turns out that supporting the little digital icons is no easy task.
"Identifying characters can be difficult across programming languages. Only by parsing the standard, finding character variations and understanding language differences do they become possible to support," Instagram engineer Piyush Mangalick wrote in the new post.
While elders may bemoan emojis' putative deleterious effect on language, one thing is for sure: The youth love them. Today, almost 60 percent of user text generated on Instagram contains emojis. Among Instagram's 300 million users, emojis are now more widely used than acronyms. LOL.
First popularized in Japan during the last decade, emojis convey a wide range of subjects and emotions through the use of simple symbols and pictographs, usually fitted on a 12-by-12-pixel grid. They are often used as shorthand to eliminate the laborious typing of words on small devices. The Unicode standard for encoding the world's languages on computers adopted a set of 1,282 emojis in 2010, which paved the way for their widespread use on Apple and Android devices.
Including emojis in Instagram's hashtag index at first seemed like a simple task. With Unicode, each character, be it a letter, symbol or emoji, is assigned a numeric code point, conventionally written in hexadecimal, which a programming language or operating system can translate into the appropriate character by consulting the Unicode standard.
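As a rough illustration of that mapping (not code from Instagram's post), the short Python sketch below lists the code point behind each character in a string. The helper name `code_points` is made up for this example.

```python
def code_points(text):
    """Return the Unicode code point of each character as a U+XXXX string."""
    return [f"U+{ord(ch):04X}" for ch in text]


# A letter, the hashtag sign and a grinning-face emoji (U+1F600).
print(code_points("a#\U0001F600"))
# -> ['U+0061', 'U+0023', 'U+1F600']
```

Note that the emoji's code point needs five hexadecimal digits, one more than the letter or the hashtag sign.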
Unfortunately, creating a single way to search these raw Unicode strings across different platforms was not possible, Mangalick said. Many emojis have code points too large to fit in a single 16-bit value, so in UTF-16, one of Unicode's encoding forms, they occupy two 16-bit units where most everyday characters occupy one. That made them tricky to parse, given that different programming languages use different escape notations for such characters and count their length differently. Additionally, some emojis are built from two or more code points in sequence.
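To make the variable-length problem concrete, here is a minimal Python sketch, offered as an illustration rather than as Instagram's approach. It splits a string into its 16-bit UTF-16 units; the helper name `utf16_units` is hypothetical.

```python
def utf16_units(text):
    """Return the 16-bit UTF-16 code units of `text` as hex strings."""
    data = text.encode("utf-16-be")  # big-endian, no byte-order mark
    return [
        f"0x{int.from_bytes(data[i:i + 2], 'big'):04X}"
        for i in range(0, len(data), 2)
    ]


print(utf16_units("A"))           # one unit: ['0x0041']
print(utf16_units("\U0001F600"))  # two units: ['0xD83D', '0xDE00']

# The Japanese flag is two code points, each needing two units.
flag = "\U0001F1EF\U0001F1F5"
print(len(flag), len(utf16_units(flag)))  # 2 4
```

Python counts the flag as two characters, while a language that measures strings in UTF-16 units, such as JavaScript or Java, would report a length of four. That mismatch is one reason a single cross-platform pattern is hard to write.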
Apple muddied the waters further by offering users the ability to encode some emojis in various colors, which resulted in non-standard strings. Android also had a set of non-standard emoji encodings. For Instagram to use emojis correctly, an Android device had to recognize an iPhone emoji, and vice versa.
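One way a search index might treat such variants as the same emoji is to strip the extra code points before comparing. The Python sketch below is an assumption for illustration only, not the method described in Instagram's post: it removes the Unicode skin-tone modifiers (U+1F3FB to U+1F3FF) and the text and emoji variation selectors (U+FE0E and U+FE0F). The function name `normalize_emoji` is hypothetical.

```python
# Code points assumed to mark a variant rather than a distinct emoji.
SKIN_TONE_MODIFIERS = range(0x1F3FB, 0x1F3FF + 1)
VARIATION_SELECTORS = (0xFE0E, 0xFE0F)


def normalize_emoji(text):
    """Drop skin-tone modifiers and variation selectors from `text`."""
    return "".join(
        ch
        for ch in text
        if ord(ch) not in SKIN_TONE_MODIFIERS
        and ord(ch) not in VARIATION_SELECTORS
    )


# A thumbs-up with a medium skin tone reduces to the plain thumbs-up.
assert normalize_emoji("\U0001F44D\U0001F3FD") == "\U0001F44D"
# A heart followed by the emoji variation selector reduces to the bare heart.
assert normalize_emoji("\u2764\uFE0F") == "\u2764"
```

Whether variants should be merged for search is a product decision. This sketch only shows that the variants differ by extra code points that can be detected and removed.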