Microsoft talks about its Automatic Image Captioning software system

The Microsoft Research team has long been known for its inventive applications, and an innovative piece of software is soon to join its lineup. The team is currently developing an Automatic Image Captioning system. The software takes an image from a database and automatically generates a descriptive, realistic caption for it.

System says: “A cat sitting on top of a bed”
Human says: “A person sitting on bed behind an open laptop computer and a cat sitting beside and looking at the laptop screen area”

The team developing this piece of software includes some computer vision experts, speech and language experts and researchers with expertise in machine learning and machine translation.

The software works like a machine translation system that translates pixels into a language (English). To measure quality, the team uses Bilingual Evaluation Understudy (BLEU), an algorithm for evaluating machine-translated text. The machine-generated text is compared with text written by a human; the closer it is to the human version, the higher the score.
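To make the idea of a BLEU comparison concrete, here is a minimal sketch of a sentence-level BLEU score in Python. It is a simplified illustration, not the evaluation code the Microsoft team used: the function names `ngrams` and `bleu`, the whitespace tokenization, and the lack of smoothing are all assumptions made for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Simplified sentence-level BLEU (no smoothing), in the range 0.0 to 1.0."""
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        total = sum(cand_counts.values())
        if total == 0:          # candidate is shorter than n tokens
            return 0.0
        # Clip each n-gram count by its highest count in any single reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:        # without smoothing, one zero precision zeroes the score
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty: punish candidates shorter than the closest reference.
    c = len(cand)
    r = min((len(ref) for ref in refs), key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)

    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)
```

Calling `bleu("a cat sitting on a bed", ["a cat sitting on top of a bed"])` gives a value strictly between 0 and 1, while a candidate identical to the reference scores 1.0. Scores are often reported as percentages, as in the figures quoted in this article.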

John Platt, Deputy Managing Director and Distinguished Scientist at Microsoft Research, writes in his blog post that the Automatic Image Captioning system achieved a BLEU score higher than that of human-written captions.

He says, “I’m happy to report that, in terms of BLEU score, we actually beat humans! Our system achieved 21.05% BLEU score, while the human “system” scored 19.32%.”

The team also conducted a blind test, asking people to compare the captions generated by the software with those written by humans. For 23.3% of the test images, people judged the software's captions to be better than the human captions.

The software first uses a word detector to detect the words for objects in an image, in no particular order, and then generates sensible sentences from those words using a language model. These sentences are then re-ranked, and the most suitable caption is assigned to the image.
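The three stages described above (detect words, generate sentences, re-rank) can be sketched with a toy example. Everything here is hypothetical and chosen only for illustration: the `DETECTED` confidences stand in for a real word detector, the `BIGRAMS` table stands in for a trained language model, and the coverage-based re-ranking score is an assumption, not the method the Microsoft team describes.

```python
import math

# Hypothetical word-detector output: word -> confidence, in no particular order.
DETECTED = {"cat": 0.9, "bed": 0.8, "sitting": 0.7, "laptop": 0.4}

# Toy bigram "language model": probability of the next word given the previous one.
BIGRAMS = {
    "<s>": {"a": 1.0},
    "a": {"cat": 0.5, "bed": 0.3, "laptop": 0.2},
    "cat": {"sitting": 0.7, "</s>": 0.3},
    "sitting": {"on": 1.0},
    "on": {"a": 1.0},
    "bed": {"</s>": 1.0},
    "laptop": {"</s>": 1.0},
}


def generate(beam_width=3, max_len=8):
    """Beam search over the bigram model; returns (log_prob, words) candidates."""
    beams = [(0.0, ["<s>"])]
    finished = []
    for _ in range(max_len):
        expanded = []
        for logp, words in beams:
            for nxt, p in BIGRAMS.get(words[-1], {}).items():
                cand = (logp + math.log(p), words + [nxt])
                # Sentences that reach the end marker are set aside as complete.
                (finished if nxt == "</s>" else expanded).append(cand)
        beams = sorted(expanded, key=lambda b: b[0], reverse=True)[:beam_width]
        if not beams:
            break
    # Strip the <s> and </s> markers from the completed sentences.
    return [(logp, words[1:-1]) for logp, words in finished]


def rerank(candidates, detected):
    """Re-rank by language-model score plus confidence of the detected words used."""
    def score(item):
        logp, words = item
        coverage = sum(conf for word, conf in detected.items() if word in words)
        return logp + coverage
    return sorted(candidates, key=score, reverse=True)


def caption(detected=DETECTED):
    """Return the top re-ranked sentence as a caption string."""
    best = rerank(generate(), detected)[0]
    return " ".join(best[1])
```

In this toy setup the language model alone prefers the short sentence "a bed", but re-ranking rewards sentences that mention more of the detected words, so `caption()` returns "a cat sitting on a bed".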

The team is still working on the program and aims to create a system that can automatically generate a descriptive caption for an image as accurately as a human would. More details about the system are available in the official blog post.

Shiwangi Peswani is a qualified writer and blogger who loves to dabble with and write about computers and the Internet. While she focuses on technology topics, her varied skills and experience enable her to write on any topic that interests her.