This technique is a combination of two powerful machine learning algorithms:
– convolutional neural networks are excellent at image classification, i.e., finding out what is seen on an input image,
– recurrent neural networks that are capable of processing a sequence of inputs and outputs, therefore it can create sentences of what is seen on the image.
Combining these two techniques makes it possible for a computer to describe in a sentence what is seen on an input image.
_____________________
The paper “Deep Visual-Semantic Alignments for Generating Image Descriptions” is available here:
A gallery with more results with the same algorithm:
You can train your own convolutional neural network here:
The source code for the project is now available here:
Subscribe if you would like to see more of these! –
The thumbnail image background was made by Georgie Pauwels (CC BY 2.0) –
Splash screen/thumbnail design: Felícia Fehér –
Károly Zsolnai-Fehér’s links:
Patreon →
Facebook →
Twitter →
Web →
source