What Is Labellio?
Labellio is a web service that lets you create your own image classifier in a few minutes without any programming. A training image dataset that describes well what you want to recognize is essential, and Labellio helps you build such a dataset. Once the training dataset is set up, you get a classifier that you can use in your own system.
Here we walk you through the complete flow of building your own classification model.
Labellio supports OAuth sign-in with GitHub or Google. Go to the Login page and click one of the buttons to authenticate via OAuth.
This page shows all your classification models once you have created some. Click the Create Model button to create a new classification model.
Now teach the Labellio engine what you want to recognize. Type your model name in the Name text box, then click the Add data button.
Labellio supports three ways to supply a training dataset.
- Upload ZIP
- Upload TSV
- Image Search
You can upload a zip file that contains your training images, subject to the following rules.
- The name of the immediate parent folder is used as the label of the images under it, and images at the top level are treated as unlabelled.
- The maximum size of the zip file is 512MB. The archive may contain at most 10,000 files.
- File and directory names in the zip file should contain only ASCII characters.
So the folder structure should look as follows.
dog/dog_01.jpg
   /dog_02.jpg
   /dog_03.jpg
   /mydog.png
cat/cat_01.jpg
   /cat_02.jpg
   /cat_03.jpg
   /yourcat.png
bird/bird_01.jpg
    /bird_02.jpg
    /bird_03.jpg
unlabeled_01.jpg
unlabeled_02.jpg
unlabeled_03.jpg
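The layout and limits above can be sketched in Python using only the standard library. This is a minimal illustration, not part of Labellio: the function name `build_training_zip` and its arguments are hypothetical, and reading "512MB" as 512 * 1024 * 1024 bytes is an assumption.

```python
import io
import zipfile

MAX_ZIP_BYTES = 512 * 1024 * 1024  # assumption: "512MB" read as 512 MiB
MAX_FILES = 10_000                 # file-count limit stated in the rules


def build_training_zip(labelled, unlabelled=None):
    """Return zip bytes laid out as <label>/<file>, plus top-level unlabelled files.

    labelled:   {label: {filename: image_bytes}}
    unlabelled: {filename: image_bytes} or None
    (Hypothetical helper for illustration only.)
    """
    entries = [
        (f"{label}/{name}", data)
        for label, files in labelled.items()
        for name, data in files.items()
    ]
    entries += list((unlabelled or {}).items())

    if len(entries) > MAX_FILES:
        raise ValueError(f"too many files: {len(entries)} > {MAX_FILES}")
    for path, _ in entries:
        # Only ASCII names are expected inside the archive.
        if not path.isascii():
            raise ValueError(f"non-ASCII name: {path!r}")

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for path, data in entries:
            zf.writestr(path, data)

    payload = buf.getvalue()
    if len(payload) > MAX_ZIP_BYTES:
        raise ValueError("archive exceeds the 512MB limit")
    return payload
```

Writing the returned bytes to a file (for example with `open("training.zip", "wb")`) gives an archive that follows the folder-as-label convention described above.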
You can upload a TSV file that lists image URLs together with their labels, subject to the following rules.
- The images should be accessible from the Internet. The Content-Type value in the HTTP response header should be "image/jpeg" or "image/png".
- The TSV file must be encoded in UTF-8 if it contains non-ASCII characters.
- HTTP and HTTPS protocols are supported.
- The maximum number of lines is limited to 10,000.
The TSV file content should look as follows.
dog<tab>http://url.to/image
dog<tab>http://url.to/image
dog<tab>http://url.to/image
cat<tab>http://url.to/image
cat<tab>http://url.to/image
cat<tab>http://url.to/image
http://url.to/unlabeled_image
http://url.to/unlabeled_image
http://url.to/unlabeled_image
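As a rough sketch, a TSV file in this shape could be generated with a few lines of Python. The helper name `build_tsv` is hypothetical, and whether Labellio accepts a trailing newline is an assumption; the checks only mirror the rules listed above (HTTP or HTTPS URLs, at most 10,000 lines, UTF-8 encoding).

```python
from urllib.parse import urlparse

MAX_LINES = 10_000  # line limit stated in the rules


def build_tsv(rows):
    """Build TSV bytes from (label, url) pairs.

    A label of None or "" produces an unlabelled line holding only the URL.
    (Hypothetical helper for illustration only.)
    """
    lines = []
    for label, url in rows:
        # Only HTTP and HTTPS are listed as supported protocols.
        if urlparse(url).scheme not in ("http", "https"):
            raise ValueError(f"unsupported URL scheme: {url}")
        lines.append(f"{label}\t{url}" if label else url)

    if len(lines) > MAX_LINES:
        raise ValueError(f"too many lines: {len(lines)} > {MAX_LINES}")

    # UTF-8 is required when labels contain non-ASCII characters.
    return ("\n".join(lines) + "\n").encode("utf-8")
```

For example, `build_tsv([("dog", "http://url.to/image"), (None, "http://url.to/unlabeled_image")])` yields one labelled line and one unlabelled line.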
Two search engines are supported.
Select one of these engines and type search keywords in the Labels input box, separating them with the tab, comma, or enter key. By default 50 images are retrieved for each keyword; this can be raised to 100. Each keyword is used as the label for the images it retrieves.
The following rules apply to the format of images.
- JPEG and PNG are supported.
- All images are resized to 227x227 in the system, so higher resolution does not contribute to the result very much.
- The system does not apply EXIF rotation, so rotate images to the correct orientation before uploading.
After the training data is retrieved, click the Continue button. You will see the labelling screen.
Initially the labels available from the given dataset are displayed. If you want to re-label images, simply delete these labels and enter your new ones. As long as there are unlabelled images, you are taken to the labelling screen.
You label images manually by clicking or dragging each image. The current label name is shown at the top left of the top box. Make sure all images that should receive the current label are placed in the top box and the others in the bottom box. Click the All Ok button repeatedly until you have gone through all the unlabelled images. As you label, Labellio learns how images should be labelled.
After labelling all images, you are led to the final training process automatically. This step may take several minutes. Note that the training process continues even if you close your browser.
Coming Soon ...
Once the training completes, you can see how the classifier works from the Test Model tab.
You can either upload a raw image file or type a URL in the input box. The same rules for image format apply.
You can download your classification model by clicking the Download Model button in the Detail tab. There are two files to download.
- Caffe Model
These links are valid for an hour.
Model File Format
The model files exported from Labellio can be used by the open source Caffe framework.
The Caffe Model file is a tgz archive that contains the following files.
|caffemodel.binaryproto|Caffe model file|
|deploy.prototxt|Caffe's network definition file|
The Caffe model file and the mean file are what you pass to a Caffe program.
Labellio CLI lets you use trained models in your own environment.
Installation of Labellio CLI
Labellio CLI is an open source project that can be installed with the pip command. It is also available as a Docker image and as an AMI on AWS.
Usage of Labellio CLI
Labellio CLI classifies images using the Caffe model that was trained in and downloaded from Labellio.
Store image files under a directory.
$ ls images
alpaca1.jpg alpaca2.jpg sheep1.jpg sheep2.jpg
Extract the Caffe Model file downloaded from Labellio.
$ ls model
caffemodel.binaryproto labellio.json mean.binaryproto deploy.prototxt labels.json
Run the labellio_classify command.
$ labellio_classify model images
...
images/sheep2.jpg sheep [ 0.02713127 0.97286868]
images/sheep1.jpg sheep [ 7.87437428e-04 9.99212503e-01]
images/alpaca1.jpg alpaca [ 0.99114448 0.00885548]
images/alpaca2.jpg sheep [ 0.44422832 0.55577165]
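If you want to post-process these results, the output can be parsed with a short script. The sketch below is only an illustration: it assumes each result occupies one line in the form shown in the sample (path, predicted label, bracketed scores) and that paths and labels contain no whitespace. The function names and the 0.9 threshold are hypothetical and are not part of Labellio CLI.

```python
import re

# Pattern inferred from the sample output: "<path> <label> [ <score> <score> ... ]"
LINE_RE = re.compile(r"^(?P<path>\S+)\s+(?P<label>\S+)\s+\[(?P<scores>[^\]]*)\]\s*$")


def parse_classify_output(text):
    """Return a list of (path, label, scores) tuples; other lines are skipped."""
    results = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # e.g. log lines printed before the results
        scores = [float(s) for s in match.group("scores").split()]
        results.append((match.group("path"), match.group("label"), scores))
    return results


def low_confidence(results, threshold=0.9):
    """Return paths whose highest score is below the (arbitrary) threshold."""
    return [path for path, _label, scores in results
            if scores and max(scores) < threshold]
```

Applied to the sample output, `low_confidence` flags `images/alpaca2.jpg`, whose top score is about 0.56 and whose predicted label (sheep) disagrees with its file name.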
How to read the chart of training progress
Both loss_0 and loss_test are low and accuracy is high
The class is either visually obvious or already known to the base model.
loss_0 goes down and loss_test goes up
This is called over-training. The model fits the training data too closely and does not work well on other data. Increasing the number of training images may help.
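The second pattern can be checked numerically if you have the loss values as lists. The sketch below is a simple heuristic, not something Labellio provides: it treats loss_0 as the loss on the training data (as the description above implies), and the function name, the window size, and the returned strings are all hypothetical.

```python
def diagnose(loss_0, loss_test, window=3):
    """Compare the mean of the last `window` values with the mean of the
    `window` values before them, for both loss series."""
    if len(loss_0) < 2 * window or len(loss_test) < 2 * window:
        return "not enough data"

    def trend(values):
        recent = sum(values[-window:]) / window
        earlier = sum(values[-2 * window:-window]) / window
        return recent - earlier  # negative means the loss is still falling

    # Training loss falling while test loss rises suggests over-training.
    if trend(loss_0) < 0 and trend(loss_test) > 0:
        return "possible over-training"
    return "no divergence detected"
```

For instance, `diagnose([1.0, 0.8, 0.6, 0.4, 0.3, 0.2], [1.0, 0.8, 0.7, 0.8, 0.9, 1.0])` reports possible over-training, because the training loss keeps falling while the test loss has turned upward.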