Industrial Machine Learning
The computing power of microprocessors and industrial PCs is growing constantly. This not only makes industrial solutions faster and more capable, but also opens up entirely new possibilities and strategies. Machine learning is on everyone's lips today; the following self-experiment shows what is already possible.
Newspapers and magazines are full of talk about so-called Artificial Intelligence (AI). Either a golden future is predicted or an apocalyptic abyss. The first category includes the oft-praised intelligent refrigerators that will one day automatically reorder yogurts and salads that have run out (or expired); the second includes Siri, Alexa & Co., which already listen in on our most intimate secrets on behalf of the octopus-like gatekeepers of the Internet: Google, Amazon, Apple, Microsoft. Amid these questionable and sometimes frightening developments, it is easy to overlook that so-called artificial intelligence, or rather machine learning, has long since arrived in industry as well. Here it opens up interesting, novel possibilities for smart manufacturing, quality control, predictive processes, pattern recognition, and more, which would be difficult, if not impossible, to realize with conventional methods.
Machine learning is used for complex tasks, often for optical recognition, classification, and determining the location and orientation of objects of all kinds. In addition to technical components, these include people, faces, gestures, handwriting, etc. In autonomous vehicles, for example, it is used to detect and localize obstacles, other vehicles, passages, markings, etc.
Today, neither a team of scientists nor a powerful computing center is needed to solve such tasks. A simple example illustrates this. Automatically recognizing colored pencils, screws, or gears in a camera image, however, would be too easy. We therefore set ourselves a somewhat trickier task: a random person sits at a desk with a PC, performing one of four possible activities, namely (1) writing on paper, (2) typing on the keyboard, (3) handling tools, or (4) dozing off. An artificial neural network is to recognize which of the four activities is shown in a digital photo. The task is complicated by the fact that each person wears individually patterned clothing and that other people may appear in the background of the images.
Pictures of the different activities for training the classifier. Left: category "screwing"; right: category "writing".
In recent years, such classification tasks have been solved successfully with deep learning, i.e., with a large neural network. Since such a network usually contains hundreds of millions of parameters, thousands or even millions of training images are needed to train it. For each of these images, one must specify which category (in our example: which of the four activities) it shows. In most practical cases, obtaining and labeling such a set of images is simply too time-consuming and expensive.
To get around this, we used a pre-trained network for our task. Such a network has already been trained to extract, say, 1000 different feature values from an image. The neural network is thus used only as a so-called feature extractor: instead of the complete picture, only these 1000 values enter the classification. In this way, the complexity of the problem is reduced to a manageable fraction.
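The feature-extractor idea can be sketched in a few lines. In a real setup, the extractor would be a pretrained CNN whose 1000-value output is kept for each photo; here a fixed random projection stands in for that network so the sketch is self-contained, and the image size is illustrative rather than taken from the article.

```python
import numpy as np

# Minimal sketch of the feature-extractor idea. A real setup would run
# each photo through a pretrained CNN and keep its 1000-value output;
# here a fixed random projection plays the role of that network so the
# example is self-contained (dimensions are illustrative).
IMG_SHAPE = (32, 32, 3)   # toy image size; real camera photos are larger
N_FEATURES = 1000         # as in the text: 1000 feature values per image

rng = np.random.default_rng(seed=42)
projection = rng.standard_normal((int(np.prod(IMG_SHAPE)), N_FEATURES))

def extract_features(image: np.ndarray) -> np.ndarray:
    """Map one raw image to a compact feature vector (stand-in for the CNN)."""
    return image.reshape(-1) @ projection

photo = rng.random(IMG_SHAPE)   # a dummy "camera image"
features = extract_features(photo)
print(features.shape)           # (1000,) - this vector feeds the classifier
```

Only this 1000-value vector, not the full pixel array, is passed on to the second stage described next.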
The feature vector is then processed, for example, by a support vector machine, a random forest classifier, or a second, smaller neural network. To train this second stage, a few hundred to a few thousand training pictures suffice. For our task, we built a camera setup and wrote a small utility to assign to each picture, besides the activity class, some additional information such as the name of the person shown. With this setup, nearly 900 pictures of nine different people performing the four activities were taken in one day, an average of 25 pictures per person and activity. These pictures were used to train the classifier.
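A minimal sketch of this second stage, using a support vector machine from scikit-learn: the feature vectors and labels here are synthetic (one random cluster per activity), whereas in the real setup each row would be the 1000-value extractor output for one of the roughly 900 photos.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 4 activity classes, ~900 "pictures",
# 1000 feature values each (one random cluster per class).
rng = np.random.default_rng(seed=0)
n_per_class, n_features, n_classes = 225, 1000, 4

centers = rng.standard_normal((n_classes, n_features))
X = np.vstack([c + 0.5 * rng.standard_normal((n_per_class, n_features))
               for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# Second stage: an SVM trained on the feature vectors, not on raw pixels.
clf = SVC(kernel="rbf")
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
```

Swapping in a `RandomForestClassifier` or a small dense network requires changing only the `clf = ...` line; the surrounding pipeline stays the same.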
For verifying the system, these pictures must of course not be reused; instead, new pictures were taken. If the test pictures contain persons who were not in the training set, the system achieves a hit rate of 80%. If, on the other hand, a person is tested who was already present in the training pictures, the classifier reaches a reliability of about 95%.
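To measure the "unseen person" case cleanly, all pictures of a person must land entirely on one side of the train/test split. One way to enforce this, sketched here with scikit-learn's group-aware splitter and illustrative person IDs (the article does not describe its exact splitting tool), is:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# 900 pictures, 9 persons, 100 pictures each (illustrative person IDs).
n_pictures = 900
persons = np.repeat(np.arange(9), n_pictures // 9)

# Hold out 2 of the 9 persons completely: every picture of a held-out
# person goes to the test set, never to the training set.
splitter = GroupShuffleSplit(n_splits=1, test_size=2/9, random_state=1)
train_idx, test_idx = next(splitter.split(np.zeros(n_pictures),
                                          groups=persons))

print(sorted(set(persons[train_idx])))   # 7 of the 9 persons
print(sorted(set(persons[test_idx])))    # the remaining 2 persons
```

A plain random split would leak pictures of every person into the training set and report the optimistic ~95% figure instead of the honest 80% for unknown persons.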
The conclusion: despite the modest size of the training set, the classifier already achieves quite good hit rates. By extending the training set to about 4000 images, either through augmentation, i.e., by modifying the existing pictures, or with additional pictures of the same and/or new persons, the hit rate can be pushed close to 100%. It should be noted, however, that even so-called human performance does not usually reach 100%, because some pictures can never be assigned unambiguously.
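Augmentation of the kind mentioned above can be as simple as mirroring and brightness changes; each existing picture then yields several training examples without a new photo session. The specific operations and factors below are illustrative choices, not the ones used in the article.

```python
import numpy as np

def augment(image: np.ndarray) -> list[np.ndarray]:
    """Return simple variants of one image (pixel values in [0, 1])."""
    flipped  = image[:, ::-1, :]                   # horizontal mirror
    brighter = np.clip(image * 1.2, 0.0, 1.0)      # +20 % brightness
    darker   = np.clip(image * 0.8, 0.0, 1.0)      # -20 % brightness
    return [flipped, brighter, darker]

rng = np.random.default_rng(seed=2)
picture = rng.random((32, 32, 3))    # a dummy training picture
variants = augment(picture)
print(len(variants), variants[0].shape)   # 3 extra pictures per original
```

With three variants per original, the roughly 900 real pictures already grow to about 3600 training images, close to the 4000 mentioned above.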
© SSP. All rights reserved.