When a human eye looks at a scene, say a road crossing where four or five people are crossing the road, what we see and perceive is remarkably rich. We see the road, the vehicles on it, the pedestrian crossing, the people crossing, what they are wearing, how they are walking and gesturing, and so on. There is almost no end to what the human eye can perceive. Computer vision essentially brings this capability to a computer or machine: it observes the same scene and tries to understand, perceive and detect the road, the vehicles, the crossing, people, their faces, their gestures and more. We have come a long way, thanks to advances in algorithms and the immense availability of data to analyze.

Computer vision and its applications make our lives easier by automating many tasks that are strenuous, laborious and labour-intensive. One of the driving factors behind its growth is the amount of data we generate today, which is then used to train and develop models. However, to design a computer vision product, many factors and options must be weighed against the business use case. In this article, we will discuss some of the factors that affect the decision-making aspects of a computer vision product, along with business use cases where these decisions play out.
The choice of a computer vision algorithm is central to the design of the product, as it defines how the product will be developed and implemented. Broadly, algorithms are either feature-based or featureless: feature-based algorithms apply filters or feature extractors, while featureless algorithms rely on neural networks for data analysis.
Filter-based algorithms use filters to limit the data in an image so that what they need can be extracted effectively. Images are often full of signals and noise that get in the way of the information we want to extract; applying filters reduces the detail in the image, making analysis and decisions quicker. Common filters include the Gaussian filter, the Laplacian, Difference of Gaussians and Sobel. A Sobel filter with a vertical kernel, for example, highlights the vertical details of an image while other features are suppressed or blurred, so that further analysis can be performed and the relevant data extracted effectively. Such filter-based algorithms are very simple to use and do not require any training or a machine learning model.
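To make the Sobel example concrete, here is a minimal sketch of a vertical-edge Sobel filter in pure Python. The 8x8 "image" below is a made-up example, dark on the left half and bright on the right, so it contains exactly one vertical edge; a real pipeline would use a library such as OpenCV rather than hand-rolled loops.

```python
# Sobel kernel that responds to vertical edges (intensity changes
# along the horizontal axis).
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

def sobel_vertical(image):
    """Convolve a 2D grayscale image (list of lists) with the Sobel
    x-kernel. Returns a same-sized image (borders left at 0) where
    large absolute values mark vertical edges."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    acc += SOBEL_X[ky][kx] * image[y + ky - 1][x + kx - 1]
            out[y][x] = acc
    return out

# Hypothetical test image: dark (0) on the left half, bright (255) on
# the right half, i.e. one vertical edge between columns 3 and 4.
img = [[0] * 4 + [255] * 4 for _ in range(8)]
edges = sobel_vertical(img)

# The response is strongest at the edge and zero in the flat regions.
print(edges[4])  # → [0, 0, 0, 1020, 1020, 0, 0, 0]
```

Note how the flat regions produce zero response: this is the "reducing detail" effect described above, where everything except the feature of interest is suppressed before any further analysis.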
Other feature-based techniques are feature extraction algorithms such as Canny edges, Hough lines and histogram of oriented gradients (HOG). Canny is widely used for detecting edges: it finds changes in intensity in the image and marks them as edges, creating a pattern or signature that makes the object we want to detect easier and faster to find. Similarly, the histogram of oriented gradients algorithm, as the name suggests, captures local differences in intensity as gradient patterns. Where there is no change in intensity, say a wall or a flat surface in a single block colour, the gradient is near zero and that region is eliminated during analysis. Once the gradient patterns are found, a sliding-window mechanism moves across the whole image to match the pattern, thereby finding the object or face we want to detect. This is how a face is found in an image. Such algorithms are more complex than plain filters and require some training and modelling; building a usable face detector this way can take thousands of images and hours of training.
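The sliding-window step described above can be sketched in a few lines. This is a hypothetical, pared-down illustration: a real detector slides a window of HOG feature vectors and compares them with a trained model, whereas here we compare raw pixels against a tiny template to keep the sketch self-contained.

```python
def match_score(image, template, top, left):
    """Sum of absolute pixel differences between the template and the
    image window anchored at (top, left); lower means a better match."""
    score = 0
    for y in range(len(template)):
        for x in range(len(template[0])):
            score += abs(image[top + y][left + x] - template[y][x])
    return score

def sliding_window_match(image, template):
    """Slide the template over every valid position in the image and
    return the (top, left) corner of the best-matching window."""
    h, w = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best, best_pos = None, None
    for top in range(h - th + 1):
        for left in range(w - tw + 1):
            s = match_score(image, template, top, left)
            if best is None or s < best:
                best, best_pos = s, (top, left)
    return best_pos

# Hypothetical 6x6 image: all zeros except a bright 2x2 blob whose
# top-left corner is at row 2, column 3.
img = [[0] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (3, 4):
        img[y][x] = 200

template = [[200, 200], [200, 200]]
print(sliding_window_match(img, template))  # → (2, 3)
```

The window visits every position, so the cost grows with image size and template size; real detectors add tricks such as image pyramids and early rejection to keep this tractable.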
Featureless algorithms involve constructing a neural network and feeding it a set of distinct inputs, which allows the network to rearrange its weights so that the given inputs are highlighted or brought out. They are widely used in object detection, image segmentation and similar tasks, and they are the choice when accuracy is of utmost importance. This comes with a high computation cost, as detecting a match can involve billions of operations. They are also data-intensive: to detect an object accurately, tens of thousands of images need to be fed in to train and develop an accurate machine learning model. So, unless your business use case warrants a featureless algorithm, it is better to stick with the less intensive feature-based algorithms. Some well-known deep neural networks include AlexNet, DarkNet and Inception. To conclude, the key questions to ask are whether we have enough data and enough computation power to choose a featureless algorithm for the product design.
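A quick back-of-the-envelope calculation shows where the computation cost comes from: even a single fully connected layer over a modest image performs hundreds of millions of multiply-accumulate operations, and a full network stacks many such layers. The layer sizes below are hypothetical choices for illustration, not taken from any particular network.

```python
def dense_layer_macs(num_inputs, num_outputs):
    """Multiply-accumulate (MAC) operations needed for one fully
    connected layer: every output sums over every input."""
    return num_inputs * num_outputs

# A 224x224 grayscale image flattened into an input vector, feeding a
# hypothetical 4096-unit hidden layer.
pixels = 224 * 224        # 50,176 inputs
hidden = 4096             # assumed hidden-layer width

macs = dense_layer_macs(pixels, hidden)
print(f"{macs:,} multiply-adds for a single layer")  # → 205,520,896 ...
```

Multiply that by dozens of layers and by every frame of a video stream, and the "billions of operations" figure above becomes plausible, which is why featureless approaches usually demand GPUs or other accelerators.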
Therefore, the choice of algorithm depends on the product and its application, and several questions need to be asked before arriving at a decision. Are we doing simple edge detection or boundary marking in images or videos, or face and gesture detection? Do we have enough data to train a model? Do we have the computation power and data to apply a featureless algorithm? Once we find answers to these questions, we can be confident of the algorithm we want to use in our product design and build it efficiently. Do you have a similar problem statement? We have experience guiding many clients and businesses to make the right choices for their products. Do reach out to us and we will guide you through every step of your product design process.