What was the surveillance industry like before the advent of computer vision and neural networks? Neural network algorithms are a recent development, so we first need to look at how the surveillance industry functioned before them in order to appreciate the progress they have brought.
A typical CCTV infrastructure consists of a collection of CCTV cameras linked to a Digital Video Recorder (DVR) or a Network Video Recorder (NVR). The cameras are connected over Ethernet through a switch. The software in the DVR or NVR controls how many videos are recorded and when, and the videos are stored on a hard disk. The set-up also requires a power supply. DVRs convert analog footage to digital, which helps to extend storage capacity, makes it much easier to search archived footage, and allows users to stream video over a network for remote viewing from multiple locations.
Traditional surveillance algorithms relied on differences in pixel intensities to sense a movement or an event. For example, in a surveillance set-up installed in a park, the algorithm tracks trees, people, animals and even shadows in the same manner, purely through changes in pixel intensity. There is no notion of well-defined objects, and therefore any change in intensity can set off an alarm. To elaborate, take the example of a tree’s shadow: changing winds and shaking leaves also cause movement in the shadow. This triggers a pixel intensity change, which the camera perceives as an event. Such triggers are false alarms and are not significant to our requirement. Even changes in the time of day or in the weather can introduce many pixel changes in the video, which can then lead to false positive alarms. To reduce the false positives, separate complex rules and region-wise segregation of monitoring must be introduced so that certain areas are ignored and others are considered. All of this makes detection error-prone and unreliable. However, we should note that such algorithms are lightweight, as they rely only on pixel intensity changes. Now, how did neural networks change all that? Did they reduce the false positives? Did they make the algorithms any more efficient?
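Before turning to those questions, the pixel-intensity approach described above can be sketched in a few lines. This is only a minimal illustration, not the algorithm of any particular DVR or NVR: the function names, the thresholds (`pixel_threshold`, `area_threshold`) and the tiny 4x4 grayscale frames are all assumptions made for the example.

```python
from typing import List, Optional

Frame = List[List[int]]   # grayscale frame: rows of 0-255 intensities
Mask = List[List[bool]]   # True = monitor this pixel, False = ignore it


def changed_fraction(prev: Frame, curr: Frame,
                     pixel_threshold: int = 25,
                     mask: Optional[Mask] = None) -> float:
    """Fraction of monitored pixels whose intensity changed noticeably."""
    changed = 0
    monitored = 0
    for y, (prev_row, curr_row) in enumerate(zip(prev, curr)):
        for x, (p, c) in enumerate(zip(prev_row, curr_row)):
            if mask is not None and not mask[y][x]:
                continue  # pixel lies in a region we chose to ignore
            monitored += 1
            if abs(c - p) > pixel_threshold:
                changed += 1
    return changed / monitored if monitored else 0.0


def motion_detected(prev: Frame, curr: Frame,
                    area_threshold: float = 0.05,
                    pixel_threshold: int = 25,
                    mask: Optional[Mask] = None) -> bool:
    """Raise an 'event' when enough of the monitored area has changed."""
    return changed_fraction(prev, curr, pixel_threshold, mask) > area_threshold


# A shadow sweeping across the left column of an otherwise static scene.
previous = [[100] * 4 for _ in range(4)]
current = [[40, 100, 100, 100] for _ in range(4)]

# The detector has no idea this is "just a shadow": it sees changed pixels.
shadow_alarm = motion_detected(previous, current)

# Region-wise rule: ignore the left column where the tree's shadow falls.
ignore_left_column = [[x != 0 for x in range(4)] for _ in range(4)]
masked_alarm = motion_detected(previous, current, mask=ignore_left_column)
```

Without the mask, the moving shadow alone is enough to raise an alarm. Masking out the region suppresses it, but it would also hide a real intruder walking through the same area, which is why such rules become complex and unreliable.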
Well, neural networks brought efficient algorithms that introduced the notion of definite objects and their recognition. The algorithms could be trained on large amounts of data to produce a model capable of clear-cut recognition of well-defined objects. For example, a large collection of data containing pedestrians was fed to the network to train it to detect a pedestrian. This produced a model which was then used to detect pedestrians reliably in videos. Neural networks, therefore, brought in a well-defined notion of a physical object, be it a person, a cat, a dog or any other object. This technology is quite robust to changes in the scene, and the occurrence of false positives was drastically reduced. Neural networks can tolerate small changes in weather and other conditions, including lighting, noise and occlusion, and still identify the object in question accurately. However, this came at the cost of being compute-intensive: the hardware required for CCTV stream processing became very specific in nature, with its own power and heat considerations.
Having realized the advantages of using neural networks for our surveillance applications, we would definitely want to use this technology to improve our efficiency. But if we move to the newer technology, what do we do with our existing investment in CCTV infrastructure? It is hard to imagine throwing away the existing infrastructure and replacing it with a completely new, neural-network-compatible set-up. Changing every piece of hardware to suit neural network algorithms is very cost-intensive and not feasible. There is a way around this: we can retrofit our existing infrastructure so that neural network algorithms can do their job. There are three ways to go about it.
In the edge retrofitting mechanism, our traditional CCTV camera is turned into a smart camera, and the capture becomes an intelligent capture. A smart camera knows how to identify objects and is therefore more effective. For example, we can use NVIDIA Jetson edge GPU devices, which are capable of processing video streams and running object detection algorithms with a neural network model directly on the edge device. There can be two or more smart cameras to do the job. The content from the smart camera is captured based on defined rules. For example, if we are interested in pedestrians between 5 PM and 8 PM, we can define the appropriate rule. The camera records pedestrian occurrences in that time period and ignores the rest of the day; if there are no pedestrians, nothing is recorded. In such a case the accuracy is high, with detection close to 95%, and there are fewer false positives. There is no need to capture video for long stretches of time; only rule-specific capture is taken into consideration. This is how edge retrofitting is performed.
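The rule-based capture described above can be sketched as follows. This is a hedged illustration rather than the API of any camera or NVIDIA SDK: `Detection`, `RecordingRule`, `should_record` and the `min_confidence` field are hypothetical names invented for the example, and the detections are assumed to come from whatever object detection model runs on the edge device.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Iterable


@dataclass(frozen=True)
class Detection:
    """One object reported by the (assumed) detection model for a frame."""
    label: str
    confidence: float
    timestamp: datetime


@dataclass(frozen=True)
class RecordingRule:
    """Record only when `label` is seen between `start` and `end`."""
    label: str
    start: time
    end: time
    min_confidence: float = 0.5  # hypothetical cut-off, tune per deployment

    def matches(self, detection: Detection) -> bool:
        in_window = self.start <= detection.timestamp.time() <= self.end
        return (detection.label == self.label
                and in_window
                and detection.confidence >= self.min_confidence)


def should_record(detections: Iterable[Detection], rule: RecordingRule) -> bool:
    """True if at least one detection in the frame satisfies the rule."""
    return any(rule.matches(d) for d in detections)


# Rule from the text: pedestrians between 5 PM and 8 PM.
pedestrian_rule = RecordingRule("pedestrian", time(17, 0), time(20, 0))

evening = [Detection("pedestrian", 0.91, datetime(2024, 1, 1, 18, 30))]
late_night = [Detection("pedestrian", 0.91, datetime(2024, 1, 1, 21, 0))]

record_evening = should_record(evening, pedestrian_rule)        # inside window
record_late_night = should_record(late_night, pedestrian_rule)  # outside window
```

A frame with no detections, or with only other objects such as a dog, never triggers recording, which is what keeps the stored footage small.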
Another way of retrofitting the existing infrastructure is server retrofitting. The primary function of a DVR or an NVR is to collect the video streams and store them on a hard disk. With server retrofitting, we add an NVIDIA GPU-based server that holds a rack of multiple GPU processing cards. The GPU cards on the server side are more powerful than what we have for edge retrofitting; about 10 to 12 cameras can be processed on an entry-level GPU card. We can also set up rules for every camera to record specific events as they occur. The tracking is accurate, and we also reduce the amount of video that needs to be reviewed. In a traditional set-up we end up looking at hours of footage with little or no useful data in it. With rules in place, we only have to process the specific videos that were recorded because a rule was triggered.
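The "10 to 12 cameras per entry-level card" figure above lends itself to a quick sizing estimate. The helper below is a back-of-the-envelope sketch only: `gpu_cards_needed` is a hypothetical name, the 45-camera site is an invented example, and real capacity depends on resolution, frame rate and the model being run.

```python
def gpu_cards_needed(num_cameras: int, cameras_per_card: int = 10) -> int:
    """Number of GPU cards required, rounding up to whole cards.

    The default of 10 cameras per card is the conservative end of the
    10-12 range quoted in the text.
    """
    if num_cameras < 0 or cameras_per_card <= 0:
        raise ValueError("camera counts must be non-negative and capacity positive")
    # Ceiling division without floating point: -(-a // b)
    return -(-num_cameras // cameras_per_card)


# Hypothetical site with 45 cameras.
conservative = gpu_cards_needed(45, cameras_per_card=10)  # 5 cards
optimistic = gpu_cards_needed(45, cameras_per_card=12)    # 4 cards
```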
The third way is a combination of server and edge retrofitting, wherein part of the set-up uses smart edge cameras and the rest is handled by server retrofitting. In most situations, we have noticed that server retrofitting is more popular and more widely used, as it requires fewer changes to the hardware infrastructure.
We can monitor events in two ways: after the event has occurred, or live while the event is occurring. Post-event analysis takes place after the fact, for example to find out when a courier package was dropped at a house or to study traffic patterns in a city. We analyze videos that have already been stored, and a server CPU/GPU is generally used for the computation. Here, processing latency is not that important, as we are searching stored video for an event that has already occurred. The camera used in this case can be an RGB camera for daytime and an IR camera for night-time.
However, in the case of live analysis or live event monitoring, latency becomes very important. We need to analyze a live video feed, detect the event and immediately send alerts for it. The live stream is analyzed for the event being tracked. For example, in a live feed of a facility where intruders are to be detected, an alert is sent to security as soon as the event is detected so that they can monitor the perimeter of the facility. In such a scenario we use edge retrofitting, as we cannot rely on a server for analysis where processing latency is crucial. Every edge camera is smart: it performs the analysis itself and sends notifications based on the event. As we are recording 25 frames per second, each frame has a budget of 1000 ms / 25 = 40 ms, so the processing latency is expected to be less than 40 ms. Within that time the video stream has to be analyzed, run through a neural network model, and the alert sent downstream. This process will prove inefficient with a server, and therefore edge retrofitting is a good match for it. As in the case of post-event analysis, an RGB camera is used for daytime and an IR camera for night-time.
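The 40 ms figure follows directly from the frame rate: at 25 frames per second, a new frame arrives every 1000 / 25 = 40 ms, so all per-frame work must finish within that window to keep up with the stream. The sketch below makes this check explicit. The function names and the per-stage timings are assumptions chosen for illustration, not measurements from any device.

```python
from typing import Mapping


def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def within_budget(stage_ms: Mapping[str, float], fps: float) -> bool:
    """True if the summed per-frame stage latencies fit the frame budget."""
    return sum(stage_ms.values()) <= frame_budget_ms(fps)


budget = frame_budget_ms(25)  # 40.0 ms per frame at 25 fps

# Hypothetical per-frame timings in milliseconds.
edge_pipeline = {"decode": 5.0, "inference": 28.0, "send_alert": 3.0}
slow_pipeline = {"decode": 5.0, "inference": 45.0, "send_alert": 3.0}

edge_ok = within_budget(edge_pipeline, fps=25)  # 36 ms fits in 40 ms
slow_ok = within_budget(slow_pipeline, fps=25)  # 53 ms does not
```

Any extra step in the path, such as a network hop to a remote server, eats into the same 40 ms, which is the reasoning behind keeping live analysis on the edge device.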
Retrofitting has paved the way for using existing CCTV infrastructure with a few enhancements, thereby greatly improving the efficiency and intelligence of surveillance applications. Leaving most of the infrastructure unchanged enables businesses both small and big to adopt the technology and experience the benefits of advances in computer vision. Smarter surveillance systems that can be built with minimal changes to the existing set-up are definitely a tempting option to take up, aren't they?