The general workflow of Salmonvision is explained below.
The hardware for the RGB video system is provided by our partners Ocean Aid (https://oceanaid.ca/) in Canada and Whooshh Innovations (https://www.whooshh.com/) in the USA. For acoustic imaging we currently support Sound Metrics ARIS units and are evaluating opportunities to use lower-cost side-scan sonar units for some monitoring applications.
Data is transferred from the camera to a Raspberry Pi, which performs motion detection and file management, cutting the continuous video generated at the site into a set of more easily reviewable clips where motion was detected. These motion clips can then either be a) transferred to the Cloud for analysis or b) processed on the edge with an NVIDIA Jetson GPU microcomputer. Video clips and the associated AI counting results, including file metadata and detections, are then uploaded to our secure cloud storage, typically over an onsite internet connection (satellite or broadband), for review in the Salmon Vision web app. Where internet is not available on site, data can be uploaded in batches using an upload script provided by the SV team whenever the hard drives are collected and transfer is possible. Because onsite internet access allows our team to monitor device uptime and performance, and to check data uploads throughout the season, we strongly recommend onsite internet connectivity to ensure optimal performance of SV tools running on the edge.
For RGB edge processing, the data are first analysed to select periods with motion (e.g. fish swimming by), which reduces the computing time required for subsequent detections, as video without movement is discarded. Continuous video is backed up on onsite hard drives in case partners require additional review of the continuous footage. This approach can produce near-real-time data on salmon numbers if the detection (next section) is done at the edge. However, the smaller, less GPU-intensive models deployed on edge devices may be less accurate than those available in the Cloud.
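The motion-gating idea can be sketched as simple frame differencing: compare each frame with the previous one and keep only the segments where the difference exceeds a threshold. This is a minimal illustration only; the function names, the toy frames and the threshold value are assumptions, not the actual Raspberry Pi implementation.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute pixel difference between two greyscale frames
    (frames represented here as flat lists of pixel intensities)."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def motion_segments(frames, threshold=10.0):
    """Return indices of frames whose difference from the previous
    frame exceeds the threshold, i.e. candidate motion frames."""
    moving = []
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            moving.append(i)
    return moving

# Toy 16-pixel frames: static background, then a bright object appears.
static = [0] * 16
with_object = [0] * 12 + [200] * 4
frames = [static, static, with_object, static]
motion = motion_segments(frames)  # frames 2 and 3 differ from their predecessors
```

In a real deployment the detected motion indices would be merged into time windows and only those windows cut into clips, so that static footage never reaches the detection stage.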
Sonar data is challenging to analyse because there is often considerable noise generated by environmental conditions (flow speed, turbulence, suspended particles, etc.). A second challenge is that, unlike RGB video, where three channels of information are available (red, green and blue), sonar provides only a single intensity channel measuring the strength of the return signal. An important first step is to convert the proprietary .aris file into .mp4 format and reduce jitter. Subsequently, additional processing steps can be applied to separate the signal from the noise.
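One common family of noise-separation techniques is background subtraction: estimate a per-pixel background from the temporal median of the intensity frames, then subtract it so that static returns (rocks, structure) drop out while moving targets remain. The sketch below is a hedged illustration of that idea, not the actual SV sonar pipeline.

```python
from statistics import median

def background_model(frames):
    """Per-pixel temporal median across a stack of intensity frames."""
    return [median(pixels) for pixels in zip(*frames)]

def subtract_background(frame, background):
    """Remove static returns; keep only intensity above the background."""
    return [max(p - b, 0) for p, b in zip(frame, background)]

# Toy 1-D intensity frames: a static rock (value 80) and a moving
# fish (value 150) sweeping across the beam.
frames = [
    [80, 0, 0, 0],
    [80, 150, 0, 0],
    [80, 0, 150, 0],
    [80, 0, 0, 150],
    [80, 0, 0, 0],
]
bg = background_model(frames)            # the rock survives in the model
cleaned = [subtract_background(f, bg) for f in frames]  # the fish survives subtraction
```

The temporal median is robust to brief occupancy: because the fish occupies each pixel in only one of five frames, it does not contaminate the background estimate.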
With motion-clip production and noise-filtering pre-processing done, the next step is to detect the objects of interest, i.e. salmon (and other fishes / aquatic animals). Computer-vision algorithms for RGB video monitoring have been trained to recognise the different fish species. This is done by providing imagery with annotations (rectangular ‘bounding’ boxes) around the fishes. These labelled data sets are crucial for the initial training of the model, but also at later stages, when more labelled data (generated through data review, see below) becomes available to retrain the models. It is essential that the labels are correct, to avoid including training data in future retrains that may confuse the algorithm's ability to accurately detect species. Unfortunately, even small amounts of wrongly labelled data can have a large negative impact on performance, which elevates the importance of data review.
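Bounding-box annotations of this kind are commonly stored as normalised coordinates per image. The sketch below converts a pixel-space box into a YOLO-style (class, x-centre, y-centre, width, height) record; the exact label format and class IDs used by SV are assumptions here, shown only to make the annotation concept concrete.

```python
def to_yolo_record(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space bounding box into a normalised
    (class, x_centre, y_centre, width, height) record, all in [0, 1]."""
    xc = (x_min + x_max) / 2 / img_w
    yc = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return (class_id, xc, yc, w, h)

# A hypothetical box around a fish (class 0) in a 1920x1080 frame.
record = to_yolo_record(0, 480, 270, 960, 540, 1920, 1080)
```

Normalised labels stay valid if the imagery is later resized, which matters when the same labelled clips are reused across model retrains.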
Once fish are detected, they have to be tracked through the video. This is essential for counting, because a detection moving upstream across the entire screen (a count) is very different from a fish appearing and disappearing at, for example, the downstream end of the set-up. We use a ‘cross-the-centre-line’ approach: to be counted, a fish has to move from one side across the centre line and continue until it disappears on the opposite side. This movement can be up- or downstream, and the upstream direction is configured in the web app for each project to produce a count of total upstream migration for each video clip.
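The cross-the-centre-line rule can be illustrated with a minimal counter: each tracked fish contributes a count only if its track starts on one side of the line and ends on the other. This is a simplified sketch under the assumption that increasing x means upstream; the production tracker handles occlusions, lost tracks and per-project direction configuration.

```python
def count_crossings(tracks, centre_x):
    """Count centre-line crossings from per-fish tracks.

    tracks: list of x-position sequences, one per tracked fish.
    A fish is counted only if its track starts on one side of
    centre_x and ends on the other (increasing x = upstream here,
    an assumption for this sketch).
    """
    upstream = downstream = 0
    for xs in tracks:
        if len(xs) < 2:
            continue
        if xs[0] < centre_x < xs[-1]:
            upstream += 1
        elif xs[-1] < centre_x < xs[0]:
            downstream += 1
    return upstream, downstream

tracks = [
    [10, 40, 60, 90],   # crosses centre (x=50) moving upstream -> counted
    [90, 70, 55, 52],   # stays on one side of the line -> not counted
    [95, 60, 30, 5],    # crosses moving downstream -> counted
]
counts = count_crossings(tracks, centre_x=50)
```

Note how the second track, a fish that appears and then retreats without crossing, is correctly ignored, which is exactly the distinction the centre-line rule exists to make.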
The results of the detection and tracking in the previous step need to be checked for the mistakes that the AI algorithm inevitably makes. At the detection level these can be false positives (a detection where there is no salmon) or false negatives (no detection where there is a salmon). A second type of error is the classification error, where a detection is assigned the wrong species label. In addition, if a fish is only detected after it has already passed the centre line, or the detection track ends before the fish crosses the centre line, the reviewer must correct the result to produce an accurate count. The extent of these errors can be estimated by reviewing a subset of the data, an important quality-validation step that brings ‘experts-in-the-loop’.
In SV this is typically done by reviewing a certain percentage of the videos and checking whether all fish were detected and assigned to the right class. This review can be done very accurately (as explained later), after which the data can be used as new training data to further improve model performance. By comparing the results pre- and post-review we can estimate the different types of errors made by the algorithm and produce corrected counts, as detailed in the next section.
Subsequently, these errors can be used in a statistical analysis to adjust the results from the detection algorithm so that they more closely match the true salmon numbers. For example, if reviewing 10% of the data showed that all coho were wrongly identified as chinook, applying the same correction to the remaining 90% of non-reviewed data will improve the final estimate. The real analysis is much more sophisticated, but this illustrates the idea. In the near future these final results, with a measure of confidence on the estimates, will be available for inspection in a dashboard and download as a spreadsheet file.
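The coho/chinook example can be made concrete with a simple proportional reclassification: estimate, from the reviewed subset, what fraction of each AI-predicted class reviewers relabelled to each true class, then redistribute the raw counts accordingly. As the text notes, the real analysis is much more sophisticated; the function name and the rates below are invented purely for illustration.

```python
def corrected_counts(raw_counts, reclass_rates):
    """Redistribute raw AI counts using per-class reclassification
    rates estimated from the reviewed subset.

    reclass_rates[pred][true] = fraction of detections the AI
    labelled `pred` that reviewers relabelled as `true`.
    Classes with no reviewed rate keep their raw count.
    """
    corrected = {}
    for pred, n in raw_counts.items():
        for true, rate in reclass_rates.get(pred, {pred: 1.0}).items():
            corrected[true] = corrected.get(true, 0.0) + n * rate
    return corrected

# As in the text's example: review showed every 'chinook' detection
# was actually a coho, so all chinook counts shift to coho.
raw = {"chinook": 100, "coho": 40}
rates = {"chinook": {"coho": 1.0}, "coho": {"coho": 1.0}}
final = corrected_counts(raw, rates)
```

A fuller treatment would also propagate the sampling uncertainty of the reviewed subset into confidence intervals on the corrected counts, which is what the planned dashboard estimates would convey.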