Multimodal Sensor Fusion for UXO Classification and Remediation

Objective

Despite the ubiquity of robotic systems in deep-ocean intervention, such systems have had limited impact in shallow-water unexploded ordnance (UXO) remediation, due in large part to the relatively crude and non-dexterous nature of the current state of the art in tele-operated manipulation. Computer-assisted or computer-controlled approaches offer great promise for addressing the fundamental issues in subsea tele-operation and for allowing safe and effective execution of UXO remediation tasks; however, such computer assistance requires accurate digital models of the UXO in place on the seabed. While terrestrial research can rely on a variety of structured-light and LiDAR-based sensors to generate such models in near real time, no such turnkey solutions exist for subsea applications, particularly for operation in the shallow, turbid waters where UXO remediation is of highest priority. This project examined the use of visible-light stereo cameras and a high-frequency forward-looking sonar, combined with platform motion, to construct and update three-dimensional (3D) reconstructions of UXO on the seafloor.

Technical Approach

This work encompassed four major tasks: (1) Construction of a sensor platform containing stereo 4K cameras and a 2.1 megahertz imaging sonar, along with software for time-synchronized recording of data from all sensors. This system was mounted on a camera gantry that allows repeated, constrained motion approximating the close inspection of a UXO before and during manipulation. (2) Collection of data sets with the sensor capture system, including relevant metadata and estimates of the ground truth world structure and camera trajectory. (3) Development of re-projection models between all sensors, in particular a procedure for using observed data to estimate the mechanical offset between the camera center and the origin of the sonar data. (4) Extension of Large-Scale Direct Monocular SLAM (LSD-SLAM), a monocular simultaneous localization and mapping (SLAM) algorithm, to meet the particulars of the described application, including the use of stereo for direct scale measurements, improved model convergence given relatively small camera motion, and the inclusion of sonar data.
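The re-projection idea in task (3) can be illustrated with a minimal sketch. It is not the project's calibration procedure or code; it only shows how a single sonar return might be mapped into a camera image once a camera-to-sonar offset is known. Everything named here is an assumption for illustration: the function `sonar_to_pixel`, the shared axis convention (x right, y down, z forward), the pinhole intrinsics `fx`, `fy`, `cx`, `cy`, and the default of zero elevation, which stands in for the fact that an imaging sonar reports range and bearing but does not resolve elevation within its beam.

```python
import math

def sonar_to_pixel(range_m, bearing_rad, R, t, fx, fy, cx, cy, elevation_rad=0.0):
    """Project one sonar return into pixel coordinates (hypothetical helper).

    R (3x3 nested sequence) and t (length 3) are the assumed rigid transform
    from the sonar frame to the camera frame, i.e. the calibrated offset.
    Returns (u, v), or None if the point lies behind the camera.
    """
    # Polar to Cartesian in the sonar frame (assumed axes: x right, y down, z forward).
    ce = math.cos(elevation_rad)
    p = (range_m * ce * math.sin(bearing_rad),
         range_m * math.sin(elevation_rad),
         range_m * ce * math.cos(bearing_rad))
    # Rigid transform into the camera frame: p_cam = R * p + t.
    x, y, z = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
    if z <= 0.0:
        return None
    # Pinhole projection with no lens distortion (an assumption).
    return (fx * x / z + cx, fy * y / z + cy)

# Illustrative values only: sonar axes aligned with the camera and mounted 0.1 m below it.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixel = sonar_to_pixel(2.0, 0.0, identity, (0.0, 0.1, 0.0), 1000.0, 1000.0, 960.0, 540.0)
```

Because elevation is unknown, each sonar return actually corresponds to an arc in the image rather than a single pixel; sweeping `elevation_rad` over the sonar's vertical beam width would trace that arc.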

Interim Results

The improved LSD-SLAM algorithm is shown to produce a converged 3D model of a test scene in real time from stereo video, along with an estimate of the camera trajectory. Apart from an undiagnosed scale error, this trajectory agrees closely with the independently measured ground truth trajectory in both camera position and attitude. An effective camera-to-sonar calibration procedure is also demonstrated, with preliminary results in projecting sonar data into the visual frame.
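One simple way to quantify a scale discrepancy between an estimated trajectory and a ground truth trajectory is sketched below. It is not the evaluation used in this project and does not diagnose the cause of the error; it compares the root-mean-square spread of each set of camera positions about its own centroid, a ratio that is unaffected by any rigid rotation or translation between the two frames. The function names are hypothetical, and the sketch assumes the two trajectories are sampled at matching timestamps.

```python
import math

def centroid(points):
    """Mean of a list of 3D points given as (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

def spread(points):
    """Root-mean-square distance of the points from their centroid."""
    c = centroid(points)
    total = sum((p[k] - c[k]) ** 2 for p in points for k in range(3))
    return math.sqrt(total / len(points))

def relative_scale(estimated, ground_truth):
    """Scale factor s such that s * estimated matches the spread of ground_truth."""
    if len(estimated) != len(ground_truth) or len(estimated) < 2:
        raise ValueError("trajectories must have equal length and at least 2 poses")
    s_est = spread(estimated)
    if s_est == 0.0:
        raise ValueError("estimated trajectory has no motion")
    return spread(ground_truth) / s_est

# Illustrative values only: the ground truth is the estimate doubled and shifted.
est = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
gt = [(5.0, 5.0, 5.0), (7.0, 5.0, 5.0), (7.0, 7.0, 5.0)]
scale = relative_scale(est, gt)
```

A spread ratio is sensitive to noise and outliers in either trajectory; a full least-squares similarity alignment would estimate rotation, translation, and scale jointly and is the more common choice when residuals also need to be reported.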

Benefits

This program developed hardware and software tools for gathering synchronized stereo video and imaging sonar data of objects, including estimation of ground truth scene structure and camera trajectory. It also showed the effectiveness of stereo visual approaches for 3D reconstruction in low-turbidity conditions, allowing continued progress toward the original application, assistive manipulation with remotely operated vehicles (ROVs), when those conditions are present. Significant work remains both in ensuring that the visual reconstruction is robust and in making use of acoustic data, either as a supplement to or in lieu of optical data.