Hey blog readers,
This week I got my hands a bit dirtier and really dove into the mechanics of data science and the data side of robotics.

I’ve been assigned a project that will help Sentry and other AUVs (autonomous underwater vehicles) take advantage of machine learning, deep learning, and other advanced computational techniques.
The problem statement and background are this:
Sentry cruises are structured around a set number of dives per cruise, typically anywhere between 5 and 15. Between dives, when the vehicle is pulled shipside, a number of checks and maintenance tasks are performed on it. During these checks, experts look over the data Sentry retains from the dive. There is a lot of data, but the kinds these experts are after are the engineering data and the sensor data. Engineering data reports on things like thruster performance and battery health during the dive, whereas sensor data is the juicy stuff: it’s what the scientists are after, and the whole point of the dive. The thing is, the sensor data is helpful for us engineers as well. Experts can comb through it and notice if sensors are calibrated incorrectly, or if something is going wrong at the hardware or software level. The engineers then adjust the hardware and software accordingly, and this process repeats every dive.
The issue is, some errors are sneaky and not detectable, even to the trained eye, until much later in post-processing, after the cruise has ended. That causes big problems: it can even render scientific data unusable, or “dirty.” So the long-term goal of the project I am working on is to automate this process: catch errors or data discrepancies in the field, and adjust accordingly so valuable data isn’t lost.
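To give a flavor of the kind of in-field check we’re aiming for, here’s a toy sketch of a rolling z-score detector: it flags a sensor reading that suddenly jumps away from its recent baseline. The window size, threshold, and synthetic sensor series are all made-up assumptions for illustration, not Sentry’s actual data or method:

```python
import statistics

def flag_anomalies(readings, window=20, threshold=3.0):
    """Flag indices whose reading deviates more than `threshold`
    standard deviations from the mean of the preceding `window` samples."""
    flags = []
    for i in range(window, len(readings)):
        baseline = readings[i - window:i]
        mean = statistics.mean(baseline)
        stdev = statistics.stdev(baseline)
        if stdev > 0 and abs(readings[i] - mean) / stdev > threshold:
            flags.append(i)
    return flags

# Synthetic sensor trace: a slow drift with one sudden spike at index 40.
series = [20.0 + 0.01 * i for i in range(50)]
series[40] = 35.0
print(flag_anomalies(series))  # → [40]
```

A real pipeline would need something far more robust than a single z-score (sensors drift, dives have phases, noise isn’t Gaussian), but the shape of the problem is the same: compare incoming data against an expected baseline and surface the surprises while the vehicle is still recoverable.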

The specific project I am working on toward that goal is assembling a machine learning pipeline to automate that checking process, and even to learn new things about the data along the way. The part of the pipeline “under construction” right now is the data staging area. We have tens of terabytes of archived dive data from Sentry dives alone. The question is: how do we manage and organize all of that data, and with which platforms and tools, so that it is optimized for training and deploying models? Done right, that could greatly improve the performance of not just Sentry, but other vehicles in due time.
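To give a flavor of what “staging” can mean, here’s a sketch of one possible convention: a deterministic directory layout keyed by cruise, dive, and data type, so training jobs can find engineering vs. sensor data by convention instead of by searching. The cruise/dive identifiers and the layout itself are hypothetical, not how our archive is actually organized:

```python
from pathlib import Path

def stage_path(root, cruise_id, dive_number, data_type, filename):
    """Build a predictable staging path: <root>/<cruise>/<dive>/<data_type>/<file>.
    A fixed convention like this lets downstream model-training code
    enumerate dives and data types without a separate index."""
    return Path(root) / cruise_id / f"dive_{dive_number:03d}" / data_type / filename

# Hypothetical cruise/dive IDs, just to show the shape of the layout.
p = stage_path("staged", "AT50-21", 612, "sensor", "ctd.csv")
print(p.as_posix())  # → staged/AT50-21/dive_612/sensor/ctd.csv
```

Whatever platform the staging area ends up on, the underlying design question is the same: pick a partitioning scheme (by cruise, dive, and data type here) that matches how the models will actually read the data.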
Anyways, that’s a bit of what I’m working on! Right now it is mostly in the design and planning stages, but like I said, I’m getting my hands “dirty” with a bit of code!
Catch you later,
Steph