2018-03-17

Just another Schiaparelli

Schiaparelli crater on Mars. By NASA, http://ltpwww.gsfc.nasa.gov/tharsis/regional.html, Public Domain

From the inquiry report on the Schiaparelli anomaly, p. 12:

Because of the error in the estimated attitude that occurred at parachute inflation, the GNC Software projected the RDA range measurements with an erroneous off-vertical angle and deduced a negative altitude (cosinus of angles > 90 degrees are negative). There was no check on board of the plausibility of this altitude calculation.

Then, p. 22 (some emphasis added):

Not robust decision logic in the GNC S/W

The sanity checks implemented were focused on the RDA and the RDA – GNC altitude estimates (IMU based) in which the GNC attitude estimate was considered perfect. […] Essential variables as attitude and altitude were not monitored, though there were means for detection of major problems, […] No recovery strategy was defined for degraded case. In fact, the on-board software correctly detected an inconsistency between radar altimeter and IMU measurements but was instructed to mix inconsistent information (slant range and attitude).

Hence Recommendation 05:

Robust and reliable sanity checks shall be implemented in the on-board S/W to increase the robustness of the design, which could be, but not limited to:

[…]

  • Check on altitude sign (altitude cannot be negative)

Some online pages speculate about code like this (to be read as pseudocode):

   if altitude < 3000 then
     --landing procedure--
   end if;

Of course, if the altitude is less than zero, the condition holds and the landing procedure is executed. Before reaching such an if, the altitude should have been recognized as wrong and dealt with somehow.
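One way to deal with it beforehand is an explicit plausibility check in front of the landing decision. The following Ada sketch uses a membership test against a range-constrained subtype; the bound Max_Altitude, the name Landing_Threshold and the labels returned by Decide are hypothetical, chosen only for illustration (the 3000 comes from the pseudocode above).

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Altitude_Guard is
   -- Hypothetical bounds, for illustration only
   Max_Altitude      : constant Float := 20_000.0;
   Landing_Threshold : constant Float := 3_000.0;

   subtype Alt_Type is Float range 0.0 .. Max_Altitude;

   -- Classify an altitude estimate before any decision is taken on it
   function Decide (Altitude : Float) return String is
   begin
      if Altitude not in Alt_Type then
         -- Implausible estimate: it must never reach the landing logic
         return "reject";
      elsif Altitude < Landing_Threshold then
         return "landing";
      else
         return "descent";
      end if;
   end Decide;
begin
   Put_Line (Decide (-433.0));   -- roughly 3000 * cos (98.3 degrees)
   Put_Line (Decide (2_500.0));
   Put_Line (Decide (8_000.0));
end Altitude_Guard;
```

The negative estimate is rejected instead of falling into the "less than 3000" branch.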

Check on altitude sign

In Ada you can declare a subtype that makes it impossible for a variable to hold an unacceptable value without you noticing: assigning an out-of-range value raises Constraint_Error (as long as range checks are not suppressed). Implementing Recommendation 05 in Ada means using these features.

The following code raises Constraint_Error because the cosine of Off_Vertical_Angle is negative, so the computed altitude falls outside Alt_Type. Two fictitious functions provide the “distance” (from the RDA) and the angle needed for its projection (from the IMU).

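-- ... this is supposed to be part of a pkg specs
Max_Altitude : constant Float := A_Reasonable_Value;    -- placeholder bound
Max_Distance : constant Float := A_Reasonable_Distance; -- placeholder bound
subtype Alt_Type is Float range 0.0 .. Max_Altitude;    -- altitude can't be negative
subtype Distance_Type is Float range 0.0 .. Max_Distance;
subtype Angle_Type is Float range 0.0 .. 360.0;         -- degrees

-- ... this is supposed to be part of the pkg body, whose context clause needs
-- with Ada.Numerics.Elementary_Functions; use Ada.Numerics.Elementary_Functions;
function Compute_Altitude return Alt_Type is

   -- fictitious sensor readings
   function Get_Distance_From_RDA return Distance_Type is (3000.0);
   function Get_Angle_From_IMU return Angle_Type is (98.3);

   RDA_Distance : constant Distance_Type := Get_Distance_From_RDA;
   Off_Vertical_Angle : constant Angle_Type := Get_Angle_From_IMU;
begin
   -- Cos with Cycle => 360.0 works in degrees: cos (98.3) < 0, so the result
   -- is negative and the range check on the returned Alt_Type value fails
   return RDA_Distance * Cos (Off_Vertical_Angle, 360.0); -- raises
end Compute_Altitude;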

What if the simulation doesn't cover the case

Now, let us suppose this kind of constrained type were used. It isn't enough: the code also needs to catch the exception and cope with it, so as to recover from the situation.

begin
   Altitude := Compute_Altitude;
exception
   when Constraint_Error =>
      -- implausible altitude: fall back on an estimate from the expected descent profile
      Altitude := Guess_Altitude (Time_Profile, Expected_Altitudes_Table);
end;

(Let us suppose this is plausible.)
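To see the mechanism end to end, the two fragments above can be put together in one runnable procedure. This is only a sketch: the bound 20_000.0 stands in for A_Reasonable_Value and A_Reasonable_Distance, and the parameterless Guess_Altitude returning a fixed value is a hypothetical stand-in for Guess_Altitude (Time_Profile, Expected_Altitudes_Table).

```ada
with Ada.Text_IO;                       use Ada.Text_IO;
with Ada.Numerics.Elementary_Functions; use Ada.Numerics.Elementary_Functions;

procedure Altitude_Demo is
   -- Hypothetical bounds standing in for A_Reasonable_Value/Distance
   Max_Altitude : constant Float := 20_000.0;
   Max_Distance : constant Float := 20_000.0;

   subtype Alt_Type      is Float range 0.0 .. Max_Altitude;
   subtype Distance_Type is Float range 0.0 .. Max_Distance;
   subtype Angle_Type    is Float range 0.0 .. 360.0;  -- degrees

   -- Fictitious sensor readings, as in the article
   function Get_Distance_From_RDA return Distance_Type is (3000.0);
   function Get_Angle_From_IMU return Angle_Type is (98.3);

   function Compute_Altitude return Alt_Type is
      RDA_Distance       : constant Distance_Type := Get_Distance_From_RDA;
      Off_Vertical_Angle : constant Angle_Type    := Get_Angle_From_IMU;
   begin
      -- cos (98.3 degrees) < 0: the result violates Alt_Type's range
      return RDA_Distance * Cos (Off_Vertical_Angle, 360.0);
   end Compute_Altitude;

   -- Hypothetical stand-in for the article's Guess_Altitude (...):
   -- here it just returns a fixed value from an assumed descent profile
   function Guess_Altitude return Alt_Type is (1_200.0);

   Altitude      : Alt_Type;
   Used_Fallback : Boolean := False;
begin
   begin
      Altitude := Compute_Altitude;
   exception
      when Constraint_Error =>
         Altitude      := Guess_Altitude;
         Used_Fallback := True;
   end;
   Put_Line ("Fallback used: " & Boolean'Image (Used_Fallback));
   Put_Line ("Altitude:" & Float'Image (Altitude));
end Altitude_Demo;
```

The exception handler is the code that someone has to decide to write, and then to test.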

If you've written such code, it must be because you (or someone else) estimated that the probability of the event makes it worth handling somehow. And if you've written it, you're also going to test it and/or simulate the case.

But apparently there is no such code. That is, they made a strong assumption: those (unconstrained) variables will always stay within their meaningful working range[1].

They decided there was no need to check a scenario where the off-vertical angle is greater than 90 degrees, either as a real attitude or as a wrong value in the code (because of some kind of error), even though this angle is a critical variable used to compute an even more critical value: the altitude.
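A check on the angle itself could look like the following sketch: the slant range is projected only when the off-vertical angle lies in a range where the projection makes sense, and the caller is told otherwise. The 60-degree limit, the procedure Project and its parameters are hypothetical; the 3000.0 and 98.3 readings are the fictitious values used earlier in the article.

```ada
with Ada.Text_IO;                       use Ada.Text_IO;
with Ada.Numerics.Elementary_Functions; use Ada.Numerics.Elementary_Functions;

procedure Angle_Check is
   subtype Angle_Type is Float range 0.0 .. 360.0;  -- degrees, as in the article

   -- Hypothetical limit: past this off-vertical angle the slant range is not
   -- projected at all (the value is an assumption, for illustration only)
   Max_Off_Vertical : constant Float := 60.0;
   subtype Usable_Angle_Type is Angle_Type range 0.0 .. Max_Off_Vertical;

   -- Projects the slant range only when the attitude is usable;
   -- Valid tells the caller whether Altitude can be used at all
   procedure Project
     (Slant_Range : Float;
      Angle       : Angle_Type;
      Altitude    : out Float;
      Valid       : out Boolean) is
   begin
      Valid    := Angle in Usable_Angle_Type;
      Altitude := (if Valid then Slant_Range * Cos (Angle, 360.0) else 0.0);
   end Project;

   Estimate : Float;
   Usable   : Boolean;
begin
   Project (3000.0, 98.3, Estimate, Usable);
   Put_Line ("98.3 deg usable: " & Boolean'Image (Usable));
   Project (3000.0, 20.0, Estimate, Usable);
   Put_Line ("20.0 deg usable: " & Boolean'Image (Usable)
             & ", altitude:" & Float'Image (Estimate));
end Angle_Check;
```

Here the inconsistency is caught before slant range and attitude are mixed, rather than after a meaningless altitude has been produced.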

Unasked questions

It seems that out-of-range values were possible for a short amount of time. How long a “short amount of time” is must be defined properly, and clearly it wasn't.

the parachute inflation triggered some oscillations of Schiaparelli at a frequency of approximately 2.5 Hz […] the IMU measured a pitch angular rate […] larger than expected[2] [and] raised a saturation flag. During the period the IMU saturation flag was set, the GNC Software integrated an angular rate assumed to be equal to the saturation threshold rate[3]. The integration of this constant angular rate, during which the EDM was in reality oscillating, led to an error in the GNC estimated attitude of the EDM of about 165 degrees. […]

Once the RDA is on, RIL [Radar In the Loop] mode, “consistency checks” between IMU and RDA measurements are performed. The parameters checked are: delta velocity and delta altitude. The altitude is obtained using the GNC estimated attitude to project the RDA slant ranges on the vertical. Because of the error in the estimated attitude […] [see the beginning of this article]

Consequently the “consistency check” failed for more than 5 sec[4]. […]

[…] estimation of the altitude was negative and very big.
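The quoted passage describes integrating the saturation threshold rate for as long as the saturation flag is set. A minimal sketch of putting a bound on that, assuming a fixed sampling period and a budget of consecutive saturated samples (Saturation_Rate, Dt and Max_Saturated_Samples are made-up values, not taken from the report), is to count the saturated samples and mark the attitude estimate as invalid once the budget is exceeded:

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Saturation_Guard is
   -- Hypothetical values, for illustration only
   Saturation_Rate       : constant Float := 100.0; -- deg/s, assumed IMU limit
   Dt                    : constant Float := 0.01;  -- s, assumed 100 Hz sampling
   Max_Saturated_Samples : constant := 50;          -- assumed 0.5 s budget

   Attitude          : Float   := 0.0;  -- estimated angle, deg
   Saturated_Samples : Natural := 0;    -- consecutive samples with a guessed rate
   Attitude_Valid    : Boolean := True;

   procedure Step (Measured_Rate : Float; Saturated : Boolean) is
   begin
      if Saturated then
         -- The true rate is unknown: integrate the threshold rate, but count it
         Attitude := Attitude + Saturation_Rate * Dt;
         if Saturated_Samples < Natural'Last then
            Saturated_Samples := Saturated_Samples + 1;
         end if;
         if Saturated_Samples > Max_Saturated_Samples then
            -- Past the budget the estimate is a guess: flag it (latched)
            Attitude_Valid := False;
         end if;
      else
         Attitude          := Attitude + Measured_Rate * Dt;
         Saturated_Samples := 0;
      end if;
   end Step;
begin
   -- One second of continuous saturation, twice the assumed budget
   for Sample in 1 .. 100 loop
      Step (Measured_Rate => 0.0, Saturated => True);
   end loop;
   Put_Line ("Attitude valid: " & Boolean'Image (Attitude_Valid));
   Put_Line ("Estimated attitude (deg):" & Float'Image (Attitude));
end Saturation_Guard;
```

The invalid flag is latched on purpose in this sketch: once the estimate has drifted on a guessed rate, going back to measured rates does not make it right again, so the software downstream must know not to project anything with it.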

No checks on angles that can be “dangerous”, no check on the final estimated altitude, no handling of a consistency check that fails for longer than the decided (how?) threshold[5], and integration of a constant angular rate (assumed equal to the saturation threshold rate) without an upper bound[6].
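Similarly, a consistency check that keeps failing can be given an explicit persistence limit and a defined degraded mode, instead of letting the software go on mixing inconsistent information. The sketch below assumes a hypothetical limit of 50 consecutive failed checks and three made-up modes; what the degraded mode should actually do is precisely the recovery strategy the inquiry found was not defined.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Consistency_Monitor is
   type Mode_Type is (Nominal, Suspect, Degraded);

   -- Hypothetical persistence limit, in consecutive failed checks
   Max_Failed_Checks : constant := 50;

   Mode          : Mode_Type := Nominal;
   Failed_Checks : Natural   := 0;

   -- Called once per IMU/RDA consistency check
   procedure Update (Check_Passed : Boolean) is
   begin
      if Check_Passed then
         Failed_Checks := 0;
         if Mode = Suspect then
            Mode := Nominal;  -- a transient disagreement is forgiven
         end if;
      else
         if Failed_Checks < Natural'Last then
            Failed_Checks := Failed_Checks + 1;
         end if;
         if Failed_Checks > Max_Failed_Checks then
            -- Persistent inconsistency: switch to the defined degraded mode
            -- (latched) instead of mixing slant range and attitude
            Mode := Degraded;
         elsif Mode = Nominal then
            Mode := Suspect;
         end if;
      end if;
   end Update;
begin
   for Sample in 1 .. 60 loop
      Update (Check_Passed => False);
   end loop;
   Put_Line ("Mode: " & Mode_Type'Image (Mode));
end Consistency_Monitor;
```

Degraded is latched in this sketch: a single passing check after a long inconsistency resets the counter but does not restore trust in the estimates.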


  1. I believe this is a really odd assumption for values that come from instrumentation: it is like trusting user input. But I am nowhere near understanding all the many complex “processes” that lead from nothing to something like Schiaparelli: it's a lot easier and more obvious when you analyse things afterwards.

  2. Why wasn't it expected to be that large, and why didn't a value “larger than expected” trigger code to handle the case?

  3. This logic smells, but again… it is easy to say so after the bad things have happened.

  4. This must be the short amount of time.

  5. Should we consider the case where the check fails for longer than the threshold? - No, it can't happen. - OK, let's assume it won't happen… Of course, if there is nothing you can do to recover from such a situation, then the mission is doomed whether or not these cases are handled: this simply must not happen, or we'll lose it.

  6. OK, so we are going to assume a constant angular rate until we recover. (Is the saturation threshold a hardwired constant?) What happens if we keep integrating for more than N seconds? No way that'll happen, and I did the math: we can afford this for about 5 seconds, so let's assume the saturation flag won't be set for longer than that.
