Changing input validation in an established API

Let's suppose that your server S exposes an API and that this API is used by many clients, or maybe just one — it doesn't really matter.

Let's focus on a client only, let's call it C. Documents were exchanged between S and C, so to agree on the interface.

After that, tests were done and none raised issues or, if they did, they were fixed.

The tests stated that S and C can talk, that they can understand each other, that C calls S correctly, that S acts properly according to the request, that C parses server's answer correctly, and so on.

Everything works fine, or mostly fine, since the beginning.

One day, the Security Team attacks your server's API and discovers some security problem. They send you a document with their findings and a request to fix those security problems.

In order to comply with the request, you add few missing constraints check in the input validation. These constraints fix your security problems; moreover the new checks have a match in the interface agreement document, so that you haven't to worry about your clients.

You think.

But indeed those changes broke the client C: now C calls the API and the answer from the server is an error because of one unfulfilled constraint. Everything stops working as it did.

Don't do that

A naive reasoning could blame the client C and say that the burden of the fix must be taken by it — after all, the client is using a parameter which does not respect the constraints given in the interface agreement document.

I firmly disagree with that.

As stated before, tests were done, and no problems were found by both sides.

If you need to change something because of security concerns, you must either inform all the clients of your changes and re-run all tests, or not change the observable behaviour for the API according to the tests already done and “approved”.

In the second case, it would be a good idea also to update the documents to state correctly and clearly the checks which are really implemented. Clients must be made aware of these changes, of course.

To state clearly the actual constraints and checks from the beginning, and to keep the documents up-to-date (and to distribute them to all the interested parties) is very important.


These thoughts are inspired by a real story.

In the real story, nobody explicitly said that the client C should fix how it calls the API; and in fact the burden was taken by the server. But the API documents were sent back, as to imply that the client C was misusing the API.

As said, tests were done and the server did not complain for any of the calls made by the client1. Once the tests were accepted by both sides, it's the server's responsability to maintain the compatibility with the client using their services (if they don't want to re-run, altogether with the client's side, all the tests).

This is general, but in the real story there are other oddities in favor of the client2, even though, if we want to trace back to the source of the problem, we will see that it wasn't a server S side superficiality (alone).

First, the client C used already a similar API with another server (let's call it S0). A requirement for the server S was to have the very same API as S0, plus few extra things because of the specific functions provided by S. That is, the client C wanted to use S in the same way it did with S0 for all the common requests3.

Secondly, because the client C didn't change anything for the requests, it happened to send to S a parameter, let's call it P1, which is mandatory for S0, but it is meaningless for S. Then, S had to ignore it, at least internally, but it had to accept it in the interface. The violated constraint was the one of this parameter P1!

Let's try to understand what happened.

The well established communication between C and S0 was the result of several steps and changes while S0 adapted to the needs of the projects of its client (the company, not the system — client X owns the software client C). In the API document of S0 for the parameter P1 it's still possible to read this4:

parameter P1 ... mandatory (N1 chars)

For C, P1 was always shorter than N1. Since for a similar text (say, N chars) in other parameters S0 required padding5, C padded P1 too. Until the day there was an agreement to use the actual value of P1, without padding — N1 became just the maximum length.6

Then, S0 was used again in another project where P1 had no meaning, but it was still a mandatory parameter in the interface7. It was decided to fill P1 with the same value of another parameter, P2. For P2, S0 required a length of N28 (N2 < N1), with padding to reach it in case the actual value in C were shorter.

Summing up, in a project P1 is a variable length mandatory parameter (length limit: N1). In a second project, the same; but it's value is copied from P2, just because of the mandatory bit, and because of this its length is always N2 (P2 is padded upto length N2).

In all this, the document remained unchanged (or new documents weren't distributed to all involved teams), and unclear in its specification since the beginning: does “N1 chars” mean “exactly N1 chars” or “at most N1 chars”?

To state clearly the actual constraints and checks from the beginning, and to keep the documents up-to-date (and to distribute them to all the interested parties) is very important.

S inherited this document and its description for the parameter P1. Since it was known since the beginning that P1 has no meaning for them (as for S0 in the second project), then they decided to ignore it. Correct.

Then, the Security Team arrives. Thinking about it… What kind of security problem requires checks on the length of a parameter that should be ignored?

I can think buffer overflow when reading it; or an uncaught exception: S does not use the value, but this is stored in a database in a column varchar2(N1) (or alike); the Security Team tried to pass a P1 longer than N1 and no code handled the exception, which was raised and exposed some internal machinery of the server and/or made it stop working9.

Anyhow, the people at S decided to add the checks also on the to-be-ignored parameter P1. What checks? Let's take a look at the document: here it says mandatory, exactly N1 chars. According to the API document of S0 I have, the exactly was an addition of theirs, perhaps to remove the ambiguity of the original document… good idea — or, worse, maybe it was so in another revision of the API document of S0… not up-to-date either.

C knew P1 wasn't a fixed length string, C informed S anyway that P1 had the same value of P2, at least implicitly (because it was so for S0 in the second project)10 — meaning, they are the very same string. Nevertheless, correctly the developer of S went to the shared document. And did the wrong thing.

Not only because a server must not change the observable behaviour tested with its clients, but also because you don't enforce an exact constraints on a parameter that, if not passed, it's not a problem.

Said differently, because that parameter is a total waste of bytes that could be spared and that you can ignore, then you can — and maybe shouldsilently relax the mandatory thing, and avoid checking the exactly N1 chars, or any other constraints, even if it is in the original interface.11

Of course, you must put a limit at its length, a maximum. If you are going to parse a string given as input, you always put length limits, on the whole string as well as on “meaningful segments” of it. This limit can't be less than N1, of course; but it could be equal or greater, at your choice.

  1. Also, examples of parameters were sent back and forth, and nobody raised the flag on them not satisfying the constraints.

  2. I will keep this as detailed as possible but also as abstract as needed not to reveal anything too specific to the real projects involved. C and S are indeed part of the same “area”, so to speak. S0 is and was really a third party server.

  3. It wanted to use the very same code developed for S0. This made the project cheaper.

  4. I am referring to the latest document I've read. If there's a more up-to-date document, this wasn't given to the developers' team, nor to the developers of the server S. This shows a problem which I believe can be not so uncommon in companies where the higher ranks (so to speak) can't intermediate properly between different consultancy suppliers and services providers. Sometimes I have the impression that they try to “protect” the services providers (which of course don't want to disclose details of their services to the world) by giving the consultants (who are third parties) as little information as they believe is necessary… The they doesn't refer to programmers or alike. Or maybe the problem is something else, it doesn't matter. Fact is, too many corrections are held only in e-mails and don't seem to be documented where they should. Sometimes e-mails are the only proof you did well according to informations that made a document obsolete.

  5. The story as I can remember it and as I deduced despite the many unsaid things, is a little bit more complicated. There are parameters C passed to S0 that weren't for S0 (only), indeed, but for other systems: S0 seemed to act as a passthrough for those parameters (if not for almost all of them). So, how those parameters had to be, it depended on those other systems. If I recall well, it seemed that S0 behaved just like a “dumb storage” for those values: in tests C↔S0 only, C could use silly and “impossible” values of different length, and it worked (given the purpose of the test); in E2E tests it didn't, of course, because those parameters had to be meaningful either in order to allow S0 to communicate correctly with other systems, or in order to pass those systems correct data, or both.

  6. P1 was some sort of identification label to be defined for the specific project; I believe the length limit N was chosen according other systems' specification. (Imagine this conversation: X — we have this id we need to…; S0 — ok, no problem, tell us its maximum length so we can define the column size of the db…; X — well, we have N1 on our db, it will be enough…; S0 — ok, N1.)

  7. I don't know why, but I suppose it was for the same reason it was decided S should implement the same interface as S0: costs.

  8. P2 is a parameter which has meaning in several systems. In none of these it is a fixed length string. In all of these it has maximum length N2. So, I think there was a misunderstanding communicating this information to S0.

  9. I don't know the technology the server S runs on; if it's Java, the buffer overflow isn't a possibility.

  10. To prove that, you need to dig into hundreds of emails, hoping that it wasn't said in a meeting nobody did the minute of (or someone did, but forget to mention the detail).

  11. There's no a direct contradiction with the “rule” of not changing the observable behaviour if the accepted input domain is bigger (besides, on a parameter which is indeed to be ignored).

No comments:

Post a Comment