Tuesday 26 May 2009

How do you model information flow?

On the subject of secure information flow in computer programs, one of the questions that one has to answer is whether a program, which has access to confidential data, is free of unwanted information release. This leads to another question of what is information release or information flow?

The classic technique used in this field is to specify lack of information flow as a noninterference property. Noninterference intuitively means that confidential input does not affect public outputs of a program. Thus a program which satisfies this property is free of unwanted information flow. This kind of "lack of information flow" policy is suitable for military multi-level security systems, in the sense of the famous Bell-Lapadula "no-read-up" property, where information must not flow from a higher security classification to a lower one. However, in practice, many programs have to reveal some level of information as part of their functionality. For example, authentication programs, which reveal some information about the password - if authentication fails based on a guess by the attacker, then the attacker learns what the password is not. Similarly, statistical software do reveal, in the statistical result, some information about the data used in the analysis. Even encryption algorithms reveal some information about the secret keys and messages used. This is just to mention a few of the everyday applications where the noninterference approach fails.

So, how do we model more generally the notion of what information flows, as a step towards specifying what is the level of information that we actually intend to release, which is a statement of our security policy? One of the properties of information that is being alluded to here is the notion of information ordering. That is, the idea that one information is greater or more informative than another one. This simple idea provides us with a basic but a quite general way to specify the notion of secure information flow, whereby we say that the release of information by a program is safe or secure if and only if the level of information that the program releases is smaller than what we permit it to release. This statement relies only on the information ordering property as the basis of the security enforcement.

For this purpose, we can view information as elements of a partially ordered set, where the partial order captures the notion of degree of informativeness. So, our objective is to see how programs transform an attackers knowledge within this set and to check whether such a transformation is permitted.

We say that information flows when the attacker's knowledge is transformed from one level of information to a higher level. In this view, noninterference is a special case, which corresponds to the situation when the attacker's knowledge remains the same even after observing the result of the program that is processing sensitive data. More generally, we can accommodate situations when the observer is allowed to gain some specified level of information. It is the job of our information flow policy to specify this acceptable level, and our enforcement mechanism must ensure that programs cannot violate this policy by releasing more than the policy allows.

A recent work of mine (LNCS paper titled: A Policy Model for Secure Information Flow) looks into this problem, starting from a model of information, which is based on complete lattices (a special kind of partially-ordered set). Then techniques to extract the information released by a program from the operational semantics, under a given attacker model, is then presented. Finally, the lattice-based policies are used to enforce secure information flows. The ideas in this paper are quite simple and I hope to elaborate one some of these to the non-specialists, who might be interested in this very interesting subject.

In future editions, under this topic, I will explain how information flow may be modeled qualitatively and quantitatively, and how this fits into the lattice model mentioned earlier.

Language-based Security

Being a computer security researcher, I am interested in various aspects of computer security. A particular goal I have is to research, and if possible, contribute to the development of techniques whereby we may protect computers automatically through enforceable high-level security policies.

One specific area of interest is in ensuring that programs don't violate local usage policies. For example, an accounting software, or say a tax return calculator, which must necessarily be granted access to confidential information, should not reveal information that we do not intend to release. What is to prevent such a program from encrypting all the financial data and sending the result to an unintended recipient?

This is where Language-based Security can be of use. Although much work still needs to be done with respect to writing high-level security policies that can intuitively capture our intended information release policies and which can automatically enforce the policies at some appropriate level (e.g. inside the operating system, or a virtual machine, or web browser), but this field has promise.

There are of course many aspects to this problem and there are many interesting techniques already developed which have merits and some obstacles. Needless to say that this is a generally difficult problem. I hope to be documenting some of my thoughts and experience in this area and on the general topic of computer security on this blog.