Monday, June 10, 2013

Design of Efficito's PostgreSQL-centric automation environment

One of the challenges of setting up a cloud hosting environment is ensuring that systems remain in their intended configuration and that the desired state is one which can be rebuilt if there is a problem.  We at Efficito focus on data backups and on rebuilding software images to consistent states, rather than on backing up full images and restoring them.  This helps ensure that in the event of a disaster, VMs are restored to a consistent software state with data restored from backup.

Our choice of architecture was guided by the following requirements:

  1. Configuration and building of virtual machines should be subject to automation without human intervention, with full integration of payment frameworks and the like.
  2. Configuration and building of virtual machines should be such that virtual machines can be fully rebuilt in the event of disaster recovery requirements.
  3. Configuration changes should be able to be automated and retriable.
  4. The system may be specific to clouds hosting particular applications (we only host LedgerSMB as an ERP solution).
Further posts will probably cover very small pieces of our system.  The entire system cannot be published here, in part because we want to preserve our trade secrets.  This post, however, just covers some of the ways we use PostgreSQL as the centerpiece of this environment.

The Basic Structure and Role of PostgreSQL


Our approach is relatively simple.  Data comes in through either an administrative or a customer portal and is transmitted to a limited API, which writes it into our configuration database.  The information can include requests for new virtual machines, configuration changes, and the like.  Payment notifications can come in through these interfaces as well.

The configuration system is then attached to PostgreSQL; it picks up notifications of needed configuration changes and orchestrates them across the system.  This also allows us to pull information on our service deployments into our financial system for billing purposes (we use beta versions of LedgerSMB 1.4 internally, eating our own dogfood, so to speak).

In this regard PostgreSQL acts as an information backplane.  It allows our software components to talk to each other and allows messages to be sent between them, with both transitory and permanent information recorded in the database for later record-keeping (transitory information can be periodically truncated).

Information Flow


The system is still under development, with various components coming together.  Nonetheless, the idea is that the customer or administrator enters information into a front-end tool, which, through a limited API, inserts the data into the database.

Triggers on these inserts queue a message for the configuration system to read, with the notification delivered when the transaction commits.  We use pg_message_queue for this, in part because it supports both NOTIFY and periodic polling, and we intend to add much better multiple-listener support as we need it.
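
To illustrate the general shape of this (a sketch only, not our production code and not pg_message_queue's actual interface: the database name, channel name, and helper function below are made up), a listener can block on NOTIFY but still fall back to periodic polling:

```python
import select
import psycopg2
import psycopg2.extensions

conn = psycopg2.connect("dbname=efficito_config")   # hypothetical connection string
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)

cur = conn.cursor()
cur.execute("LISTEN config_changes;")   # hypothetical channel name

POLL_INTERVAL = 60   # seconds; poll the queue even if no NOTIFY arrives

while True:
    # Wait for a notification or for the polling interval, whichever comes first.
    readable, _, _ = select.select([conn], [], [], POLL_INTERVAL)
    if readable:
        conn.poll()
        while conn.notifies:
            conn.notifies.pop()          # drain; we re-read the queue table anyway
    process_pending_requests()           # sketched in the next snippet
```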

From there the listener learns which portions of the system need to be changed, makes the changes, and on success commits the transaction that dequeued the notification.  On failure, a system alert is raised and the system goes on to the next request (the item is returned to the queue for later processing on the next polling cycle).
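
The dequeue-and-apply step can be sketched along the following lines.  Again this is illustrative only: the pending_requests table stands in for what pg_message_queue manages for us, and apply_configuration and raise_system_alert are hypothetical placeholders for the orchestration and alerting code.

```python
import psycopg2

def process_pending_requests():
    """Apply queued configuration changes one at a time, committing the dequeue
    only if the change succeeds."""
    work_conn = psycopg2.connect("dbname=efficito_config")   # transactional connection
    try:
        while True:
            try:
                with work_conn:                      # commits on success, rolls back on error
                    with work_conn.cursor() as cur:
                        cur.execute("""
                            SELECT id, payload
                              FROM pending_requests
                             WHERE status = 'pending'
                             ORDER BY id
                             LIMIT 1
                               FOR UPDATE
                        """)
                        row = cur.fetchone()
                        if row is None:
                            return                   # queue drained for this cycle
                        request_id, payload = row
                        apply_configuration(payload)  # hypothetical: ssh/API calls etc.
                        cur.execute(
                            "UPDATE pending_requests SET status = 'done' WHERE id = %s",
                            (request_id,))
            except Exception as exc:
                # The transaction rolled back, so the item stays in the queue and is
                # retried on the next polling cycle; meanwhile raise an alert.
                raise_system_alert(exc)              # hypothetical alerting hook
                return
    finally:
        work_conn.close()
```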

What this means is that, to a large extent, this is a hands-off system.  We provide configuration options, customers select them, and once everything is running, the customer can control the software configuration of their VMs within limits, but without root access.  We do offer root access, but only if the customer is willing to set up their own SSL key and certificate (we can't give out root access if our wildcard cert is on the VM!).

Saturday, June 8, 2013

Tangent: Design thoughts about next-gen PKI in today's world

With the revelations of massive surveillance of Americans by the NSA, I have thought a bit about how to securely set up end-to-end cryptography in order to offer guarantees of security.  This is by no means limited to offering security relative to governments; organized criminals can mount attacks similar to wiretaps using man-in-the-middle and other techniques.

Designing a perfectly secure system is probably not possible.  A determined enough attacker will be able to gain access to any communications given enough effort, resources, and determination.  The goals of the system I am describing, however, are to maximize the resources required, thus forcing eavesdroppers to focus on only the most valuable targets, and to keep any compromise as narrow as possible.  Additionally, metadata is to some extent impossible to protect to the same degree that content can be.

In general, SSL is good enough for key negotiation and the like if the system can be made resistant (not perfectly secure) against man-in-the-middle attacks and if authorities can be sufficiently trusted and validated over time.

Most key exchange approaches focus solely on minimizing risk at the moment of key exchange (i.e. reducing synchronic risk).  This approach is different in that it focuses on resistance to exposure over time (reducing diachronic risk) and seeks to provide as much notification of compromised communications as possible.

Such an approach will *not* protect people against government spying in jurisdictions where keys can be demanded via subpoena or even warrant.  However, it aims to force authorities to obtain keys from the people under surveillance rather than rely on purely digital surveillance.

In general, as much as I dislike SSL/TLS (as I dislike pretty much every attempt to port OSI protocols to TCP/IP), it is well developed and well understood, and the security dimensions of the protocol are well documented.  Additionally, unlike IPSec, it is appropriate for cases where the data may need to be routed and rerouted among different servers, possibly altering routing destinations.  It is therefore the approach I suggest building upon.

This design is not intended to be entirely anti-government.  Given the amount of cyber-espionage today, such a framework may be of interest to governments trying to assure the security of their own communications.

Vulnerabilities of SSL


SSL has several known vulnerabilities in its design and architecture.  Many of these are the result of basic design tradeoffs in the protocol.  I think the risk model has shifted (toward very large organized crime and surveillance by well-funded governments, including China and the United States) in ways that make the world very different.

The threat model SSL is designed to address is one where you are protecting a large number of low- to mid-value targets against relatively casual or trivial attacks.  As attacks by foreign and domestic governments and organized criminals have become more sophisticated, the basic structure of how we use SSL and TLS has not evolved at the same pace.

The two major vulnerabilities of SSL are:

  • Large, central certificate authorities present single points of attack, and
  • Man-in-the-middle attacks are possible provided that neither side has prior expectations about the other's certificate authority.
My proposal below focuses on broadening the range of threats against which security is provable.  It is not inconsistent with existing approaches regarding central certificate authorities, but adds a layer of diachronic protection as well.

I do not believe this is particularly useful for criminal organizations, and it poses problems when moving down to the individual level (these can be solved, however).  Nonetheless, it should give corporations, governments, and even individuals an ability to ensure that their communications are not being unlawfully reviewed without their knowledge.

Solution to Third Party CA Vulnerability


Third-party CAs are a current vulnerability of the SSL system.  If keys are registered or obtained through a third party, that party's processes or systems can be attacked to obtain bogus certificates or to retrieve keys sufficient to forge certificates.  Third-party CAs thus have to be incredibly secure.

The problem, though, is that no system is that secure.  When computer viruses have been planted in US drone operation centers, and spear-phishing attacks have succeeded against some of the most secure US government networks, we cannot assume any certificate authority is beyond compromise.

My proposal is to divide trust between an external certificate authority and an internal certificate authority, both of which are tracked over time in the key negotiation process.  The external certificate authority's job remains to validate that the certificate is issued to the proper individual or organization (essentially an internet-based notary public), while the internal certificate authority's job is to identify individuals and services within an organization.  Because of the structure of the internet, and because of current practice, I would recommend keying this to the purchased domain name.

This by itself essentially means that the operational encryption keys are certified not by external certificate authorities but rather by an internal tier.  In essence, root certificate authorities would issue only certificates certifying things like "this domain is owned by the party to whom we issued this certificate," while subdomains would be a local matter.

This does not interfere with provable security if the following rule is enforced:  Operational certificates for resources at a domain MUST be regarded as provably secure ONLY IF they are issued by a certificate authority whose certificate indicates it was issued for the domain in question (possibly transitively) by a trusted root authority.
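
To make the rule concrete, here is a small illustrative sketch using plain data structures rather than real X.509 machinery; the field names (scope_domain, is_trusted_root, and so on) are invented stand-ins for what the proposed certificate extensions would carry.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Cert:
    subject: str                # e.g. "mail.example.com" or "ca.example.com"
    scope_domain: str           # domain the certificate is scoped to, e.g. "example.com"
    issuer: Optional["Cert"]    # certificate that signed this one; None only for a root
    is_trusted_root: bool = False

def in_scope(name: str, domain: str) -> bool:
    """True if name is the domain itself or a subdomain of it."""
    return name == domain or name.endswith("." + domain)

def provably_secure(operational: Cert) -> bool:
    """Enforce the rule: the issuing CA (and any intermediates) must be scoped to
    the operational certificate's domain, and the chain must end at a trusted root."""
    domain = operational.scope_domain
    ca = operational.issuer
    if ca is None:
        return False                       # an operational cert must be CA-issued
    while ca is not None and not ca.is_trusted_root:
        if not in_scope(ca.scope_domain, domain):
            return False                   # a CA outside the domain breaks the rule
        ca = ca.issuer
    return ca is not None                  # reached a trusted root authority
```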

Note that this means two things: an attack on a root CA can no longer reveal keys useful in eavesdropping on key exchange, but it could reveal keys useful for carrying out a man-in-the-middle attack.  It thus increases the effort needed to eavesdrop on SSL-protected connections only modestly, which is not much protection given the complexity of the attack needed to make it an issue in the first place.  The real value comes from the diachronic protection against the man in the middle, and this is where the division really proves valuable.

This is not limited to three levels, but it is worth noting that it would be necessary to check the certificates at each level to make sure that the above resource chain was not broken.  A fourth level might be necessary in scaling down to individual consumers (who typically do not own the domains they send emails from).


Diachronic Protection against the Man in the Middle


The approach mentioned above is chiefly useful because it allows one to track certificate authorities and their keys over time relative to a domain.  A sudden change could indicate a man in the middle, and with mutual authentication both sides should get alarms.

I would propose extending the new certificate to require signing with the previous key as well as the current one.  A revoked and re-issued certificate would then be signed both with the previous key (as evidence of continuity) and by the parent certificate authority.

This means you have strong evidence not only that the certificate was issued to someone the certificate authority is apparently willing to vouch for, but also that it was received by the holder of the private key of the previous certificate.  In the event that the previous key is genuinely lost and a certificate re-issued, the holder would probably want to say something.

Now, this establishes a timeline of key changes which can be tracked, and what this means is that as keys are issued outside the established chain of continuity, timelines diverge.  A key issued by the root CA is no longer sufficient to establish new connections without a warning of a diverged timeline, meaning that connections with previously unknown parties can still be eavesdropped on, but parties who have been in previous contact will detect that something is wrong and, hopefully, alert their users to a possible problem.  This gives both sides an opportunity to avoid problems.
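
As a sketch of what the client-side check might look like, suppose the re-issued certificate exposes the fingerprints of the prior keys it was also signed with; the helper and field names below are hypothetical, and no real X.509 handling is shown.

```python
import hashlib

class TimelineDivergence(Warning):
    """The peer's presented key history does not connect to the key we saw before."""

def fingerprint(public_key_bytes: bytes) -> str:
    return hashlib.sha256(public_key_bytes).hexdigest()

def check_continuity(cache: dict, peer: str, new_key: bytes, prior_fingerprints: list) -> None:
    """cache maps peer names to the last key fingerprint seen; prior_fingerprints is
    the (hypothetical) list of predecessor keys the new certificate was also signed with."""
    new_fp = fingerprint(new_key)
    known_fp = cache.get(peer)
    if known_fp is None:
        cache[peer] = new_fp        # first contact: nothing to compare against yet
        return
    if new_fp == known_fp or known_fp in prior_fingerprints:
        cache[peer] = new_fp        # same key, or a re-issue that vouches for the old one
        return
    # Neither the cached key nor any acknowledged predecessor appears: the timeline
    # has diverged, which is what a man in the middle (or an unexplained key loss)
    # looks like from this side of the connection.
    raise TimelineDivergence("key history for %s does not include the previously seen key" % peer)
```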

Of course, an alert may just mean a massive loss of information in which the previous key was lost, so it does not necessarily indicate a man in the middle.  Unexplained alerts, however, do indicate a problem.

Active Man in the Middle Detection


Additionally, this structure should enable us to do man-in-the-middle detection for any real-time bidirectional communication.  Its effectiveness depends on a certificate cache, which allows diachronic tracking of certificates for previously known resources.  For new connections, however, there is a problem.

One solution is to orchestrate several additional requests for certificates from other resources one already knows about, with the legitimate request occurring at a random spot.  The observer in the middle cannot determine which is the legitimate request, and so will not, without extensive review, even be able to guess which connections are new to the requester.  This would make it particularly difficult to eavesdrop on large volumes of communications.
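
A rough sketch of that idea, using the standard library's ssl.get_server_certificate to pull peer certificates; the decoy host list is of course made up, and real code would feed the results into the diachronic checks above.

```python
import random
import ssl

# Hosts we already have cached certificates for; purely illustrative.
KNOWN_HOSTS = ["www.example.org", "mail.example.org", "git.example.org"]

def fetch_with_decoys(target_host: str, port: int = 443) -> dict:
    """Fetch certificates from the new target and several decoys in random order,
    so an observer cannot tell which lookup was the one that mattered."""
    hosts = KNOWN_HOSTS + [target_host]
    random.shuffle(hosts)                  # the real target ends up at a random spot
    certs = {}
    for host in hosts:
        try:
            # get_server_certificate returns the peer's certificate as PEM text,
            # without validating it; validation and diachronic checks come later.
            certs[host] = ssl.get_server_certificate((host, port))
        except OSError:
            certs[host] = None             # a decoy being unreachable is not fatal
    return certs
```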

Additionally, once a contact pair has been eavesdropped on, removing the eavesdropping triggers a diachronic protection warning.

Scaling Down to the Personal


The big thing required for scaling this down is to recognize that an additional tier is needed, because individuals often send email through the domains of their ISPs or email providers.  In this view one would get a personal CA certificate, which would then issue one certificate for each service: web access, email, and so on.

Extensions to X509 Required


The fundamental extension required would be a way of presenting all relevant internal CA certificates serially in a single format, up to the root CA, which could be independently verified.  There may be some others as well.  A reasonable overlap time may need to be specified for certificate authorities transitioning to a new certificate.  Determining reasonable policies for such transition periods is beyond the scope of this proposal; however, a temporary change (followed by a reversion to the old cert) would be definitely suspicious.

Additionally, one would need to have a Key Version List, where keys could be listed in sequence for a period of time.  This may need to be added to the certificate structure as an extension.

Limitations


Security here is provable over time only given the following assumptions:

1.  The private key has not been compromised on either end.
2.  Continuity in changes regarding private keys is known and can be shown.

The reliance on unconditional trust of root certificate authorities is reduced, though some reliance is still necessary, and the effort needed to mount an attack would be higher.  However, the above limitations mean that false positives for security concerns may occur where keys are lost and certificates re-issued, and false negatives where private keys are compromised.

In the event that authorities (in jurisdictions which allow this) subpoena a private key, they can eavesdrop on connections.  Similarly, organized crime could use spear-phishing to obtain keys.  These things are thus outside what the threat model protects against.

However, the limitations are narrower, and the approach helps reduce the risk that certificate authorities face while enabling people to better protect their communications.

Tuesday, June 4, 2013

New ventures and new directions for this blog

I have agreed to help found a LedgerSMB hosting business called Efficito.  We have a very basic web page up (which is in the process of further development) but we intend to offer hosting specifically of LedgerSMB 1.3 and higher to customers who want it.  If there is interest, please feel free to email me.

One thing we have committed to do is to leverage PostgreSQL for the overall management of our hosting services.  We intend to use the database as a point of original entry, allowing other processes to enact changes saved in the database.  This is something very different from a standard 3-Tier architecture in that the entry of data into the database triggers real-world side-effects in a distributed environment.

Obviously I cannot disclose our full codebase to the world, but I expect to cover a number of aspects of how we will use PostgreSQL and to include small snippets of past or current versions of the code in these posts.  The examples will naturally be minimalistic and lack the context of the full system, but they should be interesting as illustrations of how one works with PostgreSQL in an automated environment, among other topics.

In essence what started off as a blog about LedgerSMB, which has expanded to include object-relational design generally, will now be expanded yet again to include database-centric automation.

I expect to cover a few pieces in the near future, including cross-type referential integrity where inet addresses match cidr blocks, and the use of ip4r to construct exclusion constraints for network ranges in table types.  The first post is probably more generally useful, because understanding how to do referential integrity manually can help solve broader problems, like referential integrity over inheritance trees.

Anyway I hope you enjoy some of the newer areas of focus as well.