Tuesday, October 18, 2011

The Design of LedgerSMB 1.3's Authentication/Authorization/Auditing System

LedgerSMB 1.3 and higher uses database roles for group/user permissions management.  Every user is a database user, and permissions are managed by roles which are granted permissions on a functional requirements basis to underlying database objects (tables and functions).

In doing so, we can take advantage of a number of pieces of advanced PostgreSQL functionality to provide a much richer and more secure application environment.  At the same time there are limitations with this system which must be taken into account as well.

Basic Design Requirements

The LedgerSMB Authentication, Authorization, and Auditing system was designed to offer real security even as the older codebase inherited from SQL-Ledger did not admit of any real security controls, as well as offering an extensible set of authentication options, and an ability to flexibly and consistently apply security permissions across several databases in a cluster.

The basic use cases are as follows:

  • A single business will only have one database.  The system shouldn't be unmanageable for such a business.
  • An accountant/bookkeeper might access books for several businesses, and would need consistent access across databases.
  • A midsize business with several subsidiaries might need some people to have access across databases, and others to have permission only to specific databases.  Such a business might also need some sort of single sign-on technology, or even integrated authentication.

Overview of Design

The current design uses PostgreSQL roles for managing permissions.  On the most basic level every user is a database user.  Roles with the nologin attribute is set are used to assign permissions to underlying relations and functions.  Generally, the simplest permissions are favored over the most complex.  This means that operations that correspond exactly with relational operations are granted relational permissions, while those that do not are granted functional permissions.

Some of these roles are granted to other no-login roles to serve as larger aggregations of permissions.  Users may be granted either sets of roles.  A set of administrative functions are provided by the software to manage user accounts, permissions, and reset passwords (if applicable).

Obstacle 1:  Roles are global to a cluster

This is both an obstacle and a useful tool.   This allows users to be consistent across databases, but also poses a challenge regarding the larger database clusters because the roles which grant specific permissions end up being cluster-global as well.

We resolved this by making the db-specific roles prefixed with lsmb_[database_name]__  (note the double underscore after the database name).


Obstacle 2:  Authentication methods

Authentication necessarily requires a few specific pieces of information to be passed to the database server, and these must be obtained from the user.  The first is a username, and the second is some sort of authentication token.  Any intermediary framework must then be able to pass these credentials on to the database in such a way as to properly log in.   This can consist of three basic types:

  1. Password based authentication, where the authentication is a memorized password, typed into a terminal,
  2. Token-based authentication, where the token is supplied from an external source, and
  3. Two factor authentication, where the both of the above methods are required.
There are no direct two factor authentication methods I know of that are supported by PostgreSQL, and I don't know if the PostgreSQL protocol would even support such a method.  It could be simulated, however, in a customized LedgerSMB system if necessary.

The other two break down into two questions:
  1. What can the web server pass on to the database server? and
  2. What methods can the database server use to authenticate the user?
As a result of considering these, several authentication options cannot be used either on the web server or the database server.

On the web server, for password authentication, the re-use requirement means that the only possible method for authentication is HTTP-Basic.

On the database server, client-side SSL certificates can only be used to authenticate the web server to the database server.  They cannot be used to authenticate users, and so are a not a possible choice for database authentication for a web app of this sort.
 
Security Tradeoffs

Security, it could be said, is an endless series of tradeoffs.  These tradeoffs not only include usability vs security, but also how mitigation of concerns are prioritized.

This approach prioritizes mitigating SQL injection and privilege escalation issues in the web application above possible flaws in the PostgreSQL authentication mechanisms and misconfiguration of the web server.

We believe that there is general benefit in pushing authentication back to methods which are better tested and have more peer review.  The PostgreSQL community has been, in our experience, extremely professional regarding security, further raising our confidence.  So this first trade-off is easily justifiable.

The second tradeoff is a little harder to justify at the outset.  If passwords must be re-used by the web server, zero-knowledge authentication systems cannot work.  This means the password must be passed in clear text to the web server.  This leads to several kinds of attacks, including  malicious admins logging passwords, use of the web server without SSL allowing passwords to be overheard by an eavesdropper.  We address the second type of attack by requiring SSL in the default configuration of LedgerSMB.

The issues with malicious system administrators can never be completely mitigated using password authentication.  If this is a consideration, it's best to look to something like Kerberos (which could be supported on 1.3 with a little bit of effort).  With Kerberos support, the web application, the database, and the user can all maintain a healthy distrust of eachother.

Other Implementation Details

A few other basic things are worth mentioning here.

The first is that PostgreSQL allows passwords to expire.  We make use of this to differentiate between user-set passwords (valid for a configurable length of time) and administrative set password (valid for one day).  Users can reset their own passwords.

Administrators of the application can issue temporary password resets.  The way this works is that the administator makes up a password for the user, and the user logs in, and is prompted to change it.

Thursday, October 13, 2011

SODA in LedgerSMB 1.3: A Critical Perspective


Following my Introduction to SODA post, I figured it would be a good idea to discuss the implementation of this on LedgerSMB 1.3, as well as a critical perspective as to how it has not only delivered benefits but also fallen short of our goals, and what will be done to remedy this situation.  Future follow-up posts will flesh out how we will implement this in the database, as well as reference mappers in Perl, as well as possibly other languages (PHP and Python are being considered).

Quick Review

SODA, or Service Oriented Database Architecture, is the approach the LedgerSMB project towards bridging the object oriented vs relational paradigms.  Unlike ORM systems, which design the programming objects first and then map these to a relational model either automatically (by creating tables) or manually (by defining mappings), SODA makes the relational design the primary one and then maps the relations out to interface types, which eventually will become semantically meaningful enough to define object-oriented interfaces to the underlying relational operations.

SODA is based on the idea that the database is the center of the processing environment, not merely the bottom tier of a specific application.  If it is a bottom tier, it is a bottom tier that many applications may use in common, and where consistency and security may need to be enforced across applications.  The database must then not only store, retrieve, and manipulate data, but must offer consistent interfaces for doing this.  These interfaces should be self-documenting to the point that a code generator can pick up on them  and create wrapper classes.  Additionally the database should be designed such that multiple programs can coordinate with eachother though the database.

LedgerSMB is closely tied to the PostgreSQL RDBMS, which we believe provides an ideal feature set for addressing these concerns.  PostgreSQL is one of the most extensible and powerful database management systems anywhere, it is open source, and it performs well under the workloads we care about.

The inspiration for SODA has come in part from the idea of RESTful web services and the emphasis on semantic clarity and re-use of existing frameworks to achieve loosely coupled, interactive systems.


PostgreSQL features currently used

As of LedgerSMB 1.3, stored procedures and to a lesser extent custom types play a central role in our approach.  Simple stored procedures are written in SQL, while more complex ones benefit from the semantic sugar and control structures of PL/PGSQL.   For those who haven't played with PL/PGSQL, it's PostgreSQL's answer to Oracle's PL/SQL, both being essentially PL/1 programming structures with integrated SQL query execution.  The language is clear, concise, and powerful enough for virtually all database access, and it performs quite well under virtually all workloads.

Because LedgerSMB 1.3 is the first branch to use this approach, many concepts are necessarily underdeveloped.  Additionally, LedgerSMB 1.3 was originally developed with PostgreSQL 8.1 in mind-- we started when PostgreSQL 8.3 had just been released.  Now we support 8.2 and higher.  This means that many of the advanced features of composite types, such as arrays of them, are unavailable in the current version, necessitating sometimes ugly and very un-SODA-like workarounds.

Additionally, the LISTEN/NOTIFY framework is used in some cases, sometimes with queue tables, as a transactional form of inter-process communication, but currently since LedgerSMB is usually installed as a single application, there are few opportunities to make use of this within the software distribution itself.


Implementation on LedgerSMB 1.3

SODA in 1.3 is very stored-procedure centric.  Stored procedures are defined.  Their arguments are named for object properties, prefixed with in_ to avoid collisions with columns.  The stored procedure names themselves are supposed to be sufficiently descriptive to allow them to be bundled together into classes.  In particular, the format is supposed to be the class name followed by a double underscore, followed by the method name.  In practice this is very hard to maintain because double and single underscores are not very visually distinct from eachother.   At the same time it isn't clear there is an alternative regarding naming of stored procedures.

On the Perl side, the magic occurs in two modules which contain the only SQL code in new modules for LedgerSMB 1.3:  LedgerSMB and LedgerSMB::DBObject.  LedgerSMB.pm's method "call_procedure" provides a low-level query generation for a named stored procedure along with enumerated arguments, while LedgerSMB::DBObject's "exec_method" provides higher-level mapping for stored procedures arguments to stored procedure mapping.   These provide methods of adding ORDER BY clauses to queries but few other functions at the moment.

These methods return lists of hashrefs, and handling these is the application's job.  One useful function is LedgerSMB->merge() which merges the properties of a hashref into the current object. 

The actual working classes in perl inherit these methods from DBObject and this allows for extremely simple perl wrappers for the stored procedure.  A typical wrapper may contain only a few lines of code, like this:

=item get_asset_class()

Requires id to be set.

Sets all other standard properties if the record is found.

=cut

sub get_asset_class {
    my ($self) = @_;
    my ($ref) = $self->exec_method(funcname => 'asset_class__get');
    $self->merge($ref);
}

The inputs then come from the web interface and get passed to the stored procedures quickly passed in.  In other words, the stored procedures define the application and everything else is written around these stored procedures.  

Benefits Realized

As mentioned in my previous post,  This framework in LedgerSMB 1.3 has allowed us to separate relatively agile development on the front-end from heavy engineering on the back end.  This has allowed us to re-engineer a surprising subset of the application remarkably quickly for customers requiring significant scalability.  In general, it is far easier to troubleshoot and correct both misbehavior and poor performance with hand-tuned SQL and PL/PGSQL stored procedures than it is to troubleshoot the queries which ORM's may generate.


Improvement possible

There are many improvements that can be made to our current implementation.   Although the framework so far has resulted many benefits these are not as great as they could be or what is needed as we get deeper into the re-engineering of the financial portions of the application.  Many of these limitations are due to limitations in older PostgreSQL versions which can be solved by taking advantage of newer features in 8.4 and higher.  Other limitations are problems with approach which are solved by changing our coding conventions.  The development of this approach will likely change over time. 

SODA has on the whole been very helpful but has also fallen well short of what we have hoped it would do.  It is by no means close enough to use to generate code in a variety of languages, and the interfaces are not as self-documenting or as manageable as we would like.

Shortcoming 1:  Complex data handling



The first major shortcoming in LedgerSMB 1.3's design is that PostgreSQL, prior to 8.4, has been unable to handle arrays of complex data types,  This means that complex types are relatively flat, and they lack the ability to express the sorts of data structures that object oriented programmers expect  Starting with 8.4, however, this sort of expressiveness is built into the database back-end.


In 1.3 we employ two strategies for addressing this issue.  The first is to use two dimensional arrays of the lowest denominator type (often TEXT) to simulate arrays of tuples.  This however breaks discoverability because the programmer (and hence the software) has to have intimate knowledge of the internal structure of the array.


The second option is to use multiple cross-indexed arrays for different fields.  This poses other discoverability issues, in particular that the argument names are insufficient by themselves to tell you the relationship between the arrays.  Neither of these are very good approaches, but there aren't better ones that we have found for passing large quantities of data into stored procedures for PostgreSQL versions prior to 8.4.

Starting with PostgreSQL 8.4, however this picture changes dramatically.  A tuple can contain members which are arrays of tuples.  Thus we get a level of expressiveness in PostgreSQL types which provides equivalent capabilities to structural and object oriented programming.  Looking towards LedgerSMB 1.4, we will make clear use of this feature.


Shortcoming 2:  Class management

The second major difficulty at present is class management.   The double-underscore is a poor separator between class and method name because it is not visually striking enough to readily notice when only a single underscore has been added.  Unfortunately there isn't a clear replacement, and schemas have the disadvantage of being a flat namespace.   Moreover schemas are already being considered for organizing the relational aspects of the software and so re-using schemas to solve this problem may cause more harm than good

Long-run I am looking at requiring that SODA functions accept the composite type that defines their first argument and allows for checking of return types which could then be handed off to constructors of other types.  This area my not come into play until we begin work on LedgerSMB 2.0 though.

Shortcoming 3:  Function Overloading and Interface Discovery

One obvious problem that crops up is that function overloading can cause problems here with overloaded functions.  If you have two functions with the same name and number of arguments, but the arguments differ by type, how does the application discover which interface to use?  Worse, if the argument names are the same, the 1.3 mapper will tend to grab the first (usually oldest) function and map the call to it.  Our response in 1.3 was to include test cases that test for function overloading and fail if non-whitelisted functions are overloaded.  This is far from optimal.

Although this problem was first brought up in relation to the fact that we use a weakly typed language (Perl), the language used has surprisingly little to do with the problem.  The fundamental issue is discovery, and adding type handling to Perl, or switching to a strongly typed language in fact does not address the problem at all.


With a discoverable interface, the first thing that must be done is define the discovery criteria.  With LedgerSMB 1.x that discovery criteria is the function name, which means that function overloading is out.  In future versions we may add the first argument type to this discovery, and in that case, allow for limited function overloading as long as the first type is different.


Misc Other Areas of Improvement


As we move towards 1.4 and beyond I expect more focus to be given to consistent interfaces for non-object-oriented environments.  Views, for example, may be added as consistent interfaces, displaying much of the data that would be accessible via a stored procedure.

Additionally,  we are looking at adding more type handling to Perl via Moose in order to solve the problem of internationalization of date and number formats, and other handling of type-specific operations.  This is beginning in 1.4 and will be on-going.



Conclusions


While SODA has delivered many benefits to LedgerSMB, it has not quite yet achieved all we'd like it to achieve.  The goal of fully discoverable interfaces in the database has not been reached yet.   A great deal of effort is yet to be done to reach this goal.

Tuesday, October 11, 2011

Next Steps with LedgerSMB 1.3 and 1.4

Now that LedgerSMB 1.3 is released, it's worth looking at the next steps.

The first is to correct new bugs reported by users in this initial phase.  All issues deferred in 1.3.0 should be fixed in 1.3.1 unless something really urgent comes up.

The second is to get a few add-ons developed during 1.3 into a more production-worthy state.  These include addons for budgetting, and an improved trial balance module which should perform better on big databases and also  provides a number of enhanced features, such as the ability to run trial balances for a subset of accounts.

Following this, we hope to see the continued evolution of the 1.3 series through a variety of sources including community support.  This will happen during the development of the 1.4 series.

Some work has already been done on 1.4 including removing dependencies on the PostgreSQL contrib modules (replacing connectby() with WITH RECURSIVE common table expressions), and the beginnings of re-engineering the financial schema to be more SODA-like.  Of course these mean that PostgreSQL 8.4 will be the minimal version usable there.

A number of important framework changes will be made also including depending on Moose for new Perl code, and using a number of advanced PostgreSQL features including arrays of complex types, again in keeping with moving to 8.4 as the minimally required version.

Introduction to SODA

LedgerSMB 1.3 provides what is a prototype for what I hope to develop in the future as a Service Oriented Database Architecture, or SODA.    This is an early version, venturing largely into uncharted territory doing things that I am unaware of them being done elsewhere.  This is one way we are pushing the envelope, and one way we will likely continue to do so.

As a note, LedgerSMB is closely tied to PostgreSQL, and as time goes on is likely to become essentially a PostgreSQL database and some sample applications written in Perl, with reference implementations in Python and PHP for further development.

What is SODA

SODA is an approach to database design which supports the idea that the database should be sufficiently semantically rich to provide code generators everything (in the model side of an MVC framework) they need to interoperate with the database.   Objects and classes are, to the extent feasible, defined in the database and exported to applications, rather than defined in applications and mapped to relations using ORMs.  In a perfect database implementing SODA, it would be possible to run a code generator in a language of your choice, and have it define classes and methods based on database types and stored procedures, which could then be used by your application quickly.

The key to SODA is to use what tools the database provides to make interfaces discoverable.  In PostgreSQL these involve such things as return types and argument names.  Examples will follow.

Basic Principles of SODA
  1. Interfaces must be discoverable, meaning stored procedures, data types, etc. can all be discovered at run time and automatically used.
  2. The application should be loosely tied to the database, and other applications should be able to to use the database easily, in any other language.
  3. All stored procedures should always return useful data.  A stored procedure which inserts, for example, might return the primary key of the row inserted or even the whole row (allowing, for example, for defaults to be filled in).  Returning VOID or NULL, or even TRUE, is generally frowned on and to be avoided.
  4. Security is enforced primarily on the database level, with applications presumed to know what operations their users are permitted to do.  If the application fails to enforce security, the database will, and the user may get an error.
  5. Exceptions in functions should be short and machine-readable.  "No password" is fine but "You forgot to enter a password.  Please try again" is not.  The application should be responsible for translating this into a useful message to the user.  This is in part because translation into other languages in the db is problematic using stock procedural languages, and also  because applications may wish to handle exceptions themselves.

Role of RDBMS in SODA-based Applications

SODA envisions the RDBMS not as the lower tier(s) in a multi-tier application infrastructure but as the centerpiece of both the application and its development environment.  SODA doesn't necessarily envision everything being in the database, far from it.  Nor does SODA necessarily envision that the database takes the place of middleware, but rather the database takes on many traditional middleware roles.

Instead, SODA sees the database in the center of the environment, and assumes that many clients will in fact be connecting to the database for a variety of uses, and that the RDBMS becomes a point of collaboration between applications (queue tables and LISTEN/NOTIFY come in handy here), and hence the point where security must be enforced.  Security can be enforced on the procedural or relational level.

Instead of "everything in the database" then, we have the idea that everything which is required to read and write data in a useful manner should be in the database, along with whatever is required to coordinate between applications.  The applications can then re-use business logic in a consistent way.  Logic which is not required for these two tasks does not belong in the database, and even some coordination frameworks may better reside elsewhere.  At the same time, SODA is very stored-procedure centric.

Examples
The following is a stored procedure which provides an ability to search for fixed asset classes (groups of fixed assets depreciated together with the same method):

CREATE OR REPLACE FUNCTION asset_class__search
(in_asset_account_id int, in_dep_account_id int,
in_method int, in_label text)
RETURNS SETOF asset_class_result AS
$$
DECLARE out_var asset_class_result;
BEGIN
        FOR out_var IN
                SELECT ac.id, ac.asset_account_id, aa.accno, aa.description,
                        ad.accno, ad.description, m.method, ac.method,
                        ac.label
                FROM asset_class ac
                JOIN account aa ON (aa.id = ac.asset_account_id)
                JOIN account ad ON (ad.id = ac.dep_account_id)
                JOIN asset_dep_method m ON (ac.method = m.id)
                WHERE
                        (in_asset_account_id is null
                                or in_asset_account_id = ac.asset_account_id)
                        AND (in_dep_account_id is null OR
                                in_dep_account_id = ac.dep_account_id)
                        AND (in_method is null OR in_method = ac.method)
                        AND (in_label IS NULL OR ac.label LIKE
                                '%' || in_label || '%')
               ORDER BY label
        LOOP
                RETURN NEXT out_var;
        END LOOP;
END;
$$ language plpgsql;

COMMENT ON FUNCTION asset_class__search
(in_asset_account_id int, in_dep_account_id int,
in_method int, in_label text) IS
$$ Returns a list of matching asset classes.  The account id's are exact matches
as is the method, but the label is a partial match.  NULL's match all.$$;

Now, in the above example the key is in the header to the function:

CREATE OR REPLACE FUNCTION asset_class__search
(in_asset_account_id int, in_dep_account_id int,
in_method int, in_label text)

From the name of the function, we know that this is for asset classes.  But thats not all we know.  We know exactly what information the stored procedure expects.  The variable names, prefixed with in_ to avoid collision with column names, tell the query generator what information to pull from the HTTP request object to build the search.  In this case, it will pull in the asset_account_id, dep_account_id, method, and label properties and map those into the stored procedure query.

Benefits of SODA

Thus far LedgerSMB has realized a number of benefits by applying these principles to our database design.  The first and most obvious is reduced development time.  With SODA, we define our database methods, then write Perl wrappers around the database procedures.  Then we can quickly write HTML templates and workflow scripts.  The heavy engineering is where it belongs:  In the mapper functions and the database.  The rest an be surprisingly agile.

A second important benefit is the fact that we have been able to retrofit real security into a codebase which did not admit to it when we began our fork.  By pushing security back into the database, we have been able to leave untrustworthy code untrusted.

Finally, we have found that it is very often far easier to tune hand-written SQL in stored procedures than it is to tune queries written through, for example, ORMs, allowing us to be able to address scalability problems when they arise.