Friday, April 7, 2023

Linux and Baikal Electronics: Why Postgres Should not Follow that Path

On my recent trip to PgConf Russia, a friend brought up the reaction of a Linux kernel maintainer to a patch submission by Baikal Electronics.  The reaction has apparently shocked open source developers throughout Russia, and I think it perfectly encapsulates the dangers that open source projects, including PostgreSQL, have to navigate today in the changing geopolitical context.  I decided to write a longer piece on this controversy and the issues surrounding it because I don't want to see PostgreSQL, LedgerSMB, or other open source projects I work with go down a similar path.

I believe that we are entering into a new era in open source development and I hope we move forward with our eyes open rather than sleepwalk into disaster.  Particularly as we move forward with dual-use technologies in Postgres, such as Transparent Data Encryption, we will face similar pressures and temptations.  I hope we don't go down the same road.

Background:  Baikal Electronics, Weapons, and Sanctions

The email occurs in the course of the Russo-Ukrainian war and the purported efforts by Western powers to deprive Russia of dual-use computer chips, of a sort that could be useful for cruise missile guidance systems and other efforts.  I say "purported" because today computer chips are found in virtually everything and therefore this motivation doesn't make much sense.  Western countries have imposed a number of sanctions which attempt to restrict access to microchips and other sophisticated electronics.  This has included both bans on exporting microchips to Russia and sanctions on Russian microchip manufacturers.

Modern weapons by most major powers are heavily computerized and rely on such components.  The stated hope by Western powers was to degrade Russia's military manufacturing and repair capabilities and thus end the war on terms favorable to Western interests.  It is also possible that the goal of sanctions was to destroy civilian life in the vain hope of a regime change (though there are no cases of Western sanctions achieving this in history, particularly when used against an adversary).

However, most aspects of modern life rely on these same microchips.  Banking systems, traffic lights, water treatment facilities, electrical generation, washing machines, refrigerators, and many more pieces of civilian equipment and infrastructure well outside the conflict zone depend on such technologies.  For this reason, companies like Baikal Electronics represent an important part of an effort at import substitution and sanction-resilience, not just for military uses but for civilian uses as well.  The Linux kernel maintainers have no problem accepting patches from US military and national security agencies (for example, SELinux was contributed by the NSA), but suddenly allowing civilian uses of Linux in Russia has become a problem.  It's hard to see this as a conscious decision rather than a product of unconscious bias and human error.  However, it could also be a conscious decision to weaponize open source software for geopolitical reasons.

Different Interpretations of Motivation:  Weapons vs Civilian Use, or Geopolitical Enforcement

The email here could be interpreted in a number of ways.  The most likely two are that this is a deliberate effort to extend sanctions to open source development or that it is a deep concern with what might be mistaken to be primarily a weapons contractor.  The first is genuinely scary, while the second could be expected to be resolved with a bit of dialogue.

It is possible that the decision was made due to US/EU sanctions or the desire to take a strong stance in favor of US interests (as portrayed by US media) and I understand this is the sense that people have taken this in Russia.  In this view, open source becomes just another weapon of geopolitics, and a way to punish countries we don't like.  After Russia, the same fate will surely befall China, and then any other country that dares to get in the way of American ambitions throughout the world.  Of course this would be rationalized as the argument that Russia is invading and occupying another country illegally and we ought not to support that, but the fact that the US has illegally invaded and is occupying half of Syria doesn't get the same treatment, so again we are dealing with a "rules-based" international order where the rules don't apply to the West.  As I will show below, any open source project that goes down this path will be superseded by more inclusive, international projects which may fork from them.

A more charitable view is based on the misunderstanding that Baikal Electronics' chips are primarily used for military applications and have limited civilian use.  This may have been true before the war, though even then the company was making inroads into civilian applications.  Today, the company has many civilian uses and, as import substitution becomes more important, these civilian uses are growing.  In this particular case, dialog ought to be able to resolve the issue and the patches ought to be able to be reviewed.  I think the fact that the Linux kernel maintainers have not decided to directly discriminate against Russians as a whole, or even residents of Russia, is evidence that this is a misunderstanding rather than a genuine effort at removing the political neutrality that defines open source.    However, if server vendors or even banks would be barred from making contributions for fixes affecting Baikal chips used in civilian applications, this would be quite troubling, and that seems to be the policy specified in the email.

I think the comments about refusing hardware support for Baikal's chips indicate that this is not merely about technical mistrust of Baikal as an organization, but rather something directed at them and their motivations for contributing.

After all, the Open Source Initiative's Open Source Definition does not allow licenses to discriminate against endeavors or groups of people, whether by citizenship, residency, or employment.  While this does not apply to accepting patches, it does apply to distribution, so contributions by the NSA become available for use by the Russian armed forces.  The kernel maintainers don't have to accept patches from Russian nationals or corporations, but they have to let such people and corporations take advantage of improvements made by US allies.  As I will show below, this leads to problems for open source communities trying to cut themselves off from countries they don't like.

Open Source as Infrastructure from the Unipolar Moment

In order to understand the current situation clearly enough to see what options other countries and corporations have, a bit of the history behind it is helpful.

Open Source is generally understood to trace back to Cold-War-era academic development practices where academics would write software and share it for purposes of research.  This was confined to academia for a number of reasons but particularly due to difficulties with distribution.  At the time, the internet was not well developed, and so distribution and collaboration across significant distances, and with significant numbers of people, were difficult.

With the end of the Cold War, the situation dramatically changed.  The Unipolar Moment was accompanied by a move to hyper-globalism and this led to increasing exports of computers, and the rise of a global internet.  This in turn led to increasing investments in telecommunications infrastructure and the rise of a high-speed internet.  As computers became faster and storage became denser, and as internet speeds increased, many of the large open source projects we know today either began their lives or moved out from academia into the mainstream.

Open source development practices today are a product of this hyper-connected, globalized world.  Globalism, however, has not equally benefited all and has proved to be demonstrably unstable.  US efforts to remake the world in its image are now failing, and workers at home are increasingly unhappy with stagnant wages, lost jobs, and other consequences of a globalized labor market.  As a result we began to see  largely unsuccessful efforts to disconnect the US economy from the Chinese economy beginning in 2016, and more successful efforts to disconnect from the Russian economy starting in 2022.  The efforts at severing economic ties with China have furthermore been picked up by the Biden Administration, and there is a general understanding in both the US and China that similar sanctions are coming to China whenever practical and convenient.

The Global South, often weary of being threatened with US, UK, and EU sanctions, has steadfastly refused to support this effort.  Indeed, not a single country from the Global South has sanctioned Russia in the last year.  They are therefore at risk of being considered unfriendly countries and arbitrarily sanctioned via secondary sanctions.  What has developed is then what Rutgers University Political Science Professor Michael Rossi has called "The West vs The Rest." This leads to a dangerous dynamic where efforts to isolate one country risk alienating people in a large number of other countries.

Open source, however, remains as infrastructure.  Software products cannot consider themselves open source while enforcing geographic boundaries in distribution or preventing individuals, organizations, agencies, or nations one doesn't like from accessing, using, or modifying the source code.  This then means that this infrastructure remains no matter what we would like, and further that efforts at excluding those we don't like will generally backfire and eventually kill the projects that we want to protect.

Possible Responses from Non-Western Powers

 Of course the best option for everyone would be to work together to address misunderstandings and make sure that geopolitics does not get in the way of accepting code contributions.  However, if that is not possible, then other options might be necessary.  There are several possible reactions that can be taken from within non-Western powers to such exclusion.  These range from maintained patch sets to forking or even switching open source projects. 

The simplest approach for people who are excluded from a major open source project is just to maintain a patch set of existing work for use within their countries.  The advantage of this is that it is quick and easy.  You can maintain the patch set as a git branch, and perhaps even build or release from that branch.  This makes it quick and (usually) easy to keep up to date with git rebase, and if the patch set is small, is quite reasonable to do.

If the patch set becomes too large, then maintenance in this way will be difficult, in which case forking and merging in changes from the mainline kernel may be an option.  This would allow, for example, a "Rinux" (contraction of Russian Linux) to develop.  The mainline kernel would still not take on further improvements from the Rinux fork, but Rinux could take all improvements from Linux.  If the developers are competent this would leave Rinux as a better, more capable project, and also very attractive to Chinese and other non-Western companies and governments as a project which would be less likely to be weaponized.  If Rinux were particularly successful, it might even make inroads back into Western countries, leaving Linux increasingly used purely by Western governments and military organizations, and increasingly lagging behind until it wouldn't even be worth following anymore.

A final option would be to switch to a different kernel such as OpenBSD, which has a track record of a global outlook, particularly in areas of cryptography.  Its track record of recruiting cryptographers not bound by US export restrictions suggests its developers may be safer partners for collaboration than the Linux kernel at this point.

Our Choice: To Be Global or To Be Western

Gone now are the days when geopolitics could be safely ignored, but open source software projects today still have to decide how closely to tie themselves to Western geopolitical efforts at maintaining global power.  Emerging development communities, raised on open source, in much of the world will dramatically privilege those projects that don't tie themselves to the agendas of any country or group of countries but seek to create commons that all can participate in, well beyond borders.

LedgerSMB was born of a need to maintain software while I was excluded from the SQL-Ledger community.  What we found was that more inclusive communities always beat less inclusive ones, and that inclusive forks often kill non-inclusive parent projects.

I hope that the Linux community comes to its senses before it is too late.  And I hope the PostgreSQL community never experiences such insanity, because this can be fatal.  Of course the projects that make this error will live on by other names, with other maintainers, and different committers, but the original projects are likely to die.  And I don't want this.

Wednesday, March 8, 2023

Which is worse when working on production databases? Being drunk or tired?

I have decided to do a series of mini-articles on human factors in database operations.   This is the first, and covers fatigue.

In my talk at the PostgreSQL devroom of Fosdem, I asked a few questions:

1.  How many of you have seen someone work on a production database while drunk?  About half the audience.

2.  How many times does this cause a major incident?  No hands.

3.  How many of you have seen someone cause a major incident by working on a production database while tired?  Half the audience again raised their hands.

As an industry, we do not take fatigue seriously enough.  We appreciate people who come in and work after long disruptive on-call shifts.  We don't tell people they are tired and therefore not safe to work on production systems.

We need to do better.  Every single major mistake in my career that has caused production problems has been caused either by power distance problems or by fatigue.

I am not saying people should come into work drunk.  There are probably a number of contextual aspects to why drunkenness doesn't cause a problem in these cases.  However I am saying that people should not touch production systems under fatigue.

Of course this is easier said than done.  If we are drunk, we can feel it, but with even light stress, we often don't feel our fatigue.  We aren't capable of self-monitoring our conditions in this regard.  Fatigue is thus insidious -- it gradually sneaks up on us, invisible, until we make critical mistakes and bad things happen.

While there are reasons to weigh the balance differently in some areas such as operating motor vehicles (to say nothing about flying an aircraft), the fact is that general brain-intensive work can be impaired by moderate fatigue, perhaps more than by levels of alcohol we consider unacceptable while driving.

If we value production operations, we should adopt the following rule:  friends don't let friends work on the production databases tired.

Friday, March 3, 2023

The Coming Storm: Geopolitics and PostgreSQL in a Changing World

At PGConf India, I watched Bruce Momjian's excellent talk Future Postgres Challenges.  This talk discusses technical, technological, and project-related challenges Postgres has faced and continues to face.  This immediately led me to ask a question about efforts at geopolitical disentanglement and how we should try to avoid them in the Postgres community.  While this question immediately gets interpreted through the filter of the ongoing war in Ukraine, it is far broader, as the US has been trying to disentangle itself from China and efforts there are ongoing.  The problems, however, are not new.

When we as a community had decided to have a code of conduct, there was a lot of concern that the code of conduct would be used in politically motivated ways, in particular regarding culture war topics.  I wrote a piece describing some of the issues which I also gave geopolitical importance to at that time.  Culture war topics, I reasoned, are necessarily cultural, and hence trying to push one group's cultural ideas on the world through an open source project would be very harmful.

I wrote that piece in early 2016, before the Liberal International Order began to visibly fall apart with votes for Brexit and the election of Donald Trump.  But the piece remains relevant because the dangers of allowing our community to be torn apart by current geopolitics are, if anything, even worse than they are in the culture war space.

If, as some would like, we were to shun community members and contributors who are resident in the "wrong" countries, or if we were to treat military contracts to American and Russian or Chinese governments differently, then we would risk the possibility of a fork forming around geopolitical lines.  Nothing good will come out of that for anyone.  And so the task ahead is to make sure that our community remains culturally and geopolitically inclusive.

I expect that we will face a lot of pressure in coming years to bifurcate the community due to differing geopolitical perspectives, but such could well be fatal for Postgres as a project.  Let's not let that happen.

Edit:  I am moderating the comments as follows:  What is on topic is a discussion about what is good for the project and community.  Arguments about what is good for the community that incidentally argue geopolitics are acceptable.  Efforts at arguing geopolitics untethered from that question will get removed.  I am trying to cultivate constructive discussion on a topic of community importance and sometimes this means pulling weeds (by which I mean deleting comments that detract from that).

Also all conversations need to be in the spirit of discussing what is right, not who is right.  If you wonder why your comment got deleted, that's why.

Sunday, May 26, 2019

Table Inheritance: What's it Good For?

Table inheritance is one of the most misunderstood -- and powerful -- features of PostgreSQL.  With it, certain kinds of hard problems become easy.  While many folks who have been bitten by table inheritance tend to avoid the feature, this blog post is intended to provide a framework for reasoning about when table inheritance is actually the right tool for the job.

Table inheritance is, to be sure, a power tool and thus something to use only when it brings an overall reduction in complexity to the design.  Moreover, the current documentation doesn't provide a lot of guidance regarding what the tool actually helps with or where the performance costs are, and because inheritance sits orthogonal to relational design, working this out individually is very difficult.

This blog post covers uses of table inheritance which simplify overall database design and are not addressed by declarative partitioning, because they are used in areas other than table partitioning.

Table Inheritance Explained

PostgreSQL provides the ability for tables to exist in an inheritance directed acyclic graph.  Columns provided by parent tables are merged in name and type into the child table.  Altering a parent table and adding a column thus cascades this operation to all child tables, though if any child table has a column with the same name and different type, the operation will fail.
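A minimal sketch of that cascade.  The table and column names here (vehicle, car, weight_kg) are hypothetical and chosen only for illustration:

```sql
-- Hypothetical parent and child tables, for illustration only.
create table vehicle (
    id int not null,
    name text not null
);

-- car receives id and name from vehicle, plus its own column.
create table car (
    doors int not null
) inherits (vehicle);

-- Adding a column to the parent cascades to every child table.
alter table vehicle add column weight_kg numeric;

-- List the columns of car; weight_kg is now among them.
select column_name
  from information_schema.columns
 where table_name = 'car'
 order by ordinal_position;
```

Had car already defined a weight_kg column of a different type, the alter table statement would have failed instead of merging.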

Inheritance, Tables, and Types

Every table in PostgreSQL has a corresponding composite type, and a row of any table can be implicitly cast to the type of any parent table.  This is transitive.  Combined with tuple processing functions, this gives you a number of very powerful ways of working with data at various different levels of scale.

Indexes and foreign keys are not inherited.  Check constraints are inherited unless set to NO INHERIT.
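A small sketch of what that means in practice, using hypothetical tables (account, savings_account) that are not part of any schema discussed here:

```sql
-- Hypothetical tables, for illustration only.
create table account (
    id int primary key,
    balance numeric not null,
    check (balance >= 0)          -- inherited by child tables
);

create table savings_account () inherits (account);

-- The check constraint was inherited, so this statement would be rejected:
-- insert into savings_account values (1, -5);

-- The primary key index was not inherited, so the child accepts
-- duplicate ids until it is given a key of its own.
insert into savings_account values (1, 10);
insert into savings_account values (1, 20);
```

This is why each child table generally needs its own primary key, unique indexes, and foreign keys declared explicitly.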

Inheritance and Querying

When a table is queried, by default all child tables are also queried and their results appended to the result.  Because of constraint exclusion processing, this takes out an ACCESS SHARE lock on all child tables at planning time.  All rows are cast back to the type of the table target (in other words you get the columns of the table you queried).
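As a sketch of this behavior, with hypothetical tables (event, login_event) used only for illustration:

```sql
-- Hypothetical tables, for illustration only.
create table event (id int, happened_at timestamptz);
create table login_event (username text) inherits (event);

insert into event values (1, now());
insert into login_event values (2, now(), 'alice');

-- Rows from both tables come back, with only the parent's columns.
-- tableoid shows which physical table each row came from.
select tableoid::regclass as source_table, id
  from event;

-- ONLY restricts the query to the parent table itself.
select id from only event;
```

The username column is not visible through event; to see it, query login_event directly.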

Comparison to Java Interfaces

Despite the name, the closest equivalent to table inheritance in other programming languages is the Java interface.  Here too you get implicit casts, a subset of fields, and a promise of compatible interfaces.  And just as a Java class can implement multiple interfaces, PostgreSQL supports multiple inheritance.  Java programmers are encouraged to think of inheriting tables in interface rather than inheritance terms.

Use in Database Management Design

When we design a database there are often two overlapping concerns.  The first is in relational algebra operations on the data, and the second is in managing the data.  In a purely relational model this breaks down.

Notes Tables

One of the first really productive uses of table inheritance I had was in the notes tables in LedgerSMB.  There are several hundred tables in the database, and we want to attach notes to some subset of these tables.  A naive approach might be a single global notes table with a bunch of foreign keys, or an ambiguous foreign key, or we just have a bunch of completely independent notes tables.  All of these have serious obvious problems however.  Large numbers of sparse foreign keys provide tons of NULL-handling problems, and provide a wide table that is harder to reason about.  Ambiguous foreign keys are a terrible anti-pattern which should never be used due to data consistency problems, and large numbers of independent tables provide an opportunity for subtle errors due to knowledge management problems.

A slightly better solution might be to define a notes composite type, and use CREATE TABLE ... OF type instead.  However typed tables of this sort have completely immutable schemas which makes them harder to manage over time.

With table inheritance, we can instead define a table structure something like the following:

create table notes (
    id serial primary key,
    created_at timestamp not null default now(),
    created_by text not null,
    subject text not null,
    body text not null,
    fkey int not null,
    check (false) NO INHERIT
);

This table will never have any rows, but child tables can have rows.  For child tables, creating them is now easy:

create table invoice_notes (
    LIKE notes INCLUDING INDEXES,
    foreign key (fkey) references invoice(id)
) INHERITS (notes);

The LIKE ... INCLUDING INDEXES clause indicates that we will copy in the primary key and index definitions, while the columns themselves merge with the ones that arrive through INHERITS.  This now provides a forward-looking way of managing all notes tables going forward.  Uniqueness criteria remain enforced on a per-table basis.

If I later want to add a materialization of a column using a function I can do that in a reasonably straight-forward manner, at least compared to alternative approaches.

However, that's not all I can do.  I can then provide a search_terms function on the parent table which can be used to query child tables.

create or replace function search_terms(notes)
returns tsvector language sql immutable as
$$
select to_tsvector($1.subject) || to_tsvector($1.body);
$$;

I could then index, using GIN, the output of this function.  I still have to create the index on all current tables but if I index it now on the notes table, all tables I create with LIKE notes INCLUDING INDEXES will now have that index too.
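A self-contained sketch of indexing such a function on a child table.  Everything here (memo, invoice_memo, memo_terms, the index name) is hypothetical, and passing an explicit 'english' configuration to to_tsvector is my assumption, made so that the function can honestly be declared immutable:

```sql
-- Hypothetical tables and function, for illustration only.
create table memo (
    subject text not null,
    body text not null
);

create table invoice_memo () inherits (memo);

-- Takes the parent row type; child rows are implicitly cast to it.
create function memo_terms(memo)
returns tsvector language sql immutable as
$$
select to_tsvector('english', $1.subject) || to_tsvector('english', $1.body);
$$;

-- Expression index on the child table, created directly on that table.
create index invoice_memo_terms_idx
    on invoice_memo
 using gin (memo_terms(invoice_memo));

-- Full text search against the indexed expression.
select *
  from invoice_memo m
 where memo_terms(m) @@ plainto_tsquery('english', 'something');
```

Creating the index directly on each child table is the conservative route; whether a given expression index can be copied by LIKE is worth testing on your PostgreSQL version before relying on it.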

The function itself can be queried in a number of ways:

select * from invoice_notes n
 where plainto_tsquery('something') @@ search_terms(n);

-- or

select * from invoice_notes n
 where plainto_tsquery('something') @@ n.search_terms;

Once the function is created, that query works out of the box even though I never created a corresponding function for the invoice_notes table type.  Thus providing a consistent interface to a group of tables is an area where table inheritance can help clear out a lot of complexity very fast. Benefits include a more robust database design, more easily re-used human knowledge in how pieces fit together, and easier management of database schemas.

Note on Use in Set/Superset Modeling

There are a number of cases where the query implications of inheritance are more important.  This area is typically tricky because it often involves multiple inheritance and therefore there are a number of additional concerns that quickly crop up, though these have well-defined solutions discussed below.

Imagine we have an analytics database with numbers pre-aggregated over possibly overlapping sets.  We want to sum up numbers quickly and easily without complicating the query language.  One option would be to create multiple views over a base table which includes the superset, but if your bulk operations primarily work over discrete subsets, you might get more out of breaking these out into subset tables which inherit from the larger sets of which they are members.  This is, in effect, a reverse partitioning scheme where a single physical table shows up in multiple query tables.
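The reverse partitioning idea can be sketched with two overlapping supersets and one subset that belongs to both.  The table names and figures are hypothetical:

```sql
-- Hypothetical supersets with identical column definitions.
create table eu_sales (region text, amount numeric);
create table online_sales (region text, amount numeric);

-- Rows stored here are members of both supersets.
create table eu_online_sales () inherits (eu_sales, online_sales);

insert into eu_sales values ('DE', 100);         -- EU, not online
insert into online_sales values ('US', 50);      -- online, not EU
insert into eu_online_sales values ('FR', 25);   -- both

select sum(amount) from eu_sales;       -- 125
select sum(amount) from online_sales;   -- 75
```

The row in eu_online_sales is stored once but counted in both superset queries, which is exactly the overlap the views-over-a-base-table approach would otherwise have to express with predicates.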

In certain cases this can be easier to manage than a single large table with multiple views selecting portions of that table.  Use of this technique requires weighing different kinds of complexity and is best left for other posts.

Managing Schema Changes with Multiple Inheritance

In cases where multiple inheritance is used, adding and removing columns is relatively straight-forward, but altering existing tables can result in cases where an alteration interferes with checks on the other parent.  Renaming columns or changing types of columns is particularly tricky.  In most cases where this happens, a type change will not be done because rewriting tables is prohibitive, but renaming columns becomes the substitute and that is no less of a headache.

The key problem here is that you have to make sure that both parents are changed at the same time, in the same statement.  The solution is to create a super parent table with the subset of columns to be acted on, and then drop it when done.  So here we:

begin;
-- temporary super parent holding only the columns being changed
create table to_modify (id int, new_id bigint);
alter table first_parent inherit to_modify;
alter table second_parent inherit to_modify;
-- renames on the super parent reach both parents in one statement
alter table to_modify rename id to old_id;
alter table to_modify rename new_id to id;
-- detach and drop the super parent when done
alter table first_parent no inherit to_modify;
alter table second_parent no inherit to_modify;
drop table to_modify;
commit;

The changes will then cascade down the inheritance graph properly.

Conclusions


Table inheritance is a surprisingly awesome feature in PostgreSQL, but misuse has given it a bad reputation.  There are many cases where it simplifies operation and long-term management of the database in cases where partitioning actually doesn't work that well.  This is a feature I expect to try to improve over time and hope others find it useful too, but to start we need to start using it for what it is good for, not the areas it falls short.

Friday, September 21, 2018

PostgreSQL at 20TB and Beyond Talk (PGConf Russia 2018)

It came out a while ago but I haven't promoted it much yet.


This is the recorded version of the PostgreSQL at 20TB and Beyond talk.  It covers a large, 500TB analytics pipeline and how we manage data.

For those wondering how well PostgreSQL actually scales, this talk is worth watching.

Thoughts on the Code of Conduct Controversy

My overall perspective here is that the PostgreSQL community needs a code of conduct, and one which allows the committee to act in some cases for off-infrastructure activity, but that the current code of conduct has some problems which could have been fixed if more care had been taken to ensure that feedback was gathered when it was actionable.

This piece discusses what I feel was done poorly but also what was done well and why, despite a few significant missteps, I think PostgreSQL as a project is headed in the right direction in this area.

But a second important point here is to defend the importance of a code of conduct to dissenters here, explain why we need one, and why the scope needs to extend where it needs to extend to, and why we should not be overly worried about this going in a very bad direction.  The reason for this direction is that in part I found myself defending the need for a code of conduct to folks I collaborate with in Europe and the context had less to do with PostgreSQL than with the Linux kernel.  But the projects in this regard are far more different than they are similar.

Major Complaint:  Feedback Could Have Been Handled Better (Maybe Next Time)


In early May there was discussion about the formation of a code of conduct committee, in which I argued (successfully) that it was extremely important that the committee be geographically and culturally diverse so as to avoid one country's politics being unintentionally internationalized through a code of conduct.  This was accepted and as I will go into below this is the single most important protection we have against misuse of the code of conduct to push political agendas on the community.  However after this discussion there was no further solicitation for feedback until mid-September.

In mid-September, the Code of Conduct plan was submitted to the list.  In the new code of conduct was a surprising amendment which had been made the previous month, expanding the code of conduct to all interactions between community members unless another code of conduct applied and superseded the PostgreSQL community code of conduct.  I objected to this as did several others with actionable criticism and alternatives.  Unfortunately we were joined by a large number of people wanting to relitigate whether we needed a code of conduct in the first place.  Those of us with actionable feedback were told that no changes would be made for about a year.  In essence what looked like a public comment period was not, and the more actionable feedback was, the more clearly it was ignored.

Had there been an actual comment period on the proposed language, I maintain that things would have been more tame, but ignoring even actionable feedback in such a period, in my view, helped throw fuel on the fire regarding those who wanted to re-litigate the whole concept because it further helped push the view that a plan was announced and then any concern ignored.  This was unfortunate.  If there had been a comment period, a deliberation, and a final draft things would have gone better.

I hope that next time such a process is followed, where feedback on proposed final wording is taken before the decision is made to refuse to make changes for a year.

Why We Have Codes of Conduct


Humans are social animals.  Groups of humans form social groups, which often have group infrastructure which needs to be managed.  Open source thus has all of the political considerations of a multi-national collaborative community and this includes management of common infrastructure, and how we treat each other.  The kinds of social relationships and interactions that we have in the community are shaped by our culture, gender, and outlook on life, and in an international project there can be a lot of problems.  When national political issues are kept out of the project and the project consists mostly of people who are willing to defend themselves possibly aggressively, a project can get along ok without a code of conduct, but as things change, it is important that there be a means of resolving conflicts within the community.  Hence one needs a dedicated committee and a document which reminds people to act in ways that keep the peace.

Codes of conduct thus have a role in ensuring that people can come together and work in a collegial and civil manner across cultural, political and other disagreements, and continue to build the great software that we all rely on.  In this regard I think the PostgreSQL community has hit the most important milestones and begun to build a code of conduct infrastructure which can last and ensure that the code of conduct does not turn into a way of one group of people forcing a political agenda on the world.

I have been to many conferences.  Often at some point discussions turn to politics in some way.  With one exception, these conversations have been thoughtful, receptive, and mutually entertaining, but in that one exception I saw a degree of aggressiveness that might, for others, even rise to the level of physical intimidation.  A reminder that we all need to genuinely be nice to each other is a step in the right direction.

Codes of conduct cannot create fairness.  They cannot create social justice.  They cannot broaden meritocracy to community contribution beyond code.  Those things have to be done through other means, but they can remind everyone to treat each other collegially and to respect differences of opinion and so forth.

However, codes of conduct cannot allow mere formalities to defeat this purpose.  A campaign of harassment that is taken off-list is at least as much of a problem as discussions on-list.  Therefore community-related conversations are things which might sometimes have to fall under the jurisdiction of community conflict adjudication mechanisms such as the Code of Conflict.

What PostgreSQL is Doing Right


The danger in any code of conduct is that an internal controversy from one country or culture will be read into disputes in a way which ensures that other cultural groups do not feel comfortable participating.  GLBT issues are an area where this commonly comes up, and in a project where you have a lot of involvement from countries where the views are very different from those of the US, this leads to big problems.  On one hand, some people may see others' cultural views as invalidating their sexual identity, while others would see views pushing universalism in GLBT rights as invalidating their cultural identity.  These issues cannot be resolved without retreating to a single cultural context as the norm, discouraging participation from much of the world, and thus need to be outside what a code of conduct handles.  In this case it does not matter what one believes to be the right approach, but rather the fact that the consequences of siding with either side in such a controversy would be devastating for the community.

One of the key points of the current Code of Conduct is that the committee is itself geographically and culturally diverse.  The resulting intra-committee cultural divisions help ensure that the committee cannot just bulldoze a political orthodoxy through out of fear of how a domestic controversy might be perceived.  The cultural diversity thus is an immense protection and it effectively ensures that there is a right to engage in the free struggle of political opinion in one's own country.

From a responsibility to civic engagement comes a right to such a free struggle of political opinion, and in my view this is something which is effectively preserved in the community today.  Note that this would not apply to trying to position the PostgreSQL project as against any political, cultural, or other group.  Nor should it protect actual personally directed harassment against any member for any reason.  I believe that the committee is capable of drawing these lines and hence I see the PostgreSQL project as off to a shaky but viable start.

Unlucky Timing


The Code of Conduct controversy accidentally coincided with the Linux Foundation adopting the Contributor Covenant as its code of conduct.  The Contributor Covenant is a code of conduct which transparently attempts to push certain norms of certain parts of the US political spectrum on the rest of the world (see, for example, Opalgate).   While I believe this to be a mistake, time will tell how this is handled.  The Contributor Covenant was soundly and decisively rejected by the PostgreSQL community early on as too transparently political.

A lot of the emotional reactions in this controversy by dissenters may well be in relation to that.  This is one of those things one cannot plan for and it makes it harder to have real discussions today.

Calls to Action and Conclusions


I have submitted a couple of requests for wording changes to the code of conduct committee.  For others who see a need for a committee to help ensure a collegial and productive community, and see opportunities for improvement, I suggest you do the same.  But simply arguing about whether we need a resolution process is not productive and that probably needs to stop.

I also think the community needs to insist on two modifications to the current process:

  1. There needs to be a comment period and deliberation over feedback between a draft of a new revision and its adoption.
  2. The code of conduct committee needs to reply with reasons why particular suggestions were rejected.

On the other hand, I think PostgreSQL as a project is off to a viable start in what is likely to become the right direction.  And that is something we should all be thankful for.

Wednesday, August 9, 2017

On Contempt Culture, a Reply to Aurynn Shaw

I saw on LinkedIn a recording of an interesting presentation on contempt culture by Aurynn Shaw, delivered this year at PyCon.  I had worked with Aurynn on projects back when she used to work for Command Prompt.  You can watch the video below:



Unfortunately comments on a social media network are not sufficient for discussing nuance so I decided to put this blog post together.  In my view she is very right about a lot of things but there are some major areas where I disagree and therefore wanted to put together a full blog post explaining what I see as an alternative to what she rightly condemns.

To start out, I think she is very much right that there often exists a sort of tribalism in tech, with people condemning each other's tools, whether it be Perl vs PHP (her example) or vi vs emacs, and I think that can be harmful.  The comments here are aimed at fostering the sort of inclusive and nuanced conversation that is needed.

The Basic Problem

Every programming culture has norms, and many times groups from outside those norms tend to be condemned in some way or another.  There are a number of reasons for this.  One is competition and another is seeking approval in one's in-group.  I think one could take her points further and argue that in part it is about an effort to improve the standing of one's group relative to others around it.

Probably the best example we can come up with in the PostgreSQL world is the way MySQL is looked at.  A typical attitude is that everyone should be using PostgreSQL and therefore people choosing MySQL are optimising for the wrong things.

But where I would start to break with Aurynn's analysis would be when we contrast how we look at MySQL with how we look at Oracle.  Oracle, too, has some major oversights (empty string being null if it is a varchar, no transactional DDL, etc).  Almost all of us may dislike the software and the company.  But people who work with Oracle still have prestige.  So bashing tools isn't quite the same thing as bashing the people who use them.  Part of it, no doubt, is that Oracle is more established, is an older player in the market, and therefore there is a natural degree of prestige that comes from working with the product.  But the question I have is what can we learn from that?

Some time ago, I wrote the most popular blog post in the history of this blog.  It was a look at the differences in design between MySQL and PostgreSQL and was syndicated on DZone, featured in Hacker News, and otherwise received fairly wide attention.  In general, aside from a couple of historical errors, the PostgreSQL-using audience loved the piece.  What surprised me though was that the MySQL users also loved the piece.  In fact one comment that appeared (I think on Reddit) said that I had expressed why MySQL was better.

The positive outpouring from MySQL users, I think, came from the fact that I sympathetically looked at what MySQL was designed to do and what market it was designed for (applications that effectively own the database), describing how some things I considered misfeatures actually could be useful in that environment, but also being brutally honest about the tradeoffs.

Applying This to Programming Language Debates

Before I start discussing this topic, it is worth a quick tour of my experience as a software developer.

The first language I ever worked with was BASIC on a C64.  I then dabbled in Logo and some other languages, but the first language I taught myself professionally was PHP.  From there I taught myself some very basic Perl, Python, and C.  For a few years I worked with PHP and bash scripting, only to fall into doing Perl development by accident.  I also became mildly proficient in Javascript.

My PostgreSQL experience grew out of my Perl experience.  And about 3 years ago I was asked to start teaching Python courses.  I rose to this challenge.  Around the same time, I had a small project where we used Java, and I quickly found myself teaching Java; now I feel like I am moderately capable in that language.  I am now teaching myself Haskell (something I think I could not have done before really mastering Python).  So I have worked with a lot of languages.  I can pick up new languages with ease.  Part of it is because I generally seek to understand a language as a product of its own history and the need it was intended to address.

As we all know, different programming languages are associated with stereotypes.  Moreover, I would argue that stereotypes are usually imperfect understandings that out-group people have of in-group dynamics, so dismissing stereotypes is often as bad as simply accepting them.

PHP as a Case Study, Compared to C

I would like to start with an example of PHP, since this is the one specifically addressed in the talk and it is a language I have some significant experience writing software in.

PHP often is seen to be insecure because it is easy to write insecure software in the language.  Of course it is easy to write insecure software in any language, but certain vulnerabilities are a particular problem in PHP due to lexical structure and (sometimes) standard library issues.

Lexically, the big issue with PHP is the fact that the language is designed to be a preprocessor for SGML files (and in fact the name now stands for "PHP: Hypertext Preprocessor").  For this reason, PHP is easy to embed in SGML PI tags (so you can write a PHP template as a piece of valid HTML).  This is a great feature but it makes cross site scripting particularly easy to overlook.  A lot of the standard library in the 1990s had really odd behaviour, though much of this has been corrected.
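To make the cross site scripting point concrete, here is a minimal sketch, written in Python rather than PHP so that it runs standalone.  The function names and the sample payload are hypothetical and only for illustration; the one library call used is html.escape from the Python standard library.  The first function does what an embedded template does when a value is echoed straight into markup, and the second escapes the value first.

```python
import html


def render_unsafe(comment: str) -> str:
    # Hypothetical template: user input is pasted into the markup unchanged,
    # much like echoing a request parameter inside an HTML page.
    return "<p>" + comment + "</p>"


def render_escaped(comment: str) -> str:
    # Same template, but markup-significant characters are escaped first,
    # so the browser shows the input as text instead of interpreting it.
    return "<p>" + html.escape(comment) + "</p>"


payload = "<script>alert('xss')</script>"

print(render_unsafe(payload))   # the script tag survives intact
print(render_escaped(payload))  # the script tag is neutralised
```

The two functions differ by a single call, which is exactly why the unsafe form is so easy to overlook when the template itself looks like ordinary HTML.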

Aurynn is right to point to the fact that these were exacerbated by a flood of new programmers during the rise of PHP, but one thing she does not discuss in the talk is how software and internet security were also changing during the time.  In essence, the late 1990s saw the rise of SSH (20k users in 1995 to over 2M in 2000), the end of transmission of passwords across the internet in plain text, the rise of concern about SQL injection and XSS, and so forth.  PHP's basic features were in place just before this really got going; add to this a new developer community, and you have a recipe for security problems.  Of course, PHP has since outgrown a lot of this, and PHP developers today have codified best practices to deal with a lot of the current threats.

If we contrast this with C as a programming language, C has even more glaring lexical issues regarding security, from double free bug possibilities to buffer overruns.  C, however, is a very unforgiving language and consequently it doesn't tend to be a language that has a large, novice developer community.  At the same time, a whole lot of security issues come out of software in C.

Conclusions

There is no such thing as a perfect tool (database, programming language, etc).  As we grow as professionals, part of that process is learning to better use the strengths of the technologies we work with and part of it is learning to overcome the oversights and problems of the tools as well.

Nor does the fact that a programmer primarily uses a tool with real oversights reflect poor judgment on the programmer's part.  Rather, this process of learning can have the opposite impact.  C programmers tend to be very knowledgeable because they have to be.  The same is true for Javascript programmers for very different reasons.  And one doesn't have to validate all language design decisions in order to respect others.

Instead of attacking developers of other languages, my recommendation is, when you see a problem, to point it out neutrally and respectfully, not from a position of superiority but from one of respectful assistance.  It also helps to understand that what may seem like poor decisions in the design of a language may in fact have real benefits in some cases.

For example, Java as a language encourages mediocrity of code. It is very easy to become a mediocre Java developer.  But once you understand Java as a language, this becomes a feature because it means that the barrier to understanding and debugging (and hence maintaining!) code is reduced, and once you understand that you can put emphasis instead on design and tooling.    This, of course, also has costs since it is easy for legacy patterns to emerge in the tooling (JavaBeans for example) but it allows some really amazing frameworks, such as Spring.

At the other extreme, Javascript is a language characterised by shortcuts taken during the initial design stage (for time constraint reasons) and some of those cause real problems, but others make hard things possible.  Javascript also makes it very easy to be a bad Javascript programmer.  But perhaps for this reason I have found that professional Javascript programmers tend to be extremely knowledgeable, and have had to work very hard to master software development in the language, and they usually bring to the table great insights into computing problems generally.

So what I would recommend that people take away is the idea that in fact we do grow out of hardship, and that problems in tools are overcome over time.  So for that reason discussing real shortcomings of tools while at the same time respecting communities and their ability to grow and overcome problems is important.