The Multi-Protocol Fantasy

There’s a particular class of problem that I’ve grappled with regularly over the past decade. I got to go another round just before the holiday break. I figured that this was as good a time as any to share some thoughts.

Multiple representations

Some file servers are able to present a single filesystem via two (or more) different protocols. The most common use case is to support both Linux and Windows clients from the same underlying data. In the usual setup, Linux systems see a POSIX-compliant filesystem via their old reliable Network File System (NFS) protocol. Windows clients use the Windows-native SMB/CIFS protocol (often served by Samba) and see a friendly, ordinary NTFS filesystem (sometimes mapped as “The L: drive” or similar). In recent years, vendors and developers have begun to add support for newer interfaces like S3 and HDFS.

We use this sort of thing all the time in the life sciences. Lab instruments (and laboratory people) write data to a Windows share. The back end heavy lifting of the high performance computing (HPC) environment is done on Linux servers.

The reason that this seems like a good idea is that it’s kind of a pain to convince Linux systems to mount via CIFS / Samba, and it’s also kind of a pain to do the reverse and mount NFS from Windows. While it is certainly possible, all of us of a certain age will respond with a grouchy “harrumph” whenever the idea comes up. I’ve spent many happy hours navigating the complexity of the “unix services” tab on the Active Directory master (assuming that a lowly Linux admin is allowed access to the enterprise identity management service) or else googling around (again) to find that one last command to convince Linux to “bind” itself to an LDAP server.

The underlying problem

POSIX and NTFS are just different. As soon as we stray beyond the very simplest use cases (writing from a lab-user account and reading from a pipeline analysis account, for example), we encounter situations where it is literally impossible to give correct semantics on both sides. One filesystem has to present an invalid answer in order to provide correct behavior in the other.

Under POSIX (the Linux / NFS flavor) we use read, write, and execute permissions for a single user (the owner), a single group, and a conceptual “everybody.”
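To make the three-way split concrete, here is a minimal Python sketch that decodes a mode into its owner, group, and “everybody” parts. The function name describe_mode is mine, made up for illustration; the bit constants come from Python’s standard stat module.

```python
import stat

def describe_mode(mode: int) -> dict:
    """Split a POSIX mode into its owner / group / other permission triplets."""
    def triplet(r_bit, w_bit, x_bit):
        # Emit the letter if the bit is set, "-" otherwise.
        pairs = (("r", r_bit), ("w", w_bit), ("x", x_bit))
        return "".join(letter if mode & bit else "-" for letter, bit in pairs)

    return {
        "owner": triplet(stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR),
        "group": triplet(stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP),
        "other": triplet(stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH),
    }

# 0o640: owner can read and write, group can read, everybody else gets nothing.
print(describe_mode(0o640))
```

That is the whole vocabulary: one owner, one group, one “other,” three bits each.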

Under Windows, we have “access control lists” (ACLs). Any entity from Active Directory (either users or groups) can have read or write permissions. The idea of a file being executable is handled in a different way.

It’s straightforward to create permissions under Windows that cannot be directly applied on the Linux side. The simplest example is a file for which exactly two users have read access, but where there is no group composed of only those two users.

Note that this is true even if all of the users and groups are already correctly mapped between Windows and Linux.
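The two-reader example can be checked mechanically. The sketch below is a simplified model rather than a real permission engine: it assumes the file’s owner is fixed, that the “everybody” read bit stays off, and that groups are plain sets of user names. The function and the user and group names are all hypothetical.

```python
def posix_can_express(readers, owner, groups):
    """Return True if owner/group read bits can grant read to exactly `readers`.

    readers: set of user names that should be able to read the file
    owner:   the file's owner (assumed fixed)
    groups:  dict mapping group name -> set of member user names
    Assumes the "other" read bit is off, so nobody else gets in.
    """
    # Every reader set reachable with one owner bit and one group bit.
    candidates = [set(), {owner}]
    for members in groups.values():
        candidates.append(members - {owner})  # group may read, owner may not
        candidates.append(members | {owner})  # group and owner may both read
    return set(readers) in candidates

groups = {"lab": {"alice", "bob", "carol"}, "pipeline": {"svc"}}

# Exactly alice and bob, with no group holding just those two: not expressible.
print(posix_can_express({"alice", "bob"}, "alice", groups))
# The whole lab group is fine.
print(posix_can_express({"alice", "bob", "carol"}, "alice", groups))
```

Correct user and group mapping doesn’t change the first answer. The only way out is to create another group.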

Note further that as you start creating utility groups to overcome this “bug,” you will rapidly discover that NFS, with its default AUTH_SYS authentication, only honors the first 16 groups listed for a user, even if those groups are served up by an NIS or an LDAP service.
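Here is a small Python sketch of what that limit does to a user with too many groups. It models the truncation only and does not talk to NFS at all. Which 16 groups actually get sent depends on the client, so the list order used here is an assumption, and the gid numbers are made up.

```python
AUTH_SYS_MAX_GROUPS = 16  # cap on supplementary group IDs in an AUTH_SYS credential

def split_groups(group_ids):
    """Return (sent, dropped): the group IDs that fit in the credential, and the rest."""
    ids = list(group_ids)
    return ids[:AUTH_SYS_MAX_GROUPS], ids[AUTH_SYS_MAX_GROUPS:]

# Hypothetical user in 20 groups; the shiny new utility group is gid 1018.
user_groups = list(range(1000, 1020))
sent, dropped = split_groups(user_groups)

print(len(sent), len(dropped))
print(1018 in sent)  # False: the server never hears about the utility group
```

The user is in the group as far as the directory service is concerned, and still gets “permission denied” over NFS.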

There are plenty of other examples. The rat’s nest I found myself in last month involved setting default permissions on files created by Windows such that they would be sensible on the Linux side. Under Linux, each user has a single default group (specified by number in a file called /etc/passwd). That concept, of a default group for each user, simply doesn’t exist under Windows.
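For anybody who hasn’t stared at /etc/passwd lately, the default group is the fourth colon-separated field. A minimal sketch, using a made-up entry rather than reading the real file:

```python
def primary_group(passwd_line: str):
    """Return (user name, default group ID) from one /etc/passwd-style line."""
    # Field order: name:password:UID:GID:GECOS:home:shell
    fields = passwd_line.strip().split(":")
    return fields[0], int(fields[3])

# Hypothetical entry: user alice, uid 1001, default group 500.
line = "alice:x:1001:500:Alice Example:/home/alice:/bin/bash"
print(primary_group(line))
```

Every new file that alice creates on Linux gets group 500 unless something says otherwise. There is no equivalent number to look up for a Windows account, which is why the file server has to guess.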

Go around

Back in November, I wrote a post titled Go Around, about how, sometimes, when you hit an intractable obstacle it just means you’re looking at a problem from the wrong angle. That’s exactly what’s going on when you spend time yelling at a vendor or deforming your active directory forest to solve these protocol collisions. It’s asking the impossible.

As an engineer or informatics person, it is up to you to find a way to frame problems such that they are actually solvable.

There is no such thing as a free lunch on this one. If you want to use POSIX semantics, you have to accept the limitations of POSIX. The same is true of Windows, S3, HDFS, or any other data representation.

Vendors: Multi-protocol permissions blending will never ever work. It just looks like buggy, flaky behavior on the client side. Stop trying.

Users: Go around.

Addendum

The way that you “go around” on this one is to decide which protocol the majority of usage will come from and engineer for that.

If it’s going to be used mostly from Linux – then don’t set complex permissions under Windows, and tell the users that their fancy ACLs are not going to be honored. If it’s the other way around (mostly Windows use), then create a utility group on the Linux side (just the one) and make sure that all the appropriate users are in that group.

Trust me, it’s much easier than trying to merge fundamentally incompatible semantics.