[ACL-Devel] FS meta information (properties/attributes/...)

Robert Watson robert@cyrus.watson.org
Sat, 15 Apr 2000 16:05:54 -0400 (EDT)


So I've lurked on the acl-devel mailing list for a while since Andreas
pointed me at it during a POSIX.1e discussion.  For those I haven't
interacted with before, I've been working on capability and ACL support on
FreeBSD/TrustedBSD.

I thought I'd step in because I did the extended attribute implementation
for FreeBSD -- in fact, I'm in the process of committing it to the FreeBSD
source tree, having had it in testing for the last 4 or 5 months.  As
such, I'll briefly introduce how it works, and then respond to your
questions/comments.  I'll also comment on the performance implications of
the design choices, and talk about some alternatives.

We chose not to modify the base file system, and instead follow the model
of the BSD quota support--back the attributes to a file (possibly in the
same file system, possibly elsewhere).  The backing file consists of a
short header declaring attribute settings (max length, permissions for
modification (kernel, root, owner, anyone), etc.).  The remainder of the
file is treated as an array of attribute records, indexed by inode number
for that attribute.  Because sparse files are available, this is actually
relatively efficient.  Each attribute record consists of a short header
containing flags (whether the attribute is defined for this inode) and a
length, followed by the value of the attribute for that inode.  We see
these attribute files being named things like ``/.attribute/md5'',
``/.attribute/acl_access'', ``/.attribute/acl_default'', etc.  See below
for some criticisms of this technique.
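
As a rough illustration of the layout just described, here is a minimal
Python sketch of a backing file with a short file header followed by an
array of fixed-size records indexed by inode number.  It is not the actual
FreeBSD on-disk format: the field widths, the FLAG_DEFINED bit, and all
function names are assumptions made for the example.

```python
import os
import struct

# Hypothetical layout; field names and widths are illustrative only.
FILE_HEADER = struct.Struct("<II")  # max value length, modification permission level
REC_HEADER = struct.Struct("<II")   # flags (bit 0 = defined), value length
FLAG_DEFINED = 0x1


def create_backing_file(path, max_len, perm):
    """Write the file header; records stay unwritten until first set."""
    with open(path, "wb") as f:
        f.write(FILE_HEADER.pack(max_len, perm))


def _read_file_header(f):
    f.seek(0)
    return FILE_HEADER.unpack(f.read(FILE_HEADER.size))


def record_offset(inode, max_len):
    """Records form an array indexed by inode number, after the file header."""
    return FILE_HEADER.size + inode * (REC_HEADER.size + max_len)


def set_attr(path, inode, value):
    with open(path, "r+b") as f:
        max_len, _perm = _read_file_header(f)
        if len(value) > max_len:
            raise ValueError("value exceeds the attribute's maximum length")
        # Seeking past the current end and writing leaves a hole behind.
        f.seek(record_offset(inode, max_len))
        # Pad to the fixed record size so a shorter value replaces a longer one.
        f.write(REC_HEADER.pack(FLAG_DEFINED, len(value)) + value.ljust(max_len, b"\0"))


def get_attr(path, inode):
    """Return the value, or None if the attribute is undefined for this inode."""
    with open(path, "rb") as f:
        max_len, _perm = _read_file_header(f)
        f.seek(record_offset(inode, max_len))
        raw = f.read(REC_HEADER.size)
        # A short read (past end of file) or a hole (all zeros) both mean "undefined".
        if len(raw) < REC_HEADER.size:
            return None
        flags, length = REC_HEADER.unpack(raw)
        if not flags & FLAG_DEFINED:
            return None
        return f.read(length)
```

With a 16-byte maximum, setting the attribute on inode 1000 only writes
that one record; the records for lower inode numbers are never written and
read back as undefined.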

We see a number of desirable semantics/features: 

o Extended attributes consist of zero or more (name,value) pairs assigned
  to inodes (directories, files, device nodes, whatever)

o Namespace operations within a single file system should not affect the
  attributes associated with an inode.  I.e., a rename, link, or unlink
  (as long as the inode is still ``allocated'') should not influence the
  attributes.  Specifically, renaming a file should not cause its
  attributes to be detached. :-)

o Each attribute name has the same syntactic requirements as a pathname
  component (i.e., same limits as a filename)

o For a given inode, each possible attribute name may or may not be
  ``defined''.

o If an attribute is defined for an inode, then the associated data may
  be of zero or more bytes in length, and considered an opaque data type
  with no syntactic requirements (i.e., a binary blob)

o Because we're attempting to achieve a (name,value) semantic, any write
  to an attribute replaces the current value.  As such, there are no
  append, length modification, etc, operations, only ``read'', ``write'',
  and ``delete'', which returns the attribute name to the undefined
  state.

There are also a number of properties specific to our implementation
backing the syscall and VFS APIs:

o Attribute names are declared on a per-FS basis, and only declared
  attribute names may be used on a file system.  We require that the
  root account be used to declare new attributes.

o Each attribute name has a maximum data size associated with it, on
  a per-FS basis.  For example, ``ACL'' on ``/'' might have a maximum
  possible size of 64 bytes.  Or ``MD5'' on ``/'' might have a maximum
  of 16 bytes.

However, our APIs for manipulating attributes per-file do not require
these semantics; for example, HPFS support on FreeBSD will not impose
them.  HPFS places its own limitations on extended attributes, which are
likewise not a property of the API.
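
The two implementation-specific rules above (root-only declaration, and a
per-FS maximum size for each declared name) can be modelled in a few lines
of Python.  This is only a sketch: the class, its method names, and the
255-byte name limit are assumptions for illustration, not the FreeBSD
interface.

```python
MAX_NAME_LEN = 255  # assumed path-component limit; the real limit is OS-specific


class AttributeDeclarations:
    """Per-file-system table of declared attribute names (hypothetical model)."""

    def __init__(self):
        self._max_size = {}  # attribute name -> maximum value size in bytes

    def declare(self, name, max_size, uid):
        """Declare a new attribute name; only root (uid 0) may do so."""
        if uid != 0:
            raise PermissionError("only root may declare new attributes")
        # Same syntactic rules as a pathname component.
        if not name or "/" in name or "\0" in name or len(name.encode()) > MAX_NAME_LEN:
            raise ValueError("invalid attribute name")
        self._max_size[name] = max_size

    def check_write(self, name, value):
        """Raise unless `name` is declared and `value` fits its size limit."""
        if name not in self._max_size:
            raise KeyError(f"attribute {name!r} is not declared on this file system")
        if len(value) > self._max_size[name]:
            raise ValueError("value exceeds the declared maximum size")
```

For instance, after declare("MD5", 16, uid=0), a 16-byte value passes
check_write while a 17-byte value, or any write to an undeclared name, is
rejected.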

On Wed, 12 Apr 2000, Andreas Gruenbacher wrote:

> After looking at some documentation, i'm more convinced that we should
> really go for a general mechanism to store extended attributes for
> inodes. From what I saw so far, name/value pairs seem to be the right
> thing. 

Same here.  Modifying the base file system is entirely feasible (as I
believe you've done, and I've also done), but we found that it introduced
lots of backwards compatibility issues, made it hard to experiment, and
didn't scale to new features as they were developed (such as capabilities,
MAC labels, etc).  Supporting extended attributes, either without base FS
modification, or with modification, is still vastly more flexible than
adding, say, capabilities flags or ACL support to the base FS.

> It seems no problem to use simple ASCII strings for the names. In IRIX
> 6.5 attr(1) the name is limited to 256 characters. Again, this seems to
> be a sane assumption. 

For my initial implementation, I limited attribute names to 32 characters.
That said, a more practical and useful limit would be whatever the default
maximum path component length is on the OS in question.  I'll probably
switch to that in a few days as part of a post-commit cleanup.

> Now there again is a choice. Do we need arbitrary length attributes? 
> 
> If so, the interface would need to be file-like, with several levels of
> indirection etc. (the full overheads). Otherwise, the data of one
> attribute can be passed to/from the kernel in a fixed-size buffer. I'd
> prefer the latter, again for efficiency reasons. Any other opinions? 
> 
> Also, a fixed upper limit (say, 256K) would allow a simple allocation
> scheme on the filesystem (one level of indirection should do). Does
> anybody know if/how the existing code can be reused? 
> 
> I guess Linda said Irix would support arbitrary length attributes. At
> least in the man page referenced above, it says the limit is 256K. Was
> that limit lifted recently? 

For simplicity of implementation, I went with a predefined maximum bound
per-attribute, and then allow an attribute to be either undefined, or
defined with a length of 0 through the maximum size possible.  As I use
attributes in a name=value semantic with atomic replacement of value, this
semantic allows you to consider a file as having zero or more named
attributes; attempting to read an attribute that is undefined results in
ENOENT.
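
To make the semantics concrete, here is a small in-memory Python model of
a single named attribute.  The class and method names are hypothetical;
only the behaviours stated in this message (whole-value replacement, a
bounded size, and ENOENT when reading an undefined attribute) are taken
from it.  The error used for an oversized value and the behaviour of
deleting an already-undefined attribute are assumptions.

```python
import errno
import os


class ExtAttrModel:
    """In-memory model of one named attribute across inodes (illustrative only)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._values = {}  # inode number -> bytes; a missing key means "undefined"

    def write(self, inode, value):
        # Assumption: the message does not say which error an oversized write gives.
        if len(value) > self.max_size:
            raise ValueError("value exceeds the attribute's maximum size")
        # Whole-value replacement: there is no append or partial update.
        self._values[inode] = bytes(value)

    def read(self, inode):
        if inode not in self._values:
            raise OSError(errno.ENOENT, os.strerror(errno.ENOENT))
        return self._values[inode]

    def delete(self, inode):
        # Assumption: deleting an already-undefined attribute is a no-op here.
        self._values.pop(inode, None)
```

Note that a defined, zero-length value (write(inode, b"")) is distinct
from an undefined attribute: the former reads back as an empty value, the
latter fails with ENOENT.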

> Is anybody working on an attribute storage system already? 

I'd love to see cross-platform portability on this interface--POSIX.1e
doesn't cover it.  You can see a slightly dated version of the code (I
will update it shortly) on http://www.trustedbsd.org/downloads/.  This is,
of course, for a BSD-style UFS/FFS and VFS, but it might be useful.  Also,
the FreeBSD extattr(9), VOP_GETEXTATTR(9), VOP_SETEXTATTR(9) man pages
cover the interface semantics in the kernel.

Our implementation has a number of disadvantages--some have to do with our
specific implementation, others have to do with the type of backing store
we are currently using, etc.

1) We rely on sparse files to store attribute data with relative
efficiency.  I.e., reading beyond the end of the file (undefined) returns
zeros, which leave the ``defined'' flag clear, as intended.  This has a
number of downsides, including what happens when people apply cp(1) to
the attribute files, or use backup utils that don't understand sparse
files.  Also, space cannot be reclaimed if the attribute is deleted (not
too bad for capabilities/MAC labels/ACLs).

2) An attribute must be declared and configured fs-wide by the
administrator, which limits the flexibility to introduce new attributes
for users on demand.  The maximum size cannot easily be modified at
runtime.

3) For quota purposes, space for attributes is billed to the attribute
file owner, not the owner of the object the attribute is attached to.

4) We can't save space by allowing identical attribute values to consume
the same space, which is an optimization quite possible and effective with
ACLs.  Clearly the backing file design could be modified to handle this
through an extra level of indirection (attribute.index, attribute.value)
but there are still limitations.
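
The extra level of indirection mentioned in (4) might look roughly like
the following Python sketch, where an index maps each inode to a slot and
each distinct value is stored once with a reference count.  This is not
something our implementation does; the class, its fields, and the
reference-counting scheme are all assumptions for illustration.

```python
class DedupAttrStore:
    """Index/value split: many inodes may share one stored value (illustrative)."""

    def __init__(self):
        self.index = {}      # "attribute.index": inode number -> slot
        self.values = []     # "attribute.value": slot -> [value bytes, refcount]
        self._slot_of = {}   # value bytes -> slot, used to find duplicates on write

    def write(self, inode, value):
        self.delete(inode)  # drop the reference to any previous value
        slot = self._slot_of.get(value)
        if slot is None:
            slot = len(self.values)
            self.values.append([value, 0])
            self._slot_of[value] = slot
        self.values[slot][1] += 1
        self.index[inode] = slot

    def read(self, inode):
        """Return the value; raises KeyError if undefined for this inode."""
        return self.values[self.index[inode]][0]

    def delete(self, inode):
        slot = self.index.pop(inode, None)
        if slot is not None:
            # Slots whose count drops to zero are not reclaimed in this sketch.
            self.values[slot][1] -= 1
```

Writing the same ACL to three inodes then consumes one value slot rather
than three.  The sketch also shows one of the remaining limitations: it
never reclaims a slot once its reference count reaches zero.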

5) The administrator must be very careful that the attribute service is up
and running before any attribute manipulation may take place, as attribute
data cannot be updated when it's not enabled.  For example, if an inode is
freed, and then reallocated, while attribute service is not running, then
the old attributes will be tied to the new file.  This is addressable
through the inode generation number, and we should probably do that.
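
The generation-number fix suggested in (5) can be sketched as follows in
Python: each stored record remembers the generation of the inode it was
written for, and a mismatch on read marks the record as stale.  We have
not implemented this; the class and method names are hypothetical, and
returning None for "undefined" is just a convenience of the sketch.

```python
class GenerationCheckedStore:
    """Attribute records tagged with the inode generation (illustrative only)."""

    def __init__(self):
        self._records = {}  # inode number -> (generation, value)

    def write(self, inode, generation, value):
        self._records[inode] = (generation, value)

    def read(self, inode, generation):
        """Return the value, or None if undefined or left over from an old file."""
        record = self._records.get(inode)
        if record is None:
            return None
        stored_generation, value = record
        if stored_generation != generation:
            # The inode was freed and reallocated since this record was written,
            # so the record belongs to a previous file: discard it.
            del self._records[inode]
            return None
        return value
```

If inode 42 is written at generation 1 and later reallocated at
generation 2 while the attribute service was down, a read for generation 2
reports the attribute as undefined instead of returning the old file's
value.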

6) The current implementation relies on a per-attribute-backing-file lock,
which throttles access to attributes.  This is relatively easily fixable,
as it's a property of our implementation.

There are clearly limitations to this approach, but on the whole we've
found it a lot easier for people to stomach than tearing up the base file
system one or more times as various features are added.  Ways to
incorporate extended attributes into the base file structure are under
consideration, but it's not considered a pressing need, as we can make
progress implementing various TrustedBSD features without it.

Hope this is helpful--the code is under a two-clause BSD-style license,
and as such people are free to take what they will (preserving the
copyright, of course :-).  Since it's designed for the FreeBSD UFS/FFS and
VFS implementations, there's probably not much that's directly
applicable, but it might be food for thought.

  Robert N M Watson 

robert@fledge.watson.org              http://www.watson.org/~robert/
PGP key fingerprint: AF B5 5F FF A6 4A 79 37  ED 5F 55 E9 58 04 6A B1
TIS Labs at Network Associates, Safeport Network Services

