WWDC2001 Session 116
Transcript
Kind: captions
Language: en
thank you and welcome to session 116 Mac
OS file systems the Mac OS has supported
multiple file systems for a long time
now and now with the UNIX foundation in
OS 10
we support a whole plethora of file
systems and to tell you all about those
different file systems and how you
should be using them I'd like to introduce
the manager of the Core OS file
systems group Clark Warner I'm really
glad to see all of you here I know it's
early and I appreciate your coming
I hope we're gonna have some very useful
information for you about the Mac OS 10
file system we'll start with this our
welcome and a little idea of some of the
stuff that we're going to cover today
we're not going to talk as much about
futures obviously you've probably heard
that as a theme throughout the
Developers Conference the present is so
much more exciting this year and we're
not going to talk as much about file
system internals because we know that a
lot of you are now in the process of
bringing your apps over to Mac OS 10 so
we're going to concentrate on some of
the things that your apps need to be
prepared for when they're using the file
system as Jason mentioned we have a
number of different file systems in Mac
OS 10 and they do present some
occasional wrinkles we also are going to
give you some tips on how to improve
the performance of your app especially
with regard to its use of the
file system but we will talk a little
bit about some of the key issues in
building file systems yourself to add to
Mac OS 10 if you went to the Darwin
overview session or probably any of the
core OS sessions you've seen this chart
already this is the key you are here
graphic if we were on an airplane now you
know we'd be saying this is the San Jose
if you're going to Hawaii get off now we
are part of the bsd kernel inside of the
file system here's a blow-up of what the
file system looks like internally we
support basically at the core OS level
the BSD Berkeley Software
Distribution system calls with some
extensions inside of the file system
there is a big switch we call the
virtual file system layer which
separates the part of the file system
which is dependent on the underlying
volume format or network protocol from
the part of the file system which is
independent so the stuff above the
VFS layer is independent the stuff below
is dependent and that's why you see our
list of file systems below underneath
the virtual file
system switch UFS HFS NFS and we'll
talk to you about all the various file
systems that are supported in Mac OS 10
here's the outline of today's talk
there'll be something of a status update
basically an indication of the file
systems that we ship we're gonna do a
demo of a couple of the new ones we're
going to talk only briefly about some of
the different file system interfaces
we've covered that for a number of years and
there are a lot of sessions describing
the application frameworks for use in
Mac OS 10 but we will spend a fair
amount of time talking about the
differences between the various file
systems that Mac OS 10 supports and
again how that might affect your
application development we'll talk about
security because that's been a large
issue for a lot of folks Mac OS 10 is a
multi-user system with per-file
permissions and in that way it differs
greatly from Mac OS 9 and you need to be
ready for permissions errors
potentially in your applications we're
going to talk about performance
considerations as I mentioned and we'll
talk a little bit about building new
file systems first status slide I think
this is my third or fourth Worldwide
Developers Conference session talking
about the interesting things we were
going to do in Mac OS 10 and the recent
things we had done in interim build a B
and so forth this is the first time I
get to say this I am NOT above cheap
applause
this is why I love this crowd alright so
Mac OS 10 we have three primary file
systems and when we say primary what we
mean is they're fully read and write
file systems and we can boot and root
off of them so we can boot and root off
of the Mac OS extended format file
system best known to those who know and
love it as HFS+ we can boot and root off
of UFS the UNIX file system that we
support based on the Berkeley fast file
system and we can boot and root off of
NFS in fact we do that internally all
the time for installs we also have some
read/write file systems that we don't
boot and root off of but are otherwise
first-class citizens and that includes
the Mac OS standard format otherwise
known as HFS for legacy data that you
may have on your Mac the Apple Filing
Protocol client which was delivered by the
server team into Mac OS 10 we have
support for an MS-DOS file system and
specifically we mean FAT16 and FAT32
here which we did largely for digital
cameras and so forth but also zip drives
and removable media coming from your
DOS machines and WebDAV and we'll talk
about WebDAV in more detail a little
bit later on in the talk there are also
some read-only file systems supported in
Mac OS 10 including ISO 9660 which is
used on lots of CD formats especially to
interchange between Mac and Windows
we support the universal disk format
actually I don't think it's called that
anymore I think they just use the
initials UDF sort of like KFC and
Kentucky Fried Chicken and CDDAFS a
file system written by the CPU software
group to support CD audio files CD audio
drives CD audio discs rather about music
ISO 9660 I should mention we can also
boot and root from I didn't mention it in
the primary file systems because it's
not read and write so I'm going to do a
demo I'd like to bring up demo machine
number one and I'd also like to bring up
my lovely and talented assistant Scott
Roberts
you might have seen a demo similar to
this in the keynote Avie got his digital
camera up and took a picture of the
audience I'm gonna have Scott do the
same thing here thank you Scott now it
turns out I'm not a vice president and
as a result I don't make as much
money as Avie Tevanian and so I can't
afford a really nice fancy digital
camera that has USB hot plug and so
forth and so on I have this camera that
was given to me as a gift by a friend of
mine actually a Kodak DC210 Plus
but the nice thing although it doesn't have USB
connectivity and all that it does have
this little compact flash card that it
uses for memory storage and we happen to
have a SanDisk reader here attached to
our demo machine and so I'm going to put
this little card in here
let's pray a little bit and you'll see
the image come up on the system now
we're doing something differently than
Avie did because we're not here to show
you image capture or any of the fancy
camera applications I just want you to
know that this little compact flash card
is formatted as an ms-dos disk and so
we're looking at an ms-dos volume that's
just loaded on our desktop and if I
bring up the preview application and I
go here to the image that Scott just
took and open it up there's a picture of
you folks
I must have said something wrong I don't
know what okay for our other demo I
wanted to show you the WebDAV file
system I've got over here on my other
demo machine I pre-configured it as a
web server and I actually changed the
configuration file so that it would run
DAV it actually ships in Mac OS 10 as
part of our Apache installation so host
2.1 62 is running WebDAV and I'm going
to bring up Internet Explorer here which
I should have pre-launched sorry to waste
you folks' time and I'm gonna type
in oops the URL and you can see we have
our little Apache page now what I'm
going to do is since I'm bored with
looking at things in web browsers I'm
going to mount this particular website
as if it were a volume on my desktop I
do that by hitting command K going to
the connect to server dialog and typing
in the URL here instead and now you
notice I now have the Apache web site
actually mounted as a volume on my
desktop well why is that interesting let
me show you one thing that we can do we
took a movie actually of last year's file
system session and we edited it and made
a small movie out of it and we put it on
the web server and now instead of going
to the browser and trying to deal with
the QuickTime plugins and so forth I'm
just gonna double click that movie and
we'll show it for you now oops let me
back it up a little bit here
thank you
the stage police are watching me and
I've heard from my spies that when I
turn around to jump on the stage
something bad's gonna happen to me I
have to be careful I've noticed that the
stage is higher this year than last so I
think I'm ready I'm getting older I
can't do this kind of stuff every year
okay let me show you one more thing that
I think is pretty interesting I'm gonna
go into BBEdit here now I haven't
actually completely rehearsed this down
I was fooling around with it before the
show but I thought it would be kind of
an interesting thing to do so I'm gonna
bring up BBEdit and I'm gonna open up a
file on the WebDAV server and it's going
to be the english HTML index file and I
see some text here that says let's see
if you can see this it means the
installation of the Apache server went
okay I don't like that text the text I
like is file systems rule I like that
much better now I'm going to go to the
web server here and refresh and you can
see file systems rule up here in the top
so an interesting way to edit your
website is to enable DAV on it and then
you can just use your favorite editing
tools and do whatever you like inside
your favorite editing tool when you save
it's automatically populated right back
up on the server
you know I had to have a plant for that
for that applause okay let me talk just
briefly about some of the file system
interfaces in Mac OS 10 this is the
little file system interface chart you
can see the three major application
environments classic carbon and cocoa on
the top they all have their own file
manager or file object interfaces and
but they all come through the BSD layer
and we've extended the BSD layer to
allow access to filesystem metadata
that's not typically available in UNIX
especially catalog information Finder
info file type and creator type
and things that you find on HFS+ and
underneath all of that is our virtual
file systems which I mentioned before
and that's where additional file systems
that you might develop in Mac OS 10 or
stacks would layer into our system so I
won't go into great detail here but
suffice it to say the Carbon file
manager is still available to you it has
the Mac OS file system
interface carried forward there's
access by volume reference number
and directory ID still available and
carbon does an interesting thing Carbon
provides HFS style semantics on file
systems that don't normally support it
so if you're using ufs but your app is
using carbon you'll still see resource
Forks and you'll still see file IDs the
file IDs don't last across mounts
they're done in memory and kept
for you only while the mount exists but
nonetheless they're present if you do a
call and you want to see them so you're
somewhat insulated if you're using
carbon from some of the differences
across file systems we also of course
have the cocoa environment with an
object-oriented API and some file
objects that you can use and as I
mentioned the Berkeley UNIX interfaces
are available and can be used by your
app as well even in a mixed environment
with the other application frameworks we
want to just briefly mention a couple of
the new calls that we added to the BSD
layer for accessing HFS file metadata
and the sorts of things that aren't
typically available under UNIX two major
ones getattrlist and setattrlist
which roughly stand for get
attribute list and set attribute list
these are flexible calls designed to
retrieve various types of metadata in
various different formats back to your
application normally you wouldn't call
these you'd probably call GetCatInfo
SetCatInfo in Carbon but they're
available to you if you need to they are
in our system we also implemented a call
called searchfs whose job is to
do fast catalog searching it's supported
in HFS+ although most file systems don't
support it but it does allow for fast
catalog searching and it was designed to
support the PBCatSearch functionality
in Carbon we also have a call for
exchanging data between files
those of you who've done Carbon apps
before are probably familiar with the
ExchangeFiles call we created a BSD
level call called exchangedata to do
that same atomic transfer of data
between two files and finally we've
added some options to fcntl which is
a standard UNIX file system call but we
have some extensions to allow behavior
like allocating storage in advance of
the EOF that's data that's not part of
your file per se but is part of the file's
allocated storage which can be extended
later without additional allocation now
we're at the file system differences
part of the talk and I'm gonna grab a
little cup of water here because this is
gonna be a little longer and this stuff
is important there are a station
mentioned many file system supported Mac
OS 10 and while Mac OS 10 tries to be as
file system agnostic as possible the
underlying volume formats and the
underlying network protocols do impact
the behavior of file systems and in some
ways it's just unavoidable and your app
may need to be ready for it so we wanted
to give you an idea of some of the
differences that you'll need to watch
out for kind of a you know
warning about some stuff the first set
are naming differences our file systems
often behave differently with regard to
naming and two of the biggest issues are
the name lengths that the file system
will support and whether or not
they are case sensitive or case
insensitive as a lot of you know HFS+ is
a case insensitive file system the Mac
has historically worked that way and
people are accustomed to it but the UNIX
file system is historically case
sensitive and people who are used to
using a UNIX environment are accustomed
to that and we wanted people to have an
option in what kind of file system that
they supported because some of these
differences are actually valued by
people especially ufs for example
supports holes so you can have very
large files with not so much data
storage and it doesn't take up much room
on your volume HFS+ doesn't
but on the other hand HFS+ supports full
Unicode characters anyway we'll
talk about some of those differences as
we go access by persistent ID or path
all of our file systems support path
based access some of our file systems
also support access by identifier and
HFS+ of course is among the file systems
that support access by identifier we
also have a call that allows for an
error to be returned if you try to
delete a file while it's held open
deleting an open file has historically
been a no-no on the Mac and it's
historically returned an error
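For contrast with the Mac OS behavior just described, here is a minimal sketch of what a plain POSIX unlink does to a file that is still open. It assumes a typical local UNIX file system; the helper name `unlink_while_open` and the file name `scratch.txt` are made up for illustration, and this is not the special Carbon-facing call mentioned in the talk.

```python
import os
import tempfile


def unlink_while_open(directory):
    """Delete a file that is still open and report what happened."""
    path = os.path.join(directory, "scratch.txt")  # hypothetical file name
    with open(path, "w+") as handle:
        handle.write("still here")
        handle.flush()
        os.unlink(path)                        # no error, although the file is open
        name_gone = not os.path.exists(path)   # the directory entry is already gone
        handle.seek(0)
        data = handle.read()                   # data is still reachable via the handle
    return name_gone, data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch_dir:
        print(unlink_while_open(scratch_dir))
```

An app that expects the classic Mac OS error therefore cannot rely on unlink alone to tell it that a file is in use.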
however unlinking a file which is the delete
call on UNIX typically does not
return an error even if the file is open
and we support those semantics but we
wanted Carbon to be able to get an error
back if someone tried to delete a file
which was open because apps are
expecting that behavior so we built a
special call we nicknamed it delete I
think it's system call 227 and it just
checks for a reference and returns an
error likewise some of our file systems
support full Unicode some local bytes
some UTF-8 and some of our file systems
support permissions and some don't
and what I mean by that is the volume
format or the network protocol will have
provisions for permissions data to be
stored and or transmitted some file
systems like HFS for example the
Mac OS standard format have no storage
for permissions information and so we
will use default permissions for the
entire volume often substituting the
logged in user as the owner for all of
the files on that volume with a default
permissions mask some of our file systems
support hard links which is the ability
to create separate nodes in the file
system pointing to the same data where
the nodes are essentially equal not an
alias from one to the actual object some
of our file systems have storage for
catalog data catalog information some of
them don't some of our file systems are
multi-fork some of them aren't I wanted to
talk to you about WebDAV in particular
because WebDAV is a really good way of
highlighting some of the odd differences
you might see between file systems it's
interesting that we have this sort of
file system architecture in Mac OS 10
because it allows us to do things like
mount a web server as a file system but
there are going to be some gotchas the
WebDAV protocol does not have any sort
of dates besides modification dates so
if you wish to get the access date of a
file on the server
it's actually impossible we have to sort
of fake that information so that's one
classic difference
inode numbers aren't a concept that is
supported in the WebDAV protocol either
WebDAV I should mention stands for
web-based distributed authoring and
versioning the reason it came into
existence was to allow collaborative
authoring on the web I think its
original envisioning
was for people who are doing development
in a web-based authoring tool to be able
to move things to and from the server
and affect the files on the server but
it wasn't originally a file systems
protocol that's something Apple decided
we could do because the protocol added
enough support in the way of a
consistent hierarchical namespace some
synchronization with locking and
property management
but those properties do not include
inode numbers and so very much like
carbon and file IDs when we see a file
in WebDAV we generate an inode number
and we remember that inode number for
the life of your Mount but if you
unmount the WebDAV volume and mount it
again the inode numbers for files will
not be the same also we can't set live
properties in WebDAV through the
protocol live properties are the ones
that are actually supported by the
server there's also a notion of dead
properties in WebDAV where you can
make up a property and store a value in it
but real live properties like for
example the modification time those
properties are actually on the server and we want
to respect those and return those to you
but the servers will not let us change
them so you can't do a set mod time on
WebDAV and have it work we'll have that
silently fail to keep apps from
keeling over but be advised that you can
see a silent failure of a set mod time
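Because a set mod time can be accepted and then ignored, an app that depends on the new date can read it back after setting it. The sketch below shows one way to do that check; the function name `set_mtime_checked` and the two-second tolerance (chosen because some volume formats, FAT for instance, store modification times at coarse granularity) are assumptions for illustration, not anything the talk prescribes.

```python
import os
import tempfile


def set_mtime_checked(path, mtime, tolerance=2.0):
    """Set the modification time, then re-read it to see whether it stuck."""
    atime = os.stat(path).st_atime       # leave the access time as it was
    os.utime(path, (atime, mtime))       # may report success yet change nothing
    actual = os.stat(path).st_mtime      # ask the file system what it recorded
    return abs(actual - mtime) <= tolerance


if __name__ == "__main__":
    fd, demo_path = tempfile.mkstemp()
    os.close(fd)
    print(set_mtime_checked(demo_path, 1_000_000_000.0))
    os.remove(demo_path)
```

A False result does not say why the date was not applied, only that the app should not assume it was.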
the security model for WebDAV is
entirely different from what you'd
expect from a file system it's HTTP
security it's basic authentication for
us generally which means that you try a
request and if the server doesn't like
you it gives you back a message that
says authorization denied and it's your
job to find out the user's username and
password and try again sending it across
but there is no way except for testing
an application to determine if the user
is going to be able to do that operation
or not if you send a put across to the
server which is the mechanism for taking
a file and moving it up you don't know
if it's gonna succeed or fail there
isn't a pre-flight call you just
have to do a put so what happens in
WebDAV is if we get an authorization
error the daemon that supports the
system puts up a dialog box that says
who are you please type your user name
and password in and then we'll send that
across to the server we'll keep doing
that until the user either gets it right
it times out after about five minutes or
the user hits cancel if the user hits
cancel an EACCES error comes back
from the filesystem likewise unlike AFP
or NFS which are typically run over
local area networks and are therefore
usually reasonably fast
maybe not compared to local volumes but
in the absolute sense WebDAV can be
quite slow
it may be running over a 28k modem link
you never know because we're talking
about an Internet file system after all
so we're not going to go through all of
the items on this chart I just wanted to
scare you a little bit to give you an
idea of how some of these file systems
actually differ classic example HFS+
supports privileges it has storage in
the volume format for privilege
information supports the ability to get
an error back when you delete an open
file supports access by ID MS-DOS
supports none of these things WebDAV
supports none of these things either NFS
is sort of a mix same story with naming
differences some file systems are
case-sensitive like UFS some are case
insensitive like HFS+ some support
unicode fully and some don't we make a
particular mention of the Unicode
characteristic HFS+ supports Unicode
names in a canonical form where the
characters are decomposed it's possible
in Unicode to have say an e with an
accent represented as e with an accent
as one character or e followed by an
accent character there aren't that many
decomposed characters but there are some
we always store all our filenames
decomposed on the volume so that there
will only be one element in a directory
that looks the same to the user and so
that we'll easily be able to do name
comparisons but not all file systems do
this UFS does not UFS does not interpret
the bits and we wind up storing things
on UFS as UTF-8 characters and so on
UFS you actually could have two names
that look identical to the user but are
actually slightly different in their
byte representation what this means for
your application though is if you have a
composed character in a name that you
send to a create call when you look at
it again through a directory listing on
HFS+ it's going to be different and on
UFS it will be precisely the same
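The composed versus decomposed distinction can be seen directly with Python's standard `unicodedata` module. This is a sketch only: it uses standard NFD as an approximation of the decomposed form stored on HFS+, which is an assumption, and the helper `same_name` is a made-up name for illustration.

```python
import unicodedata

composed = "caf\u00e9"                               # e with acute as one code point (U+00E9)
decomposed = unicodedata.normalize("NFD", composed)  # plain e followed by U+0301


def same_name(a, b):
    """Compare two file names after bringing both to decomposed form."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    print(len(composed), len(decomposed))            # the two strings differ in length
    print(composed.encode("utf-8"), decomposed.encode("utf-8"))
    print(same_name(composed, decomposed))
```

Comparing names only after normalizing both sides avoids treating the name you created and the name a directory listing returns as two different files.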
so having said that let me give you some
tips as to how you can handle some of
the specific differences we've talked
about today in your application number
one be consistent in your use of case
internally we found with the Macintosh
Application Environment which was a Mac
environment that ran on UNIX operating
systems that lots of Mac apps would keel
over when MAE was running on a case
sensitive file system and that
characteristic bled through reason being
they would have files like preferences
that they would open in one part of
their app with a capital P and they
would open in another part of their app
with a small p and they wouldn't be the
same and that would confuse the app on HFS+
they would be the same on UFS they wouldn't
be you'd either see a different file or
you'd get a does not exist error when you
tried to access the file but different
in case so make sure that in your app
when you're referencing a hard-coded
file if you do that that you're using
the same case in all instances always
use decomposed names in your application
that way you'll never be surprised by a
name being slightly different on the way
back out than it was on the way into the
file system when you created it be
prepared for access errors at random
times as I mentioned the WebDAV file
system has a very odd permissions model
and we're gonna do things like put a
file across which may happen in an
fsync or a flush files operation or on a
close operation and we're gonna
discover at that time and not in advance
that the operation isn't permitted if
the user doesn't have a username and
password that allows them to do it or if
the server administrator has cut their
access you can get back an EACCES
error on a close call or on a flush
files call which is something which probably
hasn't ever happened to you before so be
prepared for access errors at strange
times and you may want to be able to put
up a dialog that says access denied
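One way to be prepared is to treat flush and close as calls that can fail, not just open and write. The sketch below shows that shape; `save_checked` is a hypothetical helper, and returning the exception rather than raising it is just a choice made here so the caller can decide whether to show an access denied dialog.

```python
import os
import tempfile


def save_checked(path, data):
    """Write data; return None on success or the OSError that stopped the save."""
    try:
        handle = open(path, "wb")
    except OSError as exc:
        return exc
    try:
        handle.write(data)
        handle.flush()
        os.fsync(handle.fileno())    # a network file system may only fail here
    except OSError as exc:
        try:
            handle.close()
        except OSError:
            pass                     # keep the first error; close can fail as well
        return exc
    try:
        handle.close()               # or here, at close time
    except OSError as exc:
        return exc
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch_dir:
        print(save_checked(os.path.join(scratch_dir, "notes.txt"), b"hello"))
```

The point of the structure is that a save is only reported as successful after the final close has returned without error.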
you're going to probably see them if
you're running a Carbon app as an AFP
permissions error because that's how
Carbon maps our access denied error
codes also do not rely on inode numbers
and this is also true for file IDs
across mounts if a file system is
unmounted and remounted all the inode
storage on a WebDAV volume all the file
ID storage on a UFS volume is kept in
memory in big tables and there's no
effort made to make that persistent
across different mounts we have one way
to help you deal with some of
these differences and that is the
pathconf
system call pathconf is designed to
give you characteristic information
about the file system you're running on
it just takes a path and you give it a
selector that says you'd like to know if
the file system is case-sensitive or
how long the
names are that it supports those are the
only two selectors that we support right
now in HFS+ on pathconf perhaps someday
we'll expand that list but you can use
this call especially in HFS+ to know
that you're on a case insensitive file
system so I'm going to now bring up Pat
Dirks the core OS file systems
technical lead to talk to you about some
security issues and some performance
issues in Mac OS 10 thanks Ari well the
first thing to realize is that in Mac OS
10 it's a whole new world it is
multi-user to the core there are permissions
everywhere and the core kernel enforces
those permissions
there's no path around it or some access
path that's gonna be different that's
not going to be affected by it the whole
system is fundamentally multi-user the
permissions in the system are the
standard UNIX permissions if you're
familiar with those the next few slides
are just going to be review for you hold
on there are a few gotchas in the
permission handling of HFS but if you
are familiar with UNIX you should be
very comfortable with the permissions
model on our system and we'll see that
diagram again in a moment the
permissions for those of you who may not
be familiar with UNIX are in some ways
similar to AppleShare AppleShare's
permissions when we were designing the
AFP protocol were based on the UNIX
permissions model and we made a few
changes from there to allow them to work
on a folder only basis but they're
fundamentally inspired by the UNIX model
so instead of See Files See Folders and
Make Changes you have read write and
execute and you have that for files and
you have that for directories the catch
is it applies to files as well as
folders on an AppleShare file server the
only permissions you ever had to worry
about was the permissions you had on a
given folder in Mac OS 10 you have to
worry about the permissions on
individual files as well and the other
difference with AFP is that in AFP you
can have separate permissions for the
world the group that is associated with
a particular folder and the owner of the
folder and whichever category you fit in
you got those rights so everybody
started out with the everyone
permissions and then if you were part of
the group you also got the group's
permissions and if you were the owner
of the object you also got the owner's
permissions in Mac OS 10
there's only one group that is matched
to if you're the owner you get exactly
the owners permissions and if you're in
the group you get exactly the group's
permissions and if you're everyone else
then you get the other permissions so
those UNIX permissions consist of an
owner ID that is saved with the object a
group ID that is saved with the object a
set of permission bits which you see
there and a few extra bits that are not
divided up in separate categories for
owner group and other and we'll cover in
a moment what exactly those bits mean in
different cases but that's basically
read/write/execute
in three groups and three special bits
and some flags that we'll cover in a
moment as well so every user is
categorized in one of three possible
groups either the owner of the object
the group that is associated with
the object or everybody else and
whichever group is most specific
determines the access you get as I
said unlike AppleShare so for
each group there is read write and
execute write translates very directly
to Make Changes in AFP read is the
right to read a file or list the
contents of a directory it's a bit like
See Files See Folders that's where it gets
a little weird and execute applies most
directly to files obviously if it's an
executable and you have execute
permissions then you can execute it it
also applies to directories as we'll see
in a way which is kind of a subtle case
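The rule that only the most specific category applies can be written out as a small check. This is a simplified sketch of the standard UNIX evaluation order, not the kernel's code: the function name `unix_allows` and the example user and group IDs are made up, and the superuser override and the flags are deliberately left out.

```python
def unix_allows(mode, file_uid, file_gid, uid, gids, want):
    """Return True if the permission bits grant `want` ('r', 'w' or 'x').

    Exactly one category is consulted: owner, else group, else other.
    """
    bit = {"r": 4, "w": 2, "x": 1}[want]
    if uid == file_uid:
        shift = 6    # owner bits only, even if group or other would allow more
    elif file_gid in gids:
        shift = 3    # group bits only
    else:
        shift = 0    # everybody else
    return bool((mode >> shift) & bit)


if __name__ == "__main__":
    # Mode 047: owner gets nothing, group gets read, everyone else gets rwx.
    print(unix_allows(0o047, 501, 20, 501, {20}, "r"))   # the owner asks to read
    print(unix_allows(0o047, 501, 20, 503, {99}, "r"))   # an unrelated user asks
```

With mode 047 the owner is refused even though everyone else may read, which is exactly where the UNIX model differs from the cumulative AppleShare model.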
for files if you set the set-UID bit then
when you execute that binary and it only
makes sense on executables the program
will run with the ID of the owner of
that file so you'll commonly hear set-UID
root binaries those are files that
are placed on the system that will run
as a privileged user in the system when
they're executed and there's also set-GID
which is used
less often which runs with the group
that is associated with the object now
in directories it's probably easier to
make sense of these permissions if you
think of directories as files listing
the contents of the directory because
that's how they originally came
about read is explicitly the permission
to enumerate the contents of the
directory to type ls and list the
contents write is the right to make
changes it's very analogous to AFP's
Make Changes this is make changes to the
directory so these are operations that
would require changes in the directory
file creating a new file renaming a file
that's in there deleting a file that
sort of thing you can set execute on a
directory and that limits access a
little bit without read normally you
have read and execute together if you
give execute without read you retain the
ability to open files in there provided
you have permissions on the file itself
but you lose the ability to list the
directory contents so it's almost sort
of a security by obscurity if you know
what the file name is or your program
has built in you know some file that
it's referencing execute is enough to
get it open but you need read in
order to enumerate the contents and look
at it so that's sort of an edge
case that you may run into and it's
unusual finally setting the sticky bit
one of the special bits that is
associated with an object means you can
give write but the ability to actually
make the changes is limited to the owner
of the object itself so it imposes one
additional test before you can actually
make use of the write permission that
you would otherwise look to be
granted now there are a special group of
flags associated with every object as
well and these exist only in one form
there's not a separate group of flags
for owner group and others there's
only one set of flags and the most common
one you'll see is the immutable flag
that takes the place of the lock bit
that HFS always had and that is in fact
what
SetFLock and RstFLock in the
Carbon interfaces use to lock or unlock
a particular file you can see these
flags if you use the -o option in ls if
you find yourself in the shell it will
list uchg if the immutable
bit is set so that's a quick way that
you can tell if things are locked and
there's a chflags command that changes
the flags and there's a chflags
system call that will manipulate them at
the BSD level all these things are
accessible at the BSD level there's
nothing in Carbon or Cocoa or something
that is special above and beyond
this everything is enforced at the BSD
level and everything is accessible at
the BSD level the other gotcha you may
run into is that when something has been
marked immutable it can't be moved that
used to not be true on Mac OS you could
lock a file and still take it and move
it somewhere on servers that was
actually sort of an awkward thing to do
because you could lock something down
and somebody who had Make Changes could
make it disappear on you that's no
longer the case when something is
immutable it doesn't go anywhere and it
doesn't change now the
immutable and the append flags have
special variants of them that can be set
only if you are a specially privileged
user and they can't be unset in the normal
running of the system so if you are
trying to protect some particularly
important file in the running of the
system you can set a special system only
immutable bit that is sort of stronger
even than the regular immutable bit in
that you can't turn it off so be careful
if you try this on your machine at home
you have to take the system down to
single user before you can clear that
bit now all the UNIX aficionados wake
up this is the part where things get
different again there is some special
handling on permissions for HFS
plus volumes we had a problem in that we
wanted people to be able to take disks
and move them all around from system to
system and retain the same ease of use
that they had in Mac OS 9 you could
take a zip disk from your system take it
over to somebody else's system and you
wouldn't suddenly find that the
permissions
were all wacky just because the numbers
that were assigned for the user ID and
group ID on your system made no sense on
the other system so the system very
carefully uses the permissions only on
those disks that it knows are local or
were specifically requested that they be
used by default if you have an HFS
plus disk that the system has never seen
and you connect it either by plugging in
a zip disk or by plugging in a firewire
drive the permissions will fall
back to a scheme where the owner and the
group are ignored and you can get that
same behavior on request
for any disk in the system through the
finder's ignore permissions bit I'll
talk about that so every disk is
identified not by name but by a special
64 bit identifier that we write on there
when a disk is being mounted the HFS
code checks to see if this ID is one of
a disk that it has seen before and for
which permissions should be enabled and
if it finds that then it will enable the
permissions and it will be used just
like you would see on UFS
you'll see owners you'll see groups
you'll see everything if there is no
entry for that system if it's completely
unknown or if the entry in there says
the user asks that the permissions not
be used then the handling switches over
to ignore all the user and group IDs on
there make them unknown and replace the
owner with the login user and that's
done completely dynamically if you have
such a disk connected you log out somebody
else logs in they are now the owner of
all the objects in that system so it's
not a static mapping it's whoever is
currently logged in owns all that so
it's a very convenient way to not trip
over user or group settings that make no
sense on your system so the ignore
permissions checkbox in the finder lets
you elect to ignore this and get the
same sort of foreign disk behavior
that you get and it's the same
underlying mechanism what the ignore
permissions bit does is basically turn
off the recognition of that disk in the
system and it will be treated without
regard for the users and groups
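
The mount-time decision described above can be modeled in a few lines. This is only an illustrative Python sketch, not the HFS Plus code: the names `known_volumes` and `effective_ownership` are hypothetical, and the placeholder value used for an "unknown" group is an assumption.

```python
from dataclasses import dataclass
from typing import Dict

# Placeholder for "unknown" user/group; the actual value is an assumption.
UNKNOWN_ID = 99


@dataclass(frozen=True)
class Ownership:
    uid: int
    gid: int


def permissions_enabled(volume_id: int, known_volumes: Dict[int, bool]) -> bool:
    """True only if this 64-bit volume ID was seen before AND the user
    has not asked for permissions to be ignored on it."""
    return known_volumes.get(volume_id, False)


def effective_ownership(on_disk: Ownership, volume_id: int,
                        known_volumes: Dict[int, bool],
                        login_uid: int) -> Ownership:
    """Return the ownership the system would report for one object."""
    if permissions_enabled(volume_id, known_volumes):
        # Known disk: honor the IDs recorded on the disk.
        return on_disk
    # Foreign or "ignore permissions" disk: the owner is whoever is
    # logged in right now, and the on-disk group is treated as unknown.
    return Ownership(uid=login_uid, gid=UNKNOWN_ID)


if __name__ == "__main__":
    known = {0x1122334455667788: True}
    on_disk = Ownership(uid=501, gid=20)
    print(effective_ownership(on_disk, 0x1122334455667788, known, 502))
    print(effective_ownership(on_disk, 0xABCDEF, known, 502))
```

Because the login user is passed in on every call, the mapping is dynamic in the sense the talk describes: a different logged-in user gets a different answer for the same disk.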
it's called ignore permissions really
the best way to think of it is to think
of it as ignore ownership we'll take
questions and answer questions on this
later I wanted to bring up a few points
about performance in the system and in
particular the different ways that you
can do i/o a few general words that
we'll touch on in a moment but I'll say
we want to cover the differences between
doing buffered filesystem i/o doing
direct memory mapped i/o in the system
and using unbuffered filesystem IO and
the differences between them and the
implications of those things I'll say a
few words about zero fill which your
application may have run into which is
something that you see on Mac OS 10 that
you never saw on Mac OS 9 before in
general this shouldn't be news to
anybody the fewer i/os you do the
better the more you can aggregate your
i/os into a few large operations the
faster things will go even if you're
doing small transfers the system will
try to aggregate these on your behalf if
you're sequentially reading through a
file the system will pick up on that and
it will read larger and larger chunks
even ahead of where you currently are
and as you're writing it will
save up writes to do single large writes
out to the disk to maximize the
efficiency of your i/o so that is why
sequential operations are so much better
than random operations because even if
your application is only ever asking for
4k at a time you'll be doing very large
transfers to the disk and the zero fill
that we'll cover in a moment is
triggered when you are leaving areas of
the file unwritten but you do become the
owner of them and it's best to avoid
that because it's really just wasted
effort you might as well write the data
sequentially don't skip ahead of the end
of file for instance so this is basic
buffered file system i/o you see the
device driver in the system which is
doing the actual data transfer
the buffer cache and the virtual memory
system which are part of the kernel that
govern all the data in the system and in
Mac OS 10 those are actually integrated
and they coordinate with each other so
where there are cases where a particular
piece of a file or page in the system is
the same the buffer cache and the
virtual memory system coordinate access
to that page so there's only ever one
copy of the page that means if you have
something mapped and you write that
you'll see those changes in the mapping
right away and vice versa
if something gets paged out the
readwrite path will see that right away
it seems obvious in hindsight but it's
an awful lot of work to make that work
correctly and finally there's a user
application drawn there and you see the
user application with a page of memory
in there that's really the appearance
of a page that is managed on the user's
behalf in the virtual memory system you
can think of it either way you can think
of the page as owned by the user or you
can think of the page as managed by VM
on the user's behalf it's really all the
same thing
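
The coherence between the mapped view and the read/write path can be observed from user space. The following is a minimal Python sketch, assuming a Unix system with a unified buffer cache; the helper names are made up for the example.

```python
import mmap
import os
import tempfile


def make_scratch_file(size: int = mmap.PAGESIZE) -> str:
    """Create a temporary file holding `size` zero bytes; return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(bytes(size))
    return path


def mapping_sees_write(path: str) -> bytes:
    """Change the file with an ordinary write and read the same bytes
    back through a shared mapping of that file."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mapped:  # shared mapping on Unix
            os.pwrite(f.fileno(), b"WRITE", 0)
            return mapped[:5]


def read_sees_mapping_store(path: str) -> bytes:
    """Store into the mapping and read the same bytes back with an
    ordinary positioned read."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mapped:
            mapped[0:3] = b"MAP"
            return os.pread(f.fileno(), 3, 0)


if __name__ == "__main__":
    scratch = make_scratch_file()
    try:
        print(mapping_sees_write(scratch))
        print(read_sees_mapping_store(scratch))
    finally:
        os.remove(scratch)
```

Neither function flushes or syncs anything between the two access paths; with a single shared copy of the page there is nothing to reconcile.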
so in basic buffered i/o the data is
first copied from the device driver into
a buffer that is set aside by the file
system to hold blocks for that vnode
and those blocks are shared between VM
and the buffer cache as necessary so
that's the first copy then the
filesystem will copy whatever the user
asked for to be copied either read or
written into the user application and
there's a second copy made into the user
pages that hold the user buffer so it's
completely flexible you can read at any
offset in the file you can read any
amount of data but you do end up making
two copies first a large page aligned
copy for the convenience of the system
into the buffer cache and then a
separate copy from there into your user
address space where the data really
lives and by the way the user page ends
up being dirtied as a result and we'll
see in a moment why that's important but
if you're going to be reading a file
over and over or you're reading back and
forth through a file it's a cost that's
well worth making in addition to the
flexibility you gain from the
ability to align the data or the size
anywhere you want the fact that the copy
remains in the buffer cache means the
next time you hit either in that same
page or somewhere right around there
you'll probably find it in memory and
you'll only end up doing the last copy
from the buffer cache into the user page
so there's an extra copy but it may be
worth it under some circumstances and
this is probably what most of you are
already used to this is
ordinary open close read write i/o now
instead of that you can do memory mapped
i/o and it's something that you should
consider as an option when you're just
reading files it's a very efficient way
to get data in and there are some advantages
although it requires a BSD VM call
to set it up so it may be tricky to do
from a CFM Carbon application it's a
very nice way to get the data in because
you only end up doing a single copy
essentially it goes straight from the
device driver into the VM system and
from there it's visible to your user
application so there's only one copy
made so we save a copy in addition the
VM page is not marked dirty all that's
ever happened to that page is something
was read into it unlike the user page
that was copied into a moment ago so
when the system needs more pages it
doesn't have to go copying that user
page out to swap storage it's all set it
can just throw this away and it can read
it in later if it should get page
faulted in again so there are some
disadvantages every transfer is at least
a whole page worth so if you've got a
file with ten bytes of data in it that's
obviously not worth mapping it but it's
a nice way to get some data in and read
through a lot of data the VM system does
the same clustering of i/o operations
that I mentioned earlier as you're
touching through pages it will start
paging in more and more in advance so
it's a very good way to read in a
sizable data file that you're just going
to read sequentially it's not good for
writes because you can't extend the file
by mapping it but it's a very good way
to read data in that you're only going
to read and that you're reading
sequentially and it'll save you a copy
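
As a rough illustration of reading a file through a read-only mapping, here is a short Python sketch. The function name is hypothetical, and the explicit empty-file check is there because a zero-length file cannot be mapped.

```python
import mmap
import os


def count_lines_mapped(path: str) -> int:
    """Count newline bytes by scanning a read-only mapping front to back."""
    if os.path.getsize(path) == 0:
        return 0  # nothing to map
    with open(path, "rb") as f:
        # ACCESS_READ: pages are only ever read into, never stored to.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            count = 0
            pos = mapped.find(b"\n")
            while pos != -1:
                count += 1
                pos = mapped.find(b"\n", pos + 1)
            return count
```

The scan touches pages in increasing order, which is the sequential pattern the talk says the VM system can cluster and read ahead on.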
now finally there's something that's
almost a mixture of the two
you can choose to do unbuffered IO and
it's actually very easy from carbon
because it's the exact ioPosMode
noCache bit that you can set on a read or
write transfer in carbon and carbon
does the work with the system on your
behalf if you're not using carbon you
can make fcntl calls to enable this
mode for your i/o
but it basically skips the intervening
buffer cache altogether and the data
travels directly from the device driver
into your buffer now that imposes some
of the limitations that the file system
previously took on on your behalf on
your application so the data has to be
page aligned it has to be a multiple of
pages but if you're you know reading in
a QuickTime movie that you're going to
play once and never necessarily touch
again it would be a waste to fill up the
buffer cache with all those pages and
it's perfect to just read that
in directly that way
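
A hedged sketch of the non-Carbon route, in Python: it asks for uncached I/O with `fcntl` and `F_NOCACHE` where the platform's `fcntl` module exposes that constant, and otherwise falls back to ordinary buffered reads. The function name and chunk size are assumptions, and Python gives no control over buffer alignment, so this shows the call sequence rather than a tuned transfer.

```python
import fcntl
import mmap
import os
from typing import Callable


def stream_uncached(path: str, handle_chunk: Callable[[bytes], None],
                    chunk_pages: int = 16) -> int:
    """Read a file once, front to back, in page-multiple chunks, asking
    the system not to keep the data in the buffer cache if it can."""
    chunk_size = chunk_pages * mmap.PAGESIZE  # a whole number of pages
    fd = os.open(path, os.O_RDONLY)
    try:
        nocache = getattr(fcntl, "F_NOCACHE", None)
        if nocache is not None:
            # Assumption: only present on platforms that define F_NOCACHE.
            fcntl.fcntl(fd, nocache, 1)
        total = 0
        while True:
            data = os.read(fd, chunk_size)
            if not data:
                break
            handle_chunk(data)
            total += len(data)
        return total
    finally:
        os.close(fd)
```

Passing each chunk to a callback, rather than collecting the whole file, matches the play-once case from the talk where the data is not expected to be touched again.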
you have total control over the amount
that is transferred in a single transfer
so if there's something about your data
that you know that the system wouldn't
know this is a very good way to do it if
you need to grab a whole frame of data
or for some reason know that 64 K is
exactly the right size to transfer and
you don't care until you have a full 64
K available this is the kind of i/o you
should consider unlike memory mapping
the page is dirtied just like
ordinary i/o would be the page in the
user space is marked dirty because it's
been copied into and it will be swapped
out if necessary but it's a good way to
do i/o and not fill up the buffer cache
if you're not likely to
read the data again it's a good thing to
do and you can write files this way so
if you're writing an output file that
your application isn't just about to
reread this may be a good way to do your
i/o now the zero fill I mentioned the Mac
OS 10 kernel tries to be very careful
not to let you
read data that you haven't previously
written if you recall cases where
some major word processing application
would inadvertently ship pieces of your
hard disk out along with your documents
you'll see why this is a really nice
feature you have to be careful though
because if you have a file that you're
writing randomly you'll end up if the
first transfer is some distance into the
file you'll end up 0 filling the whole
intervening space basically anything
that you can potentially read
you should either write
or the file system will write with
zeros on your behalf as part of the
write transfer so those cases are
basically where you SetEOF to make
the file larger or where you do a write
that skips ahead past the end of file
some distance and starts a transfer
there creating this gap that gap will be
0 filled so for those reasons
sequentially writing a file aside from
all the benefits I mentioned earlier of
clustering the i/o is far preferable
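
The gap case is easy to reproduce. This small Python sketch (the function name is made up) starts its first write some distance past the end of a new file and then reads the whole file back; the bytes in the gap come back as zeros. Whether the system physically writes those zeros or records a hole depends on the file system, which is an assumption to check on the volume you care about.

```python
def write_past_eof(path: str, offset: int, payload: bytes) -> bytes:
    """Create a file whose first write starts `offset` bytes past the end
    of file, then return the full contents for inspection."""
    with open(path, "wb") as f:
        f.seek(offset)      # skip ahead past the (empty) end of file
        f.write(payload)    # the gap before this point was never written
    with open(path, "rb") as f:
        return f.read()
```

Writing the same data sequentially from offset zero leaves no gap for the system to fill.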
now a word about the cost of caching you
should be careful when you decide to
cache data in your application because
in Mac OS 10 you are constantly running
with virtual memory enabled and what you
think of as setting aside some memory
for this particular cache is really just
that much more paged memory in fact you
may end up doing a number of i/o
operations just to read the data and you
may have to page out some other dirty
page in the system to free up a page for
your cache you end up incurring the cost
of the actual transfer to read the page
in and if this data turns out not to
be referenced you may end up having to
page out a page that you dirtied by this
cache in addition you have to be very
careful about how you structure the
cache this is not wired memory that's
sitting there on your behalf if you
have a cache data structure that is just
laid out very conveniently in memory but
ends up skipping around from this page
to that sort of randomly you end up
touching all these pages and you may end
up doing page ins with every new page
that you
touch so be very careful that you
structure your cache in a way that
minimizes the number of potential page
hits to get you to your data altogether
it's very easy for an application cache
to become much more expensive than
simply reading the data right back in
from disk especially if the data is
something that is mapped directly into
memory for instance so think about it
carefully and only use caches for things
that are truly hard to reconstruct or
where you are sure that the hit rate is
actually very very high
so finally Mac OS 10 is a good time to
rethink some of the fundamental assumptions
behind your application think about the
kind of data that you're reading and the
pattern that you're reading or writing
the data in and think about what
mechanism you might best use to get that
IO in and out of the system look at your
application as it's happening and figure
out where the real bottlenecks are
before you decide where to spend your
time and effort and trickiness and what
to optimize if the bulk of your
application is reading and writing files
it's obviously worth thinking about if
the bulk of the time is spent waiting
for the user to click on some cell
somewhere or something then it may not
be an issue at all look at the
underlying assumptions that went into
your application because some of them
may well be changed in Mac OS 10 some
system calls that used to be almost free
on Mac OS 9 because they came straight
out of memory all the time may be
reasonably expensive on Mac OS 10 all of
a sudden and again that's the reason to
go back and look at your application in
action and see where the time is being
spent because you may be surprised to
find that you're spending a lot of time
doing things that you assumed would be
almost free and finally try to avoid
making assumptions about how fast
something will be to read because you
might be surprised what's actually
somewhere remote over on a network in
somebody's home directory and the
preference file you thought was cheap
actually turns out to be a very lengthy
operation that might involve
automatically mounting some volume
getting access to the data etc so don't
make assumptions about what's fast
what's local what's remote it could be
on a WebDAV volume
for all you know so finally there are
some tools that you should look at there
are some classic UNIX tools top is a
very nice tool for seeing the size of
your application the amount of virtual
memory that it has allocated to it how
much of that is shared how much of that
is private and it gives you a little
peek into the system and will show you
how fast paging IO is being done how
busy the system is what it's doing what
in your system is using the most CPU
time all kinds of things I recommend it
highly you should run it often
there's a time command which can be very
interesting it's limited to command-line
things but it will tell you how much
system time and how much user time was
spent executing this particular
application along with the number of
i/os that were done on behalf of your
application so you can easily tell when
your application suddenly starts doing
fewer reads or fewer IO transfers or
more larger ones or smaller ones or
whether the percentage of system time
versus user time is interesting if the
system is spending most of its time in
system time you should think about what
system calls it's doing to cause that to
happen
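
The same split of user time, system time, and i/o counts can be sampled from inside a process. A minimal Python sketch using the standard `resource` module follows; the function name and dictionary keys are made up for the example, and the block counters are only as meaningful as the platform makes them.

```python
import resource
from typing import Callable, Dict


def usage_delta(work: Callable[[], object]) -> Dict[str, float]:
    """Run `work` and report how much user time, system time, and block
    i/o this process accumulated while it ran."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    work()
    after = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "user_s": after.ru_utime - before.ru_utime,          # time in our own code
        "system_s": after.ru_stime - before.ru_stime,        # time in system calls
        "block_reads": after.ru_inblock - before.ru_inblock,
        "block_writes": after.ru_oublock - before.ru_oublock,
    }


if __name__ == "__main__":
    print(usage_delta(lambda: sum(range(1_000_000))))
```

Comparing these numbers before and after a change is one way to see whether a rewrite actually reduced the number of transfers or just moved time between user and system.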
and similarly don't worry too much if
most of the time is spent in the system
because your applications algorithms may
not be as relevant so time can be
interesting sample is rather a long
standing NeXTSTEP tool it
dynamically probes your running
application and takes a peek at where
the system is currently running and the
stack at that time and you can tell it
to take a number of samples over a
certain period of time and it will tell
you what percentage of time was spent in
what routines and that may tell you
where the hot spots in your application
are tell you whether your application is
constantly waiting for IO to come off
disk or waiting for the user to do
something or all sorts of things so
sample is interesting and finally
fs_usage which you may have seen demoed in
other sessions as well is a wonderful
tool for getting down to the real nitty
gritty of exactly what your application
is doing and what the system is doing
on behalf of your application you may be
making carbon calls and be unaware of
the number of system calls that go on
under the covers to make that carbon API
happen so at this point I'd like to bring
up our resident expert in bad demo code
my manager
Clark Warner Thank You Pat all right let
me pull down an explorer here and BBEdit
window all right
I'm gonna bring up a copy of TextEdit
which many of you I imagine have
probably used by now and let me bring up
a copy of the process viewer
no I have to do this the
are we all right
okay first I'm going to do my little
UNIX command here to find out the PID
of the process that is TextEdit
and it looks like 278 let me change the
font here to make this little bit more
readable for you that's probably better
okay
this is not a command we want you to run
too much at home I just made myself
root
what's that oh thank you okay now I'm
now monitoring all the behavior of
TextEdit and when I go back into whoops
let me bring it back you can see as I
click around various things are
happening one of the things I'll do is
open up a file that I put on our demo
volume you can see a lot of things are
happening now there's a 116 demos data
file okay so here's my opening of the
data file you'll notice there were a few
page ins of some open some F stats some
reads but basically one read call of a
fairly large size so that's not too bad
I'm gonna close up this file here let me
just show you
oops here we go again sorry
you think I would have woken up by now
the man page for fs_usage one of the
most interesting things about the fs_usage
program is the ability to see all
of the actual carbon file system calls
that are happening while the read calls
are happening if you notice here there's
this temp file tracing if you create in
slash temp this file called file tracing
then you will actually see all the
carbon calls as well as all of the bsd
calls that are coming through and to
show you that briefly I'll turn that on
and now I'm going to launch an
application that I call dumb text
dumb text is the standard SimpleText
text editor hacked up to do one byte
file reads just to give you an idea of
if you guys wrote apps this way this
is how you could figure it out before
your boss does let me talk to you a little
bit about some of the key issues in
building your own file system one is
it's really not recommended and the
reason we say that is not because
there's some you know fundamental thing
wrong with the apis or
anything like that but the basic
issue is kernel extensions like in Mac
OS 9 have the ability to create kernel
instability and while I know all of you
write perfect apps that's not true for
the people that don't come here and so
we want you to bring the word back to
them don't try to throw stuff into our
kernel unless you absolutely positively
have to and if you do you're gonna have
to contact us building a file system
extension requires deep internal
knowledge of our kernel not just the VFS
stack but also the various calls you
might make to the kernel to make to use
kernel services in your system and they
change and as we change
internals in the kernel there are
things in your file system that may have
to change as well and so basically if
you write a file system extension you're
rev locked to our kernel if you went to
Dean Reece's talk you'll note he talked
about kernel versioning if you don't
version your kernel correctly in the
future we won't load your file system
extension if you don't version it at all
in the future we won't load your file
system extension so not only is there
the implicit rev lock because kernel
interfaces are changing internally and
you may be using those interfaces but
there's the explicit rev lock that if we
know we've changed those kernel
interfaces we will change the version of
the kernel what's more we will
change the version of the kernel that
that kernel is compatible with and your
file system may not load I want to give
you one of these changes that was made
recently inside of our kernel something
that happened between public beta and
Mac OS 10 GM to give you an idea of what
kinds of things we're doing we wanted Mac
OS 10 to be a fully preemptable system
the Mach microkernel was already fully
preemptable but the bsd kernel was not
historically in bsd when you make a
system call you run all the way until
you reach a voluntary yield point that
is to say you do i/o you try to acquire
a lock allocate memory and so forth or
you'd run all the way to completion so
we invented a mechanism
called funnels to wrap all the BSD
code so that that assumption of non
preemption would would be held inside of
the code but otherwise bsd system calls
could be preempted funnels are acquired
when a thread enters a system call
they're released when the thread returns
to user mode they're also released when
a system call reaches a voluntary yield
point like IO allocating memory and so
forth but they're held across kernel
preemption so a bsd system call now can
be preempted in the kernel and something
else can run a user thread or a mach
thread or an i/o cue thread and so forth
but the bsd structures won't change out
from underneath the bsd kernel system
call so it's happy we also split the
funnel after we developed the first one
so that now networking operations in the
kernel are handled in the network funnel
and all other operations including file
system operations are handled in the
kernel funnel we found that we actually
could separate network activity in the
kernel from filesystem activity in the
kernel what this means though is if you
are writing a network file system say
every time you went to use the
networking infrastructure in the kernel
you'd have to change your funnel switch
from the kernel funnel to the network
funnel and when you went back you'd have
to switch back and switching funnels is
a blocking call and so the entire world
can change from out from under you when
you switch from the network funnel to
the kernel funnel and vice versa all
things you would have to know network
funnel is for things like socket i/o and
bind and accept calls and so forth
kernel funnel for everything else and
there are some calls of course that can
be called from either funnel or
from no funnel memory allocation and
free etc so here are some need to knows
if you wanted to build a kernel
extension for Mac OS 10 one as we
mentioned last year we built this thing
we call the unified buffer cache which
if you had built the file system prior
to that would have had to change to
support it likewise between public beta
and now we introduced the split funnel
and of course we're going to be doing
things to improve the performance of our
kernel and the functionality of our
kernel on into the future and some of
those things are going to require
changes in the file system and if you
have one written you're going to have to
be inside of the loop you're going to
have to contact Apple there's other
stuff that may be involved but we can
only tell you so much in an hour so the
primary message is talk to Jason
if you're thinking about building a file
system contact Apple you're gonna have
to be in the loop now we do a little
demonstration here because we like to
bring concepts home at the file system
session and so I am a rogue kernel file
system extension and my compatriots here
Pat Dirks Scott Roberts Umesh Vaishampayan
and Pyun
are the kernel and this is me an
inappropriately versioned kernel file
system extension attempting to load I
[Laughter]
[Applause]
think you get the picture
here's some other sessions you may be
interested in at the show to help you
with building applications that are
filesystem centric or even building
filesystem extensions open source at
Apple is happening at 10:30 right after
this session in Hall a2 there's a
session on AFP server and the Apple
share client file system in Mac OS 10
that's happening tomorrow in
this room at 3:30 there's a carbon
performance tuning session happening in
hall 2 tomorrow at 2 o'clock and an
Apple performance tools session happening
in room A2 Thursday at 5:00 where you
may get to look at your third demo of FS
usage we think FS usage is so important
that if you come to the worldwide
developers conference you should see it
at least twice possibly three times
likewise
leveraging bsd services will happen in
the Civic Auditorium Friday at 2 o'clock
the Darwin kernel presentation which
will give you an idea of how the kernel
is structured internally the mach kernel
and some of the bsd kernel services
outside of file systems and networking
that's going to be at the Civic Center
at 3:30 and the Darwin feedback forum
will be Friday at 5 o'clock that's all
we have for you today I'm gonna ask
Jason now to come up and he's going to
moderate our question-and-answer I'm
gonna bring Pat Dirks Scott Roberts
Umesh Vaishampayan and Don Brady up
on stage from the file systems and kernel
team and we'll take your questions