WWDC2004 Session 438
Transcript
Kind: captions
Language: en
I'm Dominic Giampaolo a member of the
Spotlight team and this talk is working
with Spotlight first I'd like to go over
what we're going to cover today so the
agenda that we have is why spotlight how
it works what is it that we were trying
to solve what did we try and
accomplish then the next piece
integrating your app with spotlight what
does that mean what are the different
parts of that searching with spotlight
and of course working with metadata the
main focus of the talk is really going
to be about what it means to integrate
your app with spotlight first off what's
the problem with any engineering task before
you start it you of course want to
define what it is put it in a box define it
what is it that we're trying to solve
here it's hard to find things on your
computer I think we've all run into this
so why is it hard there's too many files
if you're anything like me you've you know
accumulated a few files over the years
couple tens of thousands and then well
there's digital cameras oh that's
another couple thousand ten thousand
files and oh yeah all those mp3 files
that's another bunch of files any movies
that you've created or downloaded it
starts to accumulate if you bothered to
put things into some kind of
organization that's really nice you have
like I'm a bit of a
librarian in that sense a nice broad
hierarchy but everything's fixed into a
single location and that's not always
what you want you may have files that
fit into multiple categories I have just
got back from a trip and I took lots of
pictures of flowers and other pictures
of mountains but in different countries
if i want to say show me all the flower
pictures we took in france well that's
one thing but if i want to say just show
me all the flower pictures from the trip
including France and Italy and wherever
else I can't do it there's not an easy way to
organize things next there's also a lot
of rich information about files and
we're just not using it so even though
mp3 files are tagged with a lot of rich
metadata email has quite a bit of
metadata JPEG images of course the EXIF
information from the camera is quite a
bit you may want to say show me
everything with an f-stop less than
3.0 or things
shot as close-ups or long
distance but there's no way to easily
search for that so there's just no easy
way to find this information what's the
spotlight solution obviously we want to
make it fast and easy to find files as
opposed of course to making it confusing
and difficult which I suppose if that
was the goal we're already done so
how do we want to do this by
using metadata to enable richer searches
metadata is information about the data
or the contents of a file we want to use
that to enable users to
search in a more natural way this allows
you to organize files in multiple ways
so that you can say like I was giving an
example before mountains in france or
french pictures or pictures i
took in france these are different you
know axes that you can flip around we'd
also like to allow for additional
metadata things that maybe weren't
originally envisioned as being
associated with the file but that you
need to such as workflow state and the
last point is a very key one we don't
want to require apps to change it would
be great if we could you know wave our
hands tada new world everyone's
rewritten all their apps great and we
all work with metadata and we're all
very happy but the world doesn't work
that way you have lots of code it's
difficult to change so the minimal
amount that we can ask you to do the
better now I'm going to go through a
quick demo of Spotlight to cover a
couple of things that I'd like to
highlight for you first of course we'll
start with the finder so some of this we
saw in the keynote but there's a couple
of subtleties that I wanted to point out
so first query type of course HTML we
find 779 items out of whatever on this
disk and that's pretty fast if i was to
type something like jpeg jpg like this
we find eleven hundred and eighty three
which is a little bit bigger a few
people missed this in the keynote or at
least i heard but there are smart
folders so if i type jpeg down here now
i have a Smart Folder of JPEG images I
have another one of HTML as well so you
can save your searches and come back to
them and of course they're re-executed at
that time
another thing if I was to type something
like Frederick okay nothing matches so I
come over here just position these
windows appropriately and I go and
create myself a new folder but i'm not
going to type Frederick normally I'm
going to do it the French way because
there's so many wonderful French people
at Apple so we type it with a few extra
accents there and when I hit return
notice that it showed up automatically
in this query over here so even though I
typed the e's without the accents we're doing
case and diacritic insensitivity on
the matches and you saw that it was live
as well it showed up when I created it
and if i was to drag that to the trash
it disappears from the query next I'm
going to show you a little application
well not a little but a very nice
application that we wrote internally
called bull search to demonstrate
another feature that we have called
grouping so if I was to search for jpg
again we find a bunch of items here and
I'm going to organize that there's an
option for grouping and so if i choose
to organize these by title some of these
things were actually QuickTime movies
that were compressed with jpeg with photo
jpeg compression so now you see there's
kind of these set of virtual folders
that were automatically created based on
the title so there were five different
versions of the dungeons and dragons
trailer and you see that they get
grouped together because they all have
the same title so it's sort of synthetic
virtual folders that get created on the
fly based on the set of attributes that
you're grouping by and this is a very
powerful way to kind of build virtual
hierarchies on the fly so finding nemo
again there's only three versions of
that film these are different
resolutions or bit rates for the web
or so on this is kind of a nice
way to organize things now the key part
of this let me just quit that
is that I'm going to run Microsoft Word
fire that up here for a second and I'm
going to bring up
another Finder window i'm going to type
the word outrageous nothing matches the
word outrageous now in Word an
outrageous document maybe so i've just created
this and if i save it and i'm going to
call it whatever because if it had the
word outrageous in the title that would
be too easy and so of course it
automatically matched in the content and
showed up now the key thing to observe we
didn't change Word right we don't have
access to Word so we didn't actually
do anything there find by content
that's pretty straightforward another
thing though that it's a little bit more
subtle if you pull up the property sheet
and this was alluded to in Bertrand's
demo you can see that there's some
there's some metadata here that was
automatically filled in for me both in
the title and the author field so just
click OK there if I was to come back
over here and I'm just going to type
something else that matches nothing so
you can see there's nothing in the query
so if I type my last name it matches the
document because we extracted that
metadata and again this is the role of
importers which is the key thing that we're
going to talk about here in a second and
this is how you get your app
integrated into Spotlight so without any
changes to Microsoft Word whatsoever
with the simple addition of this importer
we've managed to get it integrated
very seamlessly so you can search for
things by their author so on and so
forth ok so let's quit out of here clean
this up I won't bother with that and going
back to the slides if we can for a second
why do you care Spotlight enriches the
user experience plain and simple it
makes your documents easier to find when
you're integrated properly by the
presence of an importer
users can find your documents based on things that
they remember about them not just the
title so that can take
many different forms some of which we're
going to go over it doesn't require any
code changes in your app there are
things that you can do to take
additional advantage of spotlight if you
want but without doing anything at all
for example like we did with Microsoft
Word you can
get integrated into the Spotlight system
users can find their documents more
easily it's like an
additional feature for almost no work
whatsoever and it's another way to share
data with applications so that
applications don't have to necessarily
know everybody else's file format for
the information the metadata that's
important which can be published by the
importer then it's more easily
accessible to other applications they
don't have to go through and parse your
file format there's a uniform way to
access it now we're going to talk a
little bit about the spotlight
architecture so you can understand kind
of how its put together and where you
fit into the equation Spotlight
is a system for storing and retrieving
and querying and you know getting out
information about files it's composed of
a server which runs in the background
daemons that help the server and of
course importers and I should not forget
to mention the client API which is part
of Core Services the importers are the
sort of connection from the rest of the
world to the system that stores it so
what does it look like let's take this
tour so over here on the left side you
have an application which goes and
writes a file when that file is written
the system notices this and an
importer is run to extract metadata from
that file which is then connected up to
the spotlight server which stores it
into the system store on the right hand
side of the picture you have what looks like
the Finder icon which you know could be
any application which
issues queries and receives results and
can display those not a lot of apps have
a need for that but for those that do
that's the sort of final piece of it
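The flow just described (an application writes a file, the system notices and runs an importer, the server stores the attributes, a client queries them) can be sketched as a toy model. This is only an illustration in Python, not the Spotlight API: `ToyStore`, `file_written`, `fake_importer`, and the `title` key are all hypothetical names invented for this sketch, and the real store is a system service rather than an in-memory dictionary.

```python
"""Toy model of the data flow: importer -> store -> query (all names hypothetical)."""
from typing import Callable, Dict, List

Attributes = Dict[str, object]
Importer = Callable[[str], Attributes]


class ToyStore:
    """Stands in for the server-side store; purely in-memory."""

    def __init__(self) -> None:
        self._items: Dict[str, Attributes] = {}

    def file_written(self, path: str, importer: Importer) -> None:
        # The system notices the write and runs the importer on that file.
        self._items[path] = importer(path)

    def file_removed(self, path: str) -> None:
        # A deleted file simply drops out of later query results.
        self._items.pop(path, None)

    def query(self, predicate: Callable[[Attributes], bool]) -> List[str]:
        # A client issues a query and receives the matching paths.
        return sorted(p for p, attrs in self._items.items() if predicate(attrs))


def fake_importer(path: str) -> Attributes:
    # A real importer would parse the file; this one only derives a title
    # from the file name so the example stays self-contained.
    name = path.rsplit("/", 1)[-1]
    return {"title": name.rsplit(".", 1)[0]}


store = ToyStore()
store.file_written("/tmp/report.doc", fake_importer)
store.file_written("/tmp/notes.txt", fake_importer)
print(store.query(lambda attrs: "report" in str(attrs.get("title", ""))))
```

The point of the split is the same as in the talk: the application that wrote the file never appears in this code, only the importer and the client do.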
there's three main concepts in the
spotlight system of course you have
importers which as mentioned here using
the actual code terminology MDImporter
which is how you extract metadata from
the file publish it to the system so
it's pretty straightforward and we're
going to write one later on in the
talk in a minute here you have an
MDQuery which is a way to write an
expression about the attributes
of the files that you
want to find and retrieve them and then
the leaf items are MDItems which represent
files and items are made up of
attributes and I use the words metadata
and attribute sort of interchangeably
attributes are a name a type and a value
and represent some information about
the file ways to integrate with
Spotlight you can write an importer if
you have a custom file format so if you
work with standard file formats such as
JPEG or AIFF or MP3 you don't have to do
anything we're going to
cover the basic data types that Apple
supports natively so there's no work to
be done if you work with standard file
formats with some caveats in the sense
that you want to put metadata in there
if you can but the first thing you can
do if you have a custom file format is
to write an importer this is
what enables sophisticated searches for
your documents you want to put useful
metadata in your documents so that's
sort of what I was saying is that if you
can for example the exif data that comes
in a camera preserve it make sure it
stays in there or put additional
information in there that we can extract
because a lot of file formats already
have support for a variety of metadata
and then if you need the final
level of integration is to actually use
spotlight queries for tracking documents
or displaying results now we're going to
switch to talking about importers and in
this section of the talk we will
actually go through and create one and
write it install it and show you how it
works what are the rules of the game
importers need to publish metadata that
helps users
search kind of harping on this it's what
you want to allow for richer previews
which is in the sense that some
attributes are difficult to compute so
the length of a song you have a variable
bitrate file computing how long it takes
you know what's the duration is
difficult so you would want to compute
that once and store it as an attribute
so that we can say oh find me any songs
that are longer than three minutes
that's useful you want to avoid
publishing metadata
that's private data binary data icon
previews this is not what Spotlight
is about Spotlight is a sort of fast
and efficient way to search for
user-oriented data things that users
remember the labels of layers in a
Photoshop document the names of tracks
in a multitrack audio editor or Movie
Editor these are things that people
would remember that they would want to
search for a you know chunk of some data
structure that's internal to your app's
binary that the user has no connection
to no that's not something
that they want to search for and at the
other end of the spectrum too
much noise too many attributes
can confuse the user if you have 500
attributes that's probably not the right
approach so what are attributes examples
of good attributes copyright title
author dimensions there's a special
attribute called kMDItemTextContent
which we use to represent the text
content of a document and this is how we
do the full text searches so that can
take a couple of different forms and
I'll cover that in a couple of minutes
some bad attributes would be you know
specific implementation details or
binary data that the user can't easily
search on we've predefined a whole bunch
of attributes so you can see the list
here kMDItemTitle authors keywords
projects and so on there's quite an
extensive list it hasn't
covered everything of course but it's a fairly
broad set of things and if you look in
the include file Metadata/MDItem.h
you will see the full list
of these attributes so writing an importer
this is where we're going to actually
step through the process of writing one
what do you have to do do it in Xcode
we've got a metadata importer template
there's one function to implement so
it's not that difficult you can use your
existing document reading code with the
caveat that you don't want to have
some piece of code that goes
in and inflates the entire data structure
you have some multi megabyte data image
or whatever and it gets pulled into
memory and exploded and decompressed
that's not what you would want to do you
would want to sort of scrape the file
get the interesting bits as metadata and
then publish that so a lightweight
version of your document reading code is what you want
if you have a custom file format
you know how to read and write it so you
probably have code that you can use and
you return a CFDictionary of the
attributes that you would like to
publish for that document those
are sort of at a high level what it
takes to write an importer there's three
steps using the MDImporter template you
have to create and define a GUID edit
the Info.plist and then implement the
code one two three defining the GUID
there's a command line tool called
uuidgen type that in the terminal run
it you get a string put that into the
code edit the Info.plist to associate
that GUID with the code and
then identify the UTI types that your
plugin handles so if you have a custom
file format with a custom file type you
would put the file type for that
document into the LSItemContentTypes
key in the Info.plist and that's how
the system knows to
associate your importer with that data
type and we're going to go through the
code and then of course you have to
implement the code there's a function
GetMetadataForFile and that's that so
let's write an importer and you'll see
this doesn't actually take too much
I'll run Xcode
oh can we switch over to the code
machine great okay we'll create a new
project and this is an Apple standard
plug in we have metadata importer
predefined and we'll call this source
importer although that's just because
that's what I've been typing it's not
actually a source importer well it's
like one let's say so if we pull up
main.c you can take a quick look
here we have a template and there are the
three steps that I talked about first
thing it says is create a unique UUID
for your importer so fire up Terminal I
type uuidgen and i get this very
beautiful 128 bit ID and I will push
that down there and I paste that in
there okay and following the
instructions go to step 2 edit the
info.plist alright i can do that and i
come back over here and there is the
metadata importer plugin ID and i will
paste in there and once again this other
part down here and then the last thing
that it said to do was to change the UTI
types so in this case I'm going to say
that we support
public.c-header files save that and
now the third step is to write the code
so of course I'm going to sit down and
write a big chunk of code right now
right it parses C header files and no
we're just going to cut and paste a
little bit there's a couple of header
files that I need to throw in here I'll
put those up here at the top and I will
put in a
prototype for my function which I wrote
ahead of time and I'm just going to
cut and paste a nice big chunk of code
down here below that is very rudimentary
but there's enough there to make this
demo work to parse a header file now
the last thing it said implement
GetMetadataForFile so here we have that
piece of code that's empty at the moment
you're passed in a couple of different
arguments and the main one of course is
the attributes dictionary ref that you
get this is what we're going to fill out
with the information that we would like
to publish for this file we're also told
the content type UTI so if you have an
importer that handles multiple types
you'll know what we think the type of
the file is and the last thing most
important is a path reference to the
file the path to the file that we would
like you to parse now I've already gone
and filled in the bit of code that does
all this I'm just going to cut and paste
this into here replacing this empty bit
and then we'll go through it really
briefly and so here we get the full path
and then we have this function get
typedef names which given a path returns
to us the number of typedefs and a C
array and then does the magic to
convert that into a CFArray which we
then add as a dictionary value with a
particular this is the attribute name
com apple source typedef there's one
thing I have to do here because we
have a custom attribute name and we
pass in the CF typedefs which is a
CFArray so I can save this now the last
thing that I have to do because we
have a custom attribute name we
have to define it and so we do one other
thing
so I need to call this com copy this
paste that in there and then this is
where we actually really say what it is
and I'll talk about this again in a
second so I'm just kind of going to
gloss over this for the moment that's
all taken care of save that we've saved
this and we're going to build it and if
I didn't screw anything up okay good job
it's built now what do we do with it so
we go into the source importer
directory go into the build
directory there's the source importer
mdimporter i copy with cp -R the
build source importer
mdimporter into tilde slash Library slash
MDImporters and i'm just putting it in my
home directory here i'll talk about
where else you can install it i'm
putting it here for the second and we
drop it in there basically that's all it
took to install it now we can run
developer tools there's a program called
mdimport we do dash L if we
did everything properly it should
show up in the list which it did not ok
so i need to mdcheckschema no
mdcheckschema and that is on schema.xml
successfully parsed oh that's
right thank you yeah so clearly i've
used unix for only about six months and
of course being up on stage helps a lot
whoever said that I really appreciate it
I would have spent another 10 minutes
realizing that okay so now we have
successfully installed it properly and if
we run developer tools mdimport ah
there we go beautiful yeah okay now on
the desktop i had a sample
header file in this test directory if i
run developer tools mdimport
again with the dash d 3 option so it
will print out loads of information i
have this file my header dot h now first
off let me just go ahead
and run it what we can see happens here
is it says importing data from file and
it tells me exactly what file what type
it thinks it is public.c-header which
is useful to see that it matches what we
defined ourselves as and then we can see
that hey com apple source typedef
that's the name that we defined for our
attribute and there's three typedefs my
integer my big integer and crew struct
and if we were to look at my header dot
h you can see that there are three
typedefs in here that were extracted
properly and published for the header file so
this is a way that you know we've just
defined a new importer and installed it
in the system and successfully had it
published metadata which if we'd like to
go into finder and if i say what file
defines my big integer we see that my
header dot h shows up so
it's fully plugged into the system without a lot of
extra code let me quit out of that
okay we can go back to the slides so we
wrote an importer there's a couple of
things we still need to talk about
though well MDImporters run in several
different contexts so in the case that i
showed there we ran mdimport by
hand on a single file it ran extracted
the metadata that's all very nice well
and good however mdimport can run in a
couple of different scenarios for
example if someone takes and plugs in a
firewire hard drive with a hundred
thousand files on it we've got a lot of
work to do and so it can be part of a
slightly longer running process this of
course has implications when you run
it once and it works you're all happy
that's great and you don't notice
anything necessarily because even if it
goes and allocates a lot of memory
you may not feel the impact however when
it's running as part of a
longer lived process if it has leaks or
trashes memory you're going to start to
notice these things so you need to be a
good citizen you need to pay attention
to this we're also taking defensive
measures as well so if you're not a good
citizen we'll make you be a good one however
you want to avoid using a lot of memory
if you can you know you have to pay
attention to things like leaks and we
have a lot of tools to do this and you
want to use some caution when reading
large files so in some cases like I said
when you plug in a drive with a whole
bunch of files on it you don't want to
necessarily just read the file like you
would normally because you can pollute
the buffer cache of the
computer which can cause a lot of
unnecessary paging activity because in
that scenario data is not likely to be
used again so if you're running with
standard POSIX file descriptors you can
call fcntl with F_NOCACHE and if the
data is in the cache because it was a
recently saved document you'll get it
from there if it's not in the cache you
won't waste time polluting the cache with
data that you're never going to read
again so that's always a win if you're using
Carbon you can use the noCacheMask if
you're using Cocoa you can get at the
raw file descriptor and call
fcntl again tips for importers
you want to use standard attribute
names avoid inventing new ones if you can
we have a lot of common ones so for you
to say well my document file format has
an author field so I'm going to call it
com my company authors is not the
right thing use kMDItemAuthors take a
look at the list that I mentioned
earlier see what's there and make use of
them when you can don't forget text
content when it's applicable this can
take a lot of different forms for
example if you had a keynote
presentation importer well what does
text content mean for a presentation
well there's all the bullet items
for example all of this text right
here is something that could be
published as part of the
kMDItemTextContent you want to avoid letting that
get too big there's not much point in
storing more than about 100k of text
since that's what Google does and it seems
to work for them so you know you can
accumulate text from various pieces of
your document it may not just be
straight big chunks of text it may be
strings that come from a variety of
places and publish it that way you
don't want to publish too much
it's not helpful to publish 500
attributes as I mentioned earlier
publish things that the user thinks
about interacts with and that will make
it easier for them to refine things if
you need to remove an attribute for
example you decided that another
attribute no longer applies to this
document add it to the dictionary and
put a CFNull for the value and
that will cause the system to delete it
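The tips above (prefer standard attribute names, gather text content from several places but keep it to roughly 100k, and publish a null value to delete an attribute) can be sketched together. This is a hedged illustration in Python rather than the C plug-in code: a plain `dict` stands in for the CFDictionary the importer fills, `None` stands in for the CFNull value, and `build_attributes` and `MAX_TEXT_BYTES` are hypothetical names. The attribute name strings are the standard ones mentioned in the talk.

```python
from typing import Dict, Iterable, List, Optional

# "About 100k" of text, per the talk; the exact figure is a judgment call.
MAX_TEXT_BYTES = 100 * 1024


def build_attributes(title: str,
                     authors: List[str],
                     text_pieces: Iterable[str],
                     removed: Iterable[str] = ()) -> Dict[str, Optional[object]]:
    """Assemble the attributes an importer might publish for one document."""
    attrs: Dict[str, Optional[object]] = {
        "kMDItemTitle": title,
        "kMDItemAuthors": list(authors),
    }

    # Accumulate text from various parts of the document (bullet items,
    # captions, body text) and stop before the cap is exceeded.
    collected: List[str] = []
    used = 0
    for piece in text_pieces:
        size = len(piece.encode("utf-8")) + 1  # +1 for the joining space
        if used + size > MAX_TEXT_BYTES:
            break
        collected.append(piece)
        used += size
    if collected:
        attrs["kMDItemTextContent"] = " ".join(collected)

    # None stands in for CFNull here: it marks an attribute for deletion.
    for name in removed:
        attrs[name] = None
    return attrs


slide = build_attributes("Working with Spotlight", ["Dominic Giampaolo"],
                         ["tips for importers", "use standard attribute names"],
                         removed=["kMDItemKeywords"])
print(sorted(slide))
```

Stopping at a whole piece rather than cutting mid-string is a choice made for this sketch; the talk only says there is little point in storing much more than that amount.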
as you saw i installed it into tilde slash
Library slash MDImporters which works pretty
well for you know initial testing and
debugging but most likely you would want
to install your importer into slash
Library slash MDImporters for
debugging like i used mdimport
dash L which is the list of what
importers are installed that's the quick
test to see that it got there which is
where I had that heart attack earlier
and then when you're testing it to see
what's happening you can use mdimport
with the dash d option to get different
levels of debugging dash d 4 is probably way
too much you can give it a path to a
hierarchy of files or you can give it a
specific file as well and
it's in developer tools and it's you
know the way you would test things out
if you need to define new
attributes and this is what I kind of
glossed over in the in the Code
walkthrough there's a schema.xml
file that's part of the project and you
can define new attributes in a couple of
different ways so depending on what your
needs are in the first kind of
scenario we have a string attribute which
we define as type CFString and we
give it a name you have number CFNumber
and the last one is kind of an
interesting one and this is what I used
in the source importer the multivalued
string what this means is you can think
of the attribute as
an array of individual values so foo bar
blah those are all separate entities
that are in an array for that attribute
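The source importer from the demo is a natural example of a multivalued string attribute: each typedef name is its own entry in the array. Below is a rough Python sketch of that idea, not the demo's C code. `get_typedef_names` mirrors the helper named in the talk, but its regular expression, the sample header, and the exact spelling of the key `com_apple_source_typedef` are assumptions made here for illustration. Like the original, it is very rudimentary: it ignores comments, nested braces, and function-pointer typedefs.

```python
import re
from typing import Dict, List

# "typedef ... name;" with an optional single, un-nested "{ ... }" body.
_TYPEDEF = re.compile(
    r"\btypedef\b(?:[^;{}]*\{[^{}]*\})?[^;{}]*?\b([A-Za-z_]\w*)\s*;"
)


def get_typedef_names(header_text: str) -> List[str]:
    """Return the names introduced by simple typedefs, in source order."""
    return [match.group(1) for match in _TYPEDEF.finditer(header_text)]


def attributes_for_header(header_text: str) -> Dict[str, List[str]]:
    # One attribute whose value is an array: every name is a separate
    # entity, so a search for any single name can match the file.
    return {"com_apple_source_typedef": get_typedef_names(header_text)}


# Hypothetical header, loosely modeled on the three typedefs in the demo.
SAMPLE = """
typedef int my_integer;
typedef unsigned long long my_big_integer;
typedef struct { int a; int b; } crew_struct;
"""

print(attributes_for_header(SAMPLE))
```

Publishing the names as separate array entries rather than one joined string is what lets a search for a single typedef name find the header.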
then you would localize or you can
provide localization for your attribute
with the schema.strings file just
again a standard convention a
UTF-16 file and you map what you wanted
to call it what name you gave it in
the file which is not something you
would display to the user and then in my
favorite language the only other
language I know Italian what you would
want to display it as and you can check
this with mdcheckschema you notice
that we're using a kind of funky naming
convention here with the reverse dns
style naming but we have under bars
instead of periods that's because we
wanted to keep these attribute names
compatible with the Cocoa key value
coding scheme which doesn't allow for
periods in the name so that's why we did
it that way Apple has written a whole
bunch of
importers for the standard file formats
that we support natively and you can
expect us to continue to do that so
things like JPEG PNG TIFF and so on we have
that covered quicktime of course you
would expect that PDF and then things
that the application kit can open for
text documents which includes RTF and
RTFD and Word documents and we support
that as well so you don't have to do
those so in summary importers are pretty
simple to write there's not a lot to it
you know there's a bit of glue code
that you have to get together we provide
that in the template
to CFPlugIn so it's
not any great magic it makes your
documents easier to find this is the
connection this whole system the whole
spotlight system lives and dies by the
quality of the metadata that's there and
how easy it is for users to search for
things so it makes your documents easier
to find it's in everybody's best
interest to do it it handles full-text
indexing with the kMDItemTextContent
attribute and it's the sort of thing
you could go home and write one tonight
so with that let's talk about queries
and searching who needs queries well not
a lot of people actually there's not
that many finder applications that need
to be written but apps that have a custom
UI where the focus is working with
groups of files so you know take some of
these things and do some of this stuff
over here and that's the main focus
a UI where you're not going through
a traditional open save panel those are
the kinds of things that would
benefit from working with queries
so asset management workflow or you
know filetype management applications
even something like Soundtrack which
you may not think of as working
with files but in fact when you I don't
know if you're familiar with the
soundtrack application but it lets you
select different sets of instruments
it issues queries to do this this is
something actually that could take
advantage of the spotlight system to do
queries on the attributes about the
instruments that it's searching for the
queries find items based on their
attributes attributes that you can
search on are the metadata that's
published by the importers of course
file system attributes that's what we've
always been able to search on the file
size last modification time all those
boring things that you know you don't
really always think about but are useful
sometimes and of course full text
content what does the query language
look like it's a simple C-like
expression with standard operators like
equals not equals greater than what you
would expect you know parentheses for
grouping so what does it look like in an
actual expression we have two of them
there kMDItemKeywords equals star to
star and that's how you would do a
substring match and the bottom
example is slightly more complex when I
did the example searching for frederique
in the finder before
you notice that it matched even
though the name has the accented
e and I hadn't typed an accented e it still
matched and what you see at the end here
is oops I went too far okay that little
cd at the end stands for case
insensitive and diacritic insensitive
and because we have the asterisks around
both ends of it it's a case and diacritic
insensitive substring match now how do
you write a query there's three parts to
it really you first create an MDQuery
ref you have the standard
kCFAllocatorDefault and the string you
pass in is the expression that we had
just on the previous screen in
this case we're saying kMDItemTitle
equals star tiger star and we have the cd
and then we have some additional options
for grouping and sorting
which I showed in bull search but we're
not going to cover here today we'll pass
NULL for those then you start the query
running with MDQueryExecute and in
this case we've specified that we want
to have kMDQueryWantsUpdates which is a live
query if you just want to issue a
one-shot query pass zero i believe for
that argument and then you don't get any
updates that's the result end of
story then you read the results when you
get notifications that there's results
available you get the result at
index i and there you have it queries
are designed to work with CFRunLoops
so there's three phases really there's
progress okay you're getting results
things are coming in from the initial
set we're going through then you get a
finish notification that says okay
that's the initial set if you selected
for live queries then you'll start to
get updates as things come and go from
the query set now when you saw the
liveness thing i did a query and nothing
matched then something popped in that's an
update notification coming in saying hey
there's a new result you have like i
mentioned one shot or live queries and
the sorting and grouping features which
again unfortunately we're not going to
cover today because we will not have
time so I can come back over here we're
going to go through
a little sample program that we have
that does queries I'm not going to write
the code but we will go through it
briefly to kind of see what it looks
like i have a nickel i should just pull
it up okay i have a little application
that looks like this guy right here and
there's a search field which is hooked
up in the code to search now so when i
type in a string that gets plugged into
this function here search now in the
code if we go down a little bit
first thing we do is set the title of
the window not very interesting we
create an NSString this is a Cocoa
application and we do a star equals and
then we put the string that they typed
that's the %@ that was typed into the
search field and we put it in quotes and
make it a substring match with the
stars on either end and we say cd for case
and diacritic insensitive then we pass
that on to start query which is another
method down below and that's right here
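The query string the demo builds from the search field can be sketched in plain C. This is only an illustration: `build_substring_query` is a hypothetical helper, not part of any Apple API, and the backslash escaping of quotes is an assumption about the query syntax that the talk itself does not cover.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not an Apple API): formats
 *     <attribute> == "*<text>*"cd
 * into out. Pass "*" as the attribute to match any attribute, as the
 * demo does. Double quotes and backslashes in text are backslash-escaped
 * (an assumption about the query syntax) so they cannot end the quoted
 * value early. Returns the length written, or -1 if out is too small. */
int build_substring_query(char *out, size_t cap,
                          const char *attribute, const char *text)
{
    assert(out != NULL && attribute != NULL && text != NULL);

    int w = snprintf(out, cap, "%s == \"*", attribute);
    if (w < 0 || (size_t)w >= cap) return -1;
    size_t n = (size_t)w;

    for (const char *p = text; *p != '\0'; ++p) {
        if (*p == '"' || *p == '\\') {
            if (n + 1 >= cap) return -1;
            out[n++] = '\\';
        }
        if (n + 1 >= cap) return -1;   /* keep room for the terminator */
        out[n++] = *p;
    }

    /* Closing wildcard, closing quote, then the c and d modifiers. */
    w = snprintf(out + n, cap - n, "*\"cd");
    if (w < 0 || (size_t)w >= cap - n) return -1;
    return (int)(n + (size_t)w);
}
```

Called with `"*"` and `HTML` this yields `* == "*HTML*"cd`, the same shape as the string the demo passes on to its start-query method.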
and here we add
notifications if it's the very first
query that we've run we add some
notification observers for progress
finish and update and then we call
MDQueryExecute just like i mentioned on
the slides earlier now when when we run
this program go ahead and build it and
run it here's what it looks like and if
i type HTML get the same results we'd
get in any of the other applications we
have 779 results alright pretty
straightforward where did that all come
from well when we got updates we asked
the tableview to reload its data and i'm
going to talk about that how we actually
display the data later on in the second
part of this talk and when we
get the done notification we don't have
to
we just note that it's done and that's
basically all there is to it to issuing
a query that you when you get updates
you tell yourself to process them and in
this case like I said we just ask the
tableview to reload the data which is
where we actually go and display it and
that's that
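The create, execute, read sequence from the slides might look roughly like this in C. It is a sketch under a few assumptions: `count_matches` is a hypothetical name, and it uses `kMDQuerySynchronous` so that it can run without a run loop or notification observers, which is a different choice from the live query in the demo. The `__APPLE__` guard is only there so the file still compiles on systems without Spotlight.

```c
#include <assert.h>
#include <stdio.h>
#ifdef __APPLE__
#include <CoreServices/CoreServices.h>
#endif

/* Runs a one-shot, blocking query and returns the number of matches,
 * or -1 if the query could not be created or executed (always -1 on
 * systems without Spotlight). Prints the paths of the first few hits. */
long count_matches(const char *expression)
{
    assert(expression != NULL);
#ifdef __APPLE__
    CFStringRef expr = CFStringCreateWithCString(
        kCFAllocatorDefault, expression, kCFStringEncodingUTF8);
    if (expr == NULL) return -1;

    /* NULL, NULL: no grouping (value list) or sorting attributes. */
    MDQueryRef query = MDQueryCreate(kCFAllocatorDefault, expr, NULL, NULL);
    CFRelease(expr);
    if (query == NULL) return -1;   /* e.g. a malformed expression */

    long count = -1;
    /* kMDQuerySynchronous blocks until the initial result set is
     * gathered. A live query would pass kMDQueryWantsUpdates instead
     * and listen for notifications on a run loop. */
    if (MDQueryExecute(query, kMDQuerySynchronous)) {
        count = (long)MDQueryGetResultCount(query);
        for (CFIndex i = 0; i < count && i < 5; i++) {
            MDItemRef item = (MDItemRef)MDQueryGetResultAtIndex(query, i);
            CFStringRef path = MDItemCopyAttribute(item, kMDItemPath);
            if (path != NULL) {
                CFShow(path);
                CFRelease(path);
            }
        }
    }
    CFRelease(query);
    return count;
#else
    (void)expression;
    return -1;
#endif
}
```

On macOS this needs to be linked with `-framework CoreServices`. A call such as `count_matches("kMDItemTitle == \"*Tiger*\"cd")` would correspond to the expression shown on the slide.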
the Holies I don't think that back cuz
i'm going to you next if we can go back
to the slides actually so really few
apps need to perform queries it's you
know if you need to do it it's not that
hard but it's not the sort of thing that
you have to think what do i have to do
to adopt spotlight i need to i need to
do this not not everybody needs to it's
great if you do it's not very hard but
it's if it's appropriate for your
application queries are C-like
expressions about the attributes that
you want to search so you saw we had
some very simple expressions with
standard equals you can build much more
complex ones with parentheses for
grouping so you can do ors and ands and
so on to build you know date modified is
after this date and it's less than
this other date or it's in this other
date range and you can build some
fairly sophisticated things if you'd
like it's well integrated with CF run
loops it would be kind of a pain if this
was bolted on the side you had to jump
through hoops and do contortions to make
it work with your application but you
know we've worked very closely with the
finder team to meet their needs and
other applications like the bowl search
demo or that ask Mac demo so we kind of
understand how it should be integrated
properly there's options for doing live
queries so if you want to continue to
receive updates and notifications on the
fly we support that and as I mentioned
or alluded to and sort of demoed with
full search there's sorting and grouping
features which can provide you with some
pretty advanced functionality if you
require it now displaying metadata as
you saw when I ran ask Mac it displayed
some information about the files and
it was displaying the name that it
got back through the spotlight system
it's pretty
straightforward to display metadata you
have to have an item reference you can
get an item reference in one of two ways
you can either first get it as a result
from a query or you can create it for an
explicit path so if you know the path
through some other mechanism it was
something returned to you via a file
open or save panel or what have you you
can just explicitly create the item once
you have the item reference then you can
get a list of attribute names about the
item that's a dictionary an array I'm
sorry of the names of attributes
that exist for that item so if you don't
know anything about it and you want to
display arbitrarily what's there you can
get that list and then go through and
get the actual values or if you know
exactly what you want you can use the
MDItemCopyAttribute family of calls
and I say as a family because there are
different variants depending on what you
want to get one a few or all the
attributes for an item and then you can
use that information and we all work
with standard CF types so if you have a
multivalued string you'll get back a
CFArray with all of the values for that
attribute name so in the case of going
back to the source importer if i call
MDItemCopyAttribute
for this specific attribute com apple
source typedef i would get back a
CFArray that contains the names or the
values for that for that attribute name
of course if the attribute doesn't exist
you'll get a null so you need to be
aware of that you want to use the the
one that's most appropriate of course
bulk calls are better in the sense
that if you're going to get five
attributes and you know you're always
going to get five attributes build
the CFArray with those five attribute
names and get all five of them at once
sort of standard best practices that way
you avoid round trips back and forth to
the server so let's show you how we
display metadata back in the ask Mac
application
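The item-reference steps described above (create an item for an explicit path, then fetch several attributes with one bulk call) could be sketched in C as follows. `print_name_and_title` is a hypothetical name, the choice of the two attributes is only an example, and the `__APPLE__` guard exists only so the file still compiles on systems without Spotlight.

```c
#include <assert.h>
#include <stdio.h>
#ifdef __APPLE__
#include <CoreServices/CoreServices.h>
#endif

/* Prints the display name and title of one file, asking for both
 * attributes in a single bulk call. Returns how many of the two were
 * present, or -1 if no item could be created for the path (always -1
 * on systems without Spotlight). */
int print_name_and_title(const char *path_utf8)
{
    assert(path_utf8 != NULL);
#ifdef __APPLE__
    CFStringRef path = CFStringCreateWithCString(
        kCFAllocatorDefault, path_utf8, kCFStringEncodingUTF8);
    if (path == NULL) return -1;

    MDItemRef item = MDItemCreate(kCFAllocatorDefault, path);
    CFRelease(path);
    if (item == NULL) return -1;

    const void *wanted[] = { kMDItemDisplayName, kMDItemTitle };
    CFArrayRef names = CFArrayCreate(kCFAllocatorDefault, wanted, 2,
                                     &kCFTypeArrayCallBacks);
    /* One call for both attributes instead of one call per attribute. */
    CFDictionaryRef values = MDItemCopyAttributes(item, names);

    int present = 0;
    for (CFIndex i = 0; values != NULL && i < 2; i++) {
        /* Treat a missing entry and kCFNull the same way: nothing to
         * show, which is the empty title cell in the demo. */
        CFTypeRef value = CFDictionaryGetValue(values, wanted[i]);
        if (value != NULL && value != kCFNull) {
            CFShow(value);
            present++;
        }
    }

    if (values != NULL) CFRelease(values);
    CFRelease(names);
    CFRelease(item);
    return present;
#else
    (void)path_utf8;
    return -1;
#endif
}
```

For a single attribute, `MDItemCopyAttribute(item, kMDItemTitle)` does the same job and returns NULL when the attribute does not exist, so the caller has to check before using the value.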
so going back down to that mysterious
function that i alluded to about
reloading the data so here we have the
table view object for table column and
in this case you see we come in and we
take the identifier field of the
column that was passed in and we
look at what it is so in this case I'm
just calling MDItemCopyAttribute as
you can all see for the kMDItemPath
for that file and that's the
the information that I will return that
should be displayed for that column i
also have if there's a display name
we'll take that and i also have a second
column kMDItemTitle now if i go ahead
and run this again if I searched for jpg
you're going to see that some of these
things that match are as I mentioned
before movies that are compressed with
JPEG compression photo jpeg compression
so they actually have titles whereas a
normal jpg file wouldn't necessarily
have a title so in the cases where there
is no title we don't display anything
it's just empty in the case where there
is a title we've successfully retrieved it
with MDItemCopyAttribute which is
this line right here to get the title
for the object so when it exists we get
it when it doesn't exist we don't
display anything and that's all there is
to it really for displaying data now you
of course can do more sophisticated
things caching data if you need to and
so on or as I alluded to calling a more
sophisticated function to get a set of
attributes all at once okay if we can go
back to the slide
can we go back to slides okay so
in summary items consist of a
list of attributes items are the
representation of a file in the
spotlight system and it's a list of
attributes and attributes have a
name type and value you can get a list
of all the attributes for an item so if
you know nothing at all about it you
want to find out all everything that's
there you can get the full list of names
and then you can go through and retrieve
the actual values for each one you can
call there's calls in the
MDItemCopyAttribute family for one some or all of
the attributes that are associated with
a file and you want to use the bulk
calls when possible one item that I
should or one attribute that I should
mention that is special is
kMDItemTextContent you can't retrieve that in the
sense of give me the text content for
this document that it doesn't work that
way just so that you're aware of it
now we've talked a bit about the
lower level C API and there's also a
Cocoa API that I'd like to mention
although we're not going to go into any
code samples for it as you might expect
the lower level core services API is
very straightforward and procedural
there's the Cocoa API which is based
on NSMetadataQuery and as expected is
higher level object oriented manages
queries and results you use the NSPredicate
class which should have been in blue but
anyway to populate or to initialize an
NSMetadataQuery and
the predicate is an expression about
this predicate is an expression about
the attributes that you want to find and
that's how you would build the
expression as opposed to just using a
straightforward string NSMetadataQuery
also offers a grouping
feature and it's key value coding and
observing compatible so you can hook
things up to NSArray and NSTree
controllers sort of for automatic kind
of connections between queries their
results and their display
another factor that I'd like to talk
about is full-text indexing I've
mentioned it a few times throughout the
presentation that spotlight uses
full-text indexing Search Kit has
undergone some dramatic improvements
for Tiger content indexing is
considerably faster incremental search
which is something that wasn't really
doable before is up to 20 x faster so
you don't have to wait for all the
results to be relevance ranked before
you get results we can start getting
results on the fly which kind of gives
you that find as you type functionality
and when you do want relevance ranking
they've improved the relevance ranking
quite a bit now why am i mentioning this
in some cases it's appropriate to use
Search Kit directly for your
own private index such as the help
content or the Xcode documentation these
are things that are sort of more
appropriate to a private or a
specific index and the Search Kit APIs
which were made public in Panther and
have been enhanced in Tiger are there
for you to use and fully documented
what's the current state of things so
obviously this is not a final release so
we're not done yet there's going to be
issues and things that you'll run into
there's some limits on attributes size
that we sort of kind of self-imposed
we're sort of proceeding very cautiously
with this whole project because this is
sort of new territory in a lot of ways I
mean I know some some of these things
have existed before but we don't want to
put ourselves into a situation where we
we wind up with something that's not
sustainable in the future so we're kind
of defining a fairly tight envelope and
then where we bumped into it we look at
it well why did we bump into that that
limit there is this the right place to
expand things to push the boundary and
when appropriate yeah we'll push it so
like I said there are some limits on the
attribute size and number of attributes
when you run into these talk to us let
us know what it is that you're trying to
do why is it not working sometimes it's
the right thing to increase the limits
and sometimes it's like no maybe that is
an indication that things
should be done in a different way we want
your feedback like I said this is kind
of new territory for it to be in such a
broadly available general purpose
operating system you know these kind of
metadata functionality and so on so we
want to hear what people are looking for
what they need what they're missing what
doesn't work for them with what we have
today so that we can build the system
better expand the system to meet those
needs so summarizing what we've talked
about today importers are the main
connection from your application file
format to the spotlight system so
importers publish metadata from files
spotlight takes that metadata
makes the documents easier to find
and allows them to
be displayed more richly spotlight also
allows applications to interact in more
sophisticated ways so as I mentioned
before you don't have applications that
have to know about other applications
and all their file formats they can just
sort of ask for the attributes about
the file they don't have to bother going
to parse it because the data has been
published what do you need to do what is
the end result of this if you have a
custom file format write an importer
that's the biggest thing that's the
biggest favor you can do for your users
for us and for yourself put useful
metadata in your document so a lot of
file formats already have support for
various types of metadata like I showed
with word there's that property sheet
make sure to populate that where you can
with things that are interesting things
that would help the user find that
document later on and when you're doing
things if you're doing special things
manipulating a document and copying it
or doing the save as preserve the
metadata where possible or when
appropriate so if there's an EXIF
chunk in a JPEG file and it makes sense
and you haven't completely modified the
document so it no longer makes sense
preserve that copy it as part of the
file format so now where you can find
out more about this because I've gone
through this pretty quickly and you know
it's not like you're going to necessarily
have everything in your head right now
there's a whole bunch of example code
and documentation online as well as some
updates to what's online and that disk
image is on
connect.apple.com ok so there's an
additional disk image of documentation
on connect.apple.com so here we have the
different spotlight the spotlight
importers where you would find that
documentation the MDImporter reference
and you know the template that's there
pretty much describes it all so there's
not too too much that you have to worry
about that the right place MDItem to
find out you know how you would
manipulate that what the functions of
that class are well not class but what
the family of calls are how you would
make use of them query reference schemas
and so on and adding search to your
application so there's quite a bit of
documentation already available and out
there