Query Language
To search a Web site for a word or phrase, type it into a query form and click the button that executes the query
(for example, the Execute Query button on the sample query form). The search returns a list of files
that contain the word or phrase, no matter where it appears in the text.
This list gives the rules for formulating queries:
- Multiple consecutive words are treated as a phrase; they must appear in the same order within a matching document.
- Queries are case-insensitive, so you can type your query in uppercase or lowercase.
- You can search for any word except for those in the exception list (for English, this includes a, an, and, as, and other
common words), which are ignored during a search.
- Words in the exception list are treated as placeholders in phrase and proximity queries. For example, a search for "Word for Windows" can return both "Word for Windows" and "Word and Windows", because "for" is a noise word that appears in the exception list.
- Punctuation marks such as the period (.), colon (:), semicolon (;), and comma (,) are ignored during a search.
- To use specially treated characters such as &, |, ^, #, @, $, (, and ) in a query, enclose your query in quotation marks (").
- To search for a word or phrase containing quotation marks, enclose the entire phrase in quotation marks and then double the quotation marks around the word or words you want to surround with quotes. For example, "World-Wide Web or ""Web""" searches for World-Wide Web or "Web".
- You can use Boolean operators (AND, OR, and NOT) and the proximity operator (NEAR) to specify additional
search information.
- The wildcard character (*) can match words with a given prefix. The query esc* matches the terms "ESC," "escape," and so on.
- Free-text queries can be specified without regard to query syntax.
- Vector space queries can be specified.
- ActiveX (OLE) and file attribute property value queries can be issued.
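The quoting rule above (enclose the query in quotation marks and double any quotation marks inside it) can be sketched as a small helper. This is a minimal illustration, not part of the query engine; the function name `quote_query` is hypothetical.

```python
def quote_query(text: str) -> str:
    """Wrap text in quotation marks, doubling any quotation marks inside it.

    Hypothetical helper: it only builds the query string described by the
    quoting rule and does not send anything to a server.
    """
    return '"' + text.replace('"', '""') + '"'


if __name__ == "__main__":
    # Reproduces the example from the rule list.
    print(quote_query('World-Wide Web or "Web"'))
    # Specially treated characters such as & and ( ) are protected by the quotes.
    print(quote_query("R&D (draft)"))
```

The same helper covers both cases in the list: protecting specially treated characters and searching for text that itself contains quotation marks.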
Boolean and Proximity Operators
Boolean and proximity operators can create a more precise query.
| To Search For | Example | Results |
Both terms in the same page | access and basic --Or-- access & basic | Pages with both the words "access" and "basic" |
Either term in a page | cgi or isapi --Or-- cgi | isapi | Pages with the words "cgi" or "isapi" |
The first term without the second term | access and not basic --Or-- access & ! basic | Pages with the word "access" but not "basic" |
Pages not matching a property value | not @size = 100 --Or-- ! @size = 100 | Pages that are not 100 bytes |
Both terms in the same page, close together | excel near project --Or-- excel ~ project | Pages with the word "excel" near the word "project" |
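Each operator in the table has a long form (and, or, not, near) and a symbol form (&, |, !, ~). The sketch below converts one to the other to show that the two spellings are interchangeable. The function name `to_short_form` is hypothetical, and the whitespace-based tokenizing is a simplifying assumption: operator words inside a quoted phrase would also be replaced.

```python
# Long-form operators from the table and their symbol equivalents.
SHORT_FORMS = {"and": "&", "or": "|", "not": "!", "near": "~"}


def to_short_form(query: str) -> str:
    """Replace long-form operator words with their symbol equivalents.

    Operator words are matched case-insensitively, since queries are
    case-insensitive. All other tokens are passed through unchanged.
    """
    tokens = query.split()
    return " ".join(SHORT_FORMS.get(tok.lower(), tok) for tok in tokens)


if __name__ == "__main__":
    for q in ("access and not basic", "cgi or isapi", "excel near project"):
        print(q, "->", to_short_form(q))
```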
Wildcard operators help you find pages containing words similar to a given word.
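The prefix rule from the list of query rules (esc* matches "ESC", "escape", and so on) can be illustrated with a few lines of Python. This is only a model of the matching behavior as described, not the engine's implementation; `matches_wildcard` is a hypothetical name.

```python
def matches_wildcard(pattern: str, word: str) -> bool:
    """Return True if word matches a pattern such as 'esc*'.

    A trailing * matches any word that starts with the given prefix.
    Comparison is case-insensitive, as queries are case-insensitive.
    A pattern without * must match the whole word.
    """
    pattern = pattern.lower()
    word = word.lower()
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return word == pattern


if __name__ == "__main__":
    for w in ("ESC", "escape", "rescue"):
        print(w, matches_wildcard("esc*", w))
```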
The query engine finds pages that best match the words and phrases in a free-text query. This is done by automatically finding
pages that match the meaning, not the exact wording, of the query. Boolean, proximity, and wildcard operators are ignored
within a free-text query. Free-text queries are prefixed with $contents.
The query engine supports vector space queries. Vector queries return pages that match a list of words and phrases. The rank
of each page indicates how well the page matched the query.
Property value queries find files whose property values match given criteria. The properties over which
you can query include basic file information like file name and file size, and ActiveX properties including the document
summary (abstract) that is stored in files created by ActiveX-aware applications.
Property names are preceded by either the "at" (@) or number sign (#) character. Use @ for relational queries, and # for
regular expression queries.
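As a rough sketch of the prefix convention, the helpers below build query strings with @ for relational queries and # for regular expression queries, then join two restrictions with the & operator. The function names are hypothetical and the helpers do no validation of property names or operators.

```python
def relational(prop: str, op: str, value: str) -> str:
    """Build a relational property query, for example '@size > 1000000'."""
    return f"@{prop} {op} {value}"


def regex(prop: str, pattern: str) -> str:
    """Build a regular expression property query, for example '#filename *.avi'."""
    return f"#{prop} {pattern}"


def both(left: str, right: str) -> str:
    """Require both restrictions, using the & (AND) operator."""
    return f"{left} & {right}"


if __name__ == "__main__":
    # GIF files smaller than 100 bytes.
    print(both(relational("size", "<", "100"), regex("filename", "*.gif")))
```

Keeping the prefix inside the helper makes it harder to pair a regular expression with @ or a relational operator with # by accident.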
ActiveX property values can also be used in queries. Web sites with files created by most ActiveX-aware applications can be
queried for these properties:
Relational operators are used in relational property queries.
| Example | Results |
@size > 1000000 | Pages larger than one million bytes |
@write > 95/12/23 | Pages modified after the date |
Apple tree | Pages with the phrase "apple tree" |
"apple tree" | Same as above |
@contents apple tree | Same as above |
Microsoft and @size > 1000000 | Pages with the word "Microsoft" that are larger than one million bytes |
"microsoft and @size > 1000000" | Pages with the phrase specified (not the same as above) |
#filename *.avi | Video files (the # prefix is used because the query contains a regular expression) |
@attrib ^s 32 | Pages with the archive attribute bit on |
@docauthor = John Smith | Pages with the given author |
$contents why is the sky blue? | Pages that match the query |
@size < 100 & #filename *.gif | Graphics Interchange Format (GIF) files less than 100 bytes in size |
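The `@attrib ^s 32` example tests a bit in the file attribute mask. The sketch below shows the equivalent bitwise check in Python. It assumes that `^s` means "at least one of the given bits is set", which is inferred from the table's description ("archive attribute bit on") rather than stated in this document; `some_bits_set` is a hypothetical name.

```python
ARCHIVE_BIT = 32  # mask value used in the "@attrib ^s 32" example


def some_bits_set(attrib: int, mask: int) -> bool:
    """Return True if at least one bit of mask is set in attrib.

    Assumption: this mirrors the meaning of the ^s operator in the example.
    """
    return (attrib & mask) != 0


if __name__ == "__main__":
    # 0x21 has the archive bit (0x20) and one other bit set.
    print(some_bits_set(0x21, ARCHIVE_BIT))
    # 0x01 does not have the archive bit set.
    print(some_bits_set(0x01, ARCHIVE_BIT))
```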
These properties are always available for queries. Additional properties may also be available depending on the configuration
of the Web server.
| Friendly Name | Datatype | Property |
Access | DBTYPE_DATE | Last time file was accessed. |
All | (not applicable) | Searches every property for a string. Can be queried but not retrieved. |
AllocSize | DBTYPE_I8 | Size of disk allocation for file. |
Attrib | DBTYPE_UI4 | File attributes. Documented in Win32 SDK. |
ClassId | DBTYPE_GUID | Class ID of object, for example, WordPerfect, Word, and so on. |
Change | DBTYPE_DATE | Last time file was changed (includes changes to attributes). |
Characterization | DBTYPE_WSTR | DBTYPE_BYREF | Characterization, or abstract, of document. Computed by Index Server. |
Contents | (not applicable) | Main contents of file. Can be queried but not retrieved. |
Create | DBTYPE_DATE | Time file was created. |
DocAppName | DBTYPE_STR | DBTYPE_BYREF | Name of application that created the file. |
DocAuthor | DBTYPE_STR | DBTYPE_BYREF | Author of document. |
DocCategory | DBTYPE_STR | Type of document such as a memo, schedule, or whitepaper. |
DocCharCount | DBTYPE_I4 | Number of characters in document. |
DocComments | DBTYPE_STR | DBTYPE_BYREF | Comments about document. |
DocCompany | DBTYPE_STR | Name of the company for which the document was written. |
DocCreatedTm | DBTYPE_DATE | Time document was created. |
DocEditTime | DBTYPE_DATE | Total time spent editing document. |
DocKeywords | DBTYPE_STR | DBTYPE_BYREF | Document keywords. |
DocLastAuthor | DBTYPE_STR | DBTYPE_BYREF | Most recent user who edited document. |
DocLastPrinted | DBTYPE_DATE | Time document was last printed. |
DocLastSavedTm | DBTYPE_DATE | Time document was last saved. |
DocManager | DBTYPE_STR | Name of the manager of the document's author. |
DocPageCount | DBTYPE_I4 | Number of pages in document. |
DocRevNumber | DBTYPE_STR | DBTYPE_BYREF | Current version number of document. |
DocSubject | DBTYPE_STR | DBTYPE_BYREF | Subject of document. |
DocTemplate | DBTYPE_STR | DBTYPE_BYREF | Name of template for document. |
DocTitle | DBTYPE_STR | DBTYPE_BYREF | Title of document. |
DocWordCount | DBTYPE_I4 | Number of words in document. |
FileIndex | DBTYPE_I8 | Unique ID of file. |
FileName | DBTYPE_WSTR | DBTYPE_BYREF | Name of file. |
HitCount | DBTYPE_I4 | Number of hits (words matching query) in file. |
HtmlHRef | DBTYPE_WSTR | DBTYPE_BYREF | Text of HTML HREF. Can be queried but not retrieved. |
HtmlHeading1 | DBTYPE_WSTR | DBTYPE_BYREF | Text of HTML document in style H1. Can be queried but not retrieved. |
HtmlHeading2 | DBTYPE_WSTR | DBTYPE_BYREF | Text of HTML document in style H2. Can be queried but not retrieved. |
HtmlHeading3 | DBTYPE_WSTR | DBTYPE_BYREF | Text of HTML document in style H3. Can be queried but not retrieved. |
HtmlHeading4 | DBTYPE_WSTR | DBTYPE_BYREF | Text of HTML document in style H4. Can be queried but not retrieved. |
HtmlHeading5 | DBTYPE_WSTR | DBTYPE_BYREF | Text of HTML document in style H5. Can be queried but not retrieved. |
HtmlHeading6 | DBTYPE_WSTR | DBTYPE_BYREF | Text of HTML document in style H6. Can be queried but not retrieved. |
Path | DBTYPE_WSTR | DBTYPE_BYREF | Full physical path to file, including file name. |
Rank | DBTYPE_I4 | Rank of row. Ranges from 0 to 1000. Larger numbers indicate better matches. |
RankVector | DBTYPE_I4 | DBTYPE_VECTOR | Ranks of individual components of a vector query. |
SecurityChange | DBTYPE_DATE | Last time security was changed on file. |
ShortFileName | DBTYPE_WSTR | DBTYPE_BYREF | Short (8.3) file name. |
Size | DBTYPE_I8 | Size of file, in bytes. |
USN | DBTYPE_I8 | Update Sequence Number. NTFS drives only. |
VPath | DBTYPE_WSTR | DBTYPE_BYREF | Full virtual path to file, including file name. If more than one path is possible, the best match for the specific query is chosen. |
WorkId | DBTYPE_I4 | Internal ID for file. Used within Index Server. |
Write | DBTYPE_DATE | Last time file was written. |