| Oracle9i XML Database Developer's Guide - Oracle XML DB Release 2 (9.2) Part Number A96620-02 |
|
|
View PDF |
This chapter explains the use of Oracle Text functionality in indexing and querying XML data. It contains the following sections:
This chapter describes the following aspects of Oracle Text:
XMLType dataOracle Text (aka interMedia Text) can be used to search XML documents. It extends Oracle9i by indexing any text or document stored in Oracle. It can also search documents in the file system and URLs.
Oracle Text enables the following:
You can query XML data stored in the database directly, without using Oracle Text. However, Oracle Text is useful for boosting query performance.
Oracle Text is a standard feature that comes with every Oracle9i Standard, Enterprise, and Personal edition license. It needs to be selected during installation. No special installation instructions are required.
Oracle Text is essentially a set of schema objects owned by CTXSYS. These objects are linked to the Oracle kernel. The schema objects are present when you perform an Oracle9i installation.
You can now perform Oracle Text searches on tables containing XMLType columns.
You can find more examples for Oracle Text and for creating section group indexes at the following site: http://otn.oracle.com/products/text
XML text is aVARCHAR2 or CLOB type in an Oracle9i database table with character semantics. Oracle Text can also deal with documents in a file system or in URLs, but we are not considering these document types in this chapter.
To simplify the examples included in this chapter they use a subset of the Oracle Text options and make the following assumptions:
CREATE INDEX or ALTER INDEX statement. Other parameter types available for CREATE INDEX and ALTER INDEX, are DATASTORE, FILTER, LEXER, STOPLIST, and WORDLIST.
Here is an example of using SECTION GROUP in CREATE INDEX:
CREATE INDEX my_index ON my_table ( my_column ) INDEXTYPE IS ctxsys.context PARAMETERS ( 'SECTION GROUP my_section_group' ) ;
AUTO_SECTION_GROUP and XML_SECTION_GROUP, and PATH_SECTION_GROUP.See Also:
|
With Oracle Text you can use the following users/roles:
CTXSYS to administer usersCTXAPP to create and delete Oracle Text preferences and use Oracle Text PL/SQL packagesUserCTXSYS is created at install time. Administer Oracle Text users as this user. User CTXSYS has the following privileges:
CTX_ADM PL/SQL package to start servers and set system-parametersCTXAPP roleAny user can create an Oracle Text index and issue a Text query. For additional tasks, use the CTXAPP role. This is a system-defined role that enables you to perform the following tasks:
CTX_DDL packageOracle Text's main purpose is to provide an implementation for the CONTAINS operator. The CONTAINS operator can be used in the WHERE clause of a SELECT statement to specify the query expression for a Text query.
Here is the CONTAINS syntax:
...WHERE CONTAINS([schema.]column,text_query VARCHAR2,[label NUMBER])
where:
For each row selected, CONTAINS returns a number between 0 and 100 that indicates how relevant the document row is to the query. The number 0 means that Oracle found no matches in the row. You can obtain this score with the SCORE operator.
The following example illustrates how the CONTAINS operator is used in a SELECT statement:
SELECT id FROM my_table WHERE CONTAINS (my_column, 'receipts') > 0
The'receipts' parameter of the CONTAINS operator is called the "Text Query Expression".
The following example searches for all documents in the text column that contain the word Oracle. The score for each row is selected with the SCORE operator using a label of 1:
SELECT SCORE(1), title from newsindex WHERE CONTAINS(text, 'oracle', 1) > 0 ORDER BY SCORE(1) DESC;
The CONTAINS operator must always be followed by the > 0 syntax. This specifies that the score value calculated by the CONTAINS operator must be greater than zero for the row selected.
When the SCORE operator is called, such as in a SELECT clause, the operator must reference the label value as shown in the example.
When documents have internal structure such as in HTML and XML, you can define document sections using embedded tags before you index. This enables you to query within the sections using the WITHIN operator.
You can query within attribute sections when you index with either XML_SECTION_GROUP, AUTO_SECTION_GROUP, or PATH_SECTION_GROUP your section group type. Consider the following XML document:
<book title="Tale of Two Cities">It was the best of times.</book>
If you use XML_SECTION_GROUP, you can specify any of the following sections:
This chapter only focuses on Zone, Field, and Attribute sections. For more information on Special sections see Oracle Text Reference and Oracle Text Application Developer's Guide.
The syntax for this is:
CTX_DDL.ADD_ZONE_SECTION( group_name in varchar2, section_name in varchar2, tag in varchar2);
To define a chapter as a Zone section, create an XML_SECTION_GROUP and define the Zone section as follows:
EXEC ctx_ddl_create_section_group('myxmlgroup', 'XML_SECTION_GROUP'); EXEC ctx_ddl.add_zone_section('myxmlgroup', 'chapter', 'chapter');
When you define Zone section as such and index the document set, you can query the XML chapter Zone section as follows:
'Cities within chapter'
The syntax for this is:
CTX_DDL.ADD_FIELD_SECTION( group_name in varchar2, section_name in varchar2, tag in varchar2);
To define a abstract as a Field section, create an XML_SECTION_GROUP and define the Field section as follows:
EXEC ctx_ddl_create_section_group('myxmlgroup', 'XML_SECTION_GROUP'); EXEC ctx_ddl.add_field_section('myxmlgroup', 'abstract', 'abstract');
When you define Field section as such and index the document set, you can query the XML abstract Field section as follows:
'Cities within abstract'
The syntax for this is:
CTX_DDL.ADD_ATTR_SECTION( group_name in varchar2, section_name in varchar2, tag in varchar2);
To define the booktitle attribute as an Attribute section, create an XML_SECTION_GROUP and define the Attribute section as follows:
EXEC ctx_ddl_create_section_group('myxmlgroup', 'XML_SECTION_GROUP'); EXEC ctx_ddl.add_attr_section('myxmlgroup', 'booktitle', 'book@title');
When you define the Attribute section as such and index the document set, you can query the XML booktitle attribute text as follows:
'Cities within booktitle'
The following constraints apply to querying within Attribute or Field sections:
WITHIN clause. Using the following XML document:
<book title="Tale of Two Cities">It was the best of times.</book>
querying on Tale will not work unless qualified with 'WITHIN title@book'.
WITHIN query.....Now is the time for all good <word type="noun"> men </word> to come to the aid......
The search would result in a regular query's, "good men", and ignore the intervening attribute text.
When you use the AUTO_SECTION_GROUP or PATH_SECTION_GROUP to index XML documents, Oracle9i automatically creates sections.
To search on Tale within the Attribute section booktitle, include the following WITHIN clause in your SELECT statement:
XML_SECTION_GROUP:
... WHERE CONTAINS ('Tale INPATH booktitle')>0;
PATH_SECTION_GROUP
... WHERE CONTAINS ('Tale INPATH title@book')>0;
The syntax for ALTER INDEX is:
ALTER INDEX [schema.]index REBUILD [ONLINE] [PARAMETERS (paramstring)];
where
paramstring = 'replace [datastore datastore_pref] [filter filter_pref] [lexer lexer_pref] [wordlist wordlist_pref] [storage storage_pref] [stoplist stoplist] [section group section_group] [memory memsize] | ... | add zone section section_name tag tag | add field section section_name tag tag [(VISIBLE | INVISIBLE)] | add attr section section_name tag tag@attr | add stop section tag'
The added section applies only to documents indexed after this operation. Thus for the change to take effect, you must manually re-index any existing documents that contain the tag. The index is not rebuilt by this statement.
Here is the WITHIN syntax for querying sections:
...WHERE CONTAINS(text,'XML WITHIN title') >0;...
This searches for expression text within a section. If you are using XML_SECTION_GROUP the following restrictions apply to the pre-defined zone, field, or attribute section:
WITHIN operators (nested WITHIN) whose section is a zone or special section.WITHIN operator.You can combine and nest WITHIN clauses. For finer grained searches of XML sections, you can use WITHIN clauses inside CONTAINS select statements.
The WITHIN operator has the following limitations:
WITHIN clause in a phrase. For example, you cannot write: term1 WITHIN section term2WITHIN with expansion operators, such as $ ! and *.WITHIN is a reserved word, you must escape the word with braces to search on it.
In Oracle9i Oracle Text introduced a new section type and new query operators which support an XPath-like query language. Indexes of type context with XML path searching are able to perform very complex section searches on XML documents. Here are the basic concepts of path indexing and path querying.
Section searching is enabled by defining section groups. To use XML path searching, the Oracle Text index must be created with the new section group, PATH_SECTION_GROUP as follows:
begin ctx_ddl.create_section_group('mypathgroup','PATH_SECTION_GROUP'); end;
To create the Oracle Text index use this command:
create index order_idx on library_catalog(text) indextype is ctxsys.context parameters ('SECTION GROUP mypathgroup');
The Oracle Text path query language is based on W3C XPath. For Oracle9i Release 1 (9.0.1) and higher, you can use the INPATH and HASPATH operators to express path queries.
You can use INPATH operator to perform path searching in XML documents. Table 7-2 summarizes the ways you can use the INPATH operator for path searching.
| Path Search Feature | Syntax | Description |
|---|---|---|
|
Simple Tag Searching |
virginia INPATH (//STATE) |
Finds all documents where the word "virginia" appears between <STATE> and </STATE>. The STATE element can appear at any level of the document structure. |
|
Case-sensitivity |
virginia INPATH (State) |
Tags and attribute names in path searching are case-sensitive. virginia |
|
Top-Level Tag Searching |
For example, the following query finds Quijote where it occurs between <order> and </order>: select id from library_catalog where contains(text,'Quijote INPATH(order)') > 0; Here <order> must be the top level tag. |
Finds all documents where "virginia" appears in a Legal element which is the top-level tag.'Legal' MUST be the top-level tag of the document.'virginia' may appear anywhere in this tag regardless of other intervening tags. For example: <?xml version="1.0" standalone="yes"?> <!-- <?xml-stylesheet type="text/xsl" href="./xsl/vacourtfiling(html).xsl"?> --> <Filing ID="f001" FilingType="Civil"> <AddressState>VIRGINIA</AddressState> </Address> ... </Legal> |
|
Any Level Tag Searching |
For example, a double slash indicates "any number of levels" down. The following query finds Quijote inside a <title> tag that occurs at the top level or any lower level: select id from library_catalog where contains(text,'Quijote INPATH(//title)') > 0; |
'Virginia' can appear anywhere within an 'Address' tag, which may appear within any other tags. for example: <?xml version="1.0" standalone="yes"?> <!-- <?xml-stylesheet type="text/xsl" href="./xsl/vacourtfiling(html).xsl"?> --> <Filing ID="f001" FilingType="Civil"> <AddressState> VIRGINIA </AddressState>... </Legal> |
|
Direct Parentage Path Searching |
virginia INPATH (//CourtInformation/Location) select id from library_catalog where contains(text,'virginia INPATH(order/item)') > 0; |
Finds all documents where "virginia" appears in a <?xml version="1.0" standalone="yes"?> <!-- <?xml-stylesheet type="text/xsl" href="./xsl/vacourtfiling(html).xsl"?> --> <Filing ID="f001" FilingType="Civil"> <AddressState> VIRGINIA </AddressState> </Address>... </CourtInformation> |
|
Single-Level Wildcard Searching |
'virginia INPATH (//CaseCaption/*/Location)' |
Finds all documents where "virginia" appears in a B element which is a grandchild of an A element. For instance, <A><D><B>virginia</B></D></A>. The intermediate element does not need to be an indexed XML tag. For example: <?xml version="1.0" standalone="yes"?> <!-- <?xml-stylesheet type="text/xsl" href="./xsl/vacourtfiling(html).xsl"?> --> <Filing ID="f001" FilingType="Civil"> <AddressState>VIRGINIA</AddressState>... </Legal> |
|
Multi-level Wildcard Searching |
'virginia INPATH (Legal/*/Filing/*/*/CourtInformation)' |
'Legal' must be a top-level tag, and there must be exactly one tag-level between 'Legal' and 'Filing', and two between 'Filing' and 'CourtInformation'. 'Virginia' may then appear anywhere within 'CourtInformation'. For example: <?xml version="1.0" standalone="yes"?> <!-- <?xml-stylesheet type="text/xsl" href="./xsl/vacourtfiling(html).xsl"?> --> <Filing ID="f001" FilingType="Civil"> <AddressState>VIRGINIA</AddressState> IN THE CIRCUIT COURT OF LOUDOUN COUNTY </CourtInformation>.... |
|
Descendant Searching |
virginia INPATH(A//B) |
Finds all documents where "virginia" appears in a B element which is some descendant (any level) of an A element. |
|
Attribute Searching |
virginia INPATH(A/@B) |
Finds all documents where "virginia" appears in the B attribute of an A element. You can search within an attribute value using the syntax <tag>/@<attribute>: select id from library_catalog where contains(text,'dvd You can use boolean AND and OR to combine existence or equality predicates in a test. select id from library_catalog where contains(text,'Levy or Cervantes |
|
Descendant/Attribute Existence Testing |
You can search for documents using the any-level tag searching: select id from library_catalog where contains (text,'Quijote INPATH(/order/title)') > 0; You can also use the "*" as a single level wildcard. The * matches exactly one level.: select id from library_catalog where contains (text,'Cervantes INPATH(/order/*/author)') > 0; |
Finds all documents where "virginia" appears in an A element which has a B element as a direct child. |
|
virginia INPATH (A[@B = "pot of gold"]), would, with the default lexer and stoplist, match any of the following: <A B="POT OF GOLD">virginia</A> By default, lexing is case-independent, so "pot" matches "POT", <A B="POT BLACK GOLD">virginia</A> By default, "of" is a stopword, and, in a query, would match any word in that position, <A B=" Pot OF Gold ">virginia</A> |
Finds all documents where "virginia" appears in an A element which has a B attribute whose value is "foo".
Within equality (See "Using INPATH Operator for Path Searching in XML Documents" ) is used to evaluate the test. Whitespace is mainly ignored in text indexing. Again, lexing is case-independent: <A B="pot_of_gold">virginia</A> Underscore is a non-alphabetic character, and is not a join character by default. As a result, it is treated more or less as whitespace and breaks up that string into three words. select id from library_catalog where contains(text,'(Bob the Builder) The following will not return rows: select id from library_catalog where contains(text,'(Bob the Builder) |
|
|
Numeric literals are allowed. But they are treated as text. The within equality is used to evaluate. This means that the query does NOT match. That is, <A B="5.0">virginia</A> does not match A[@B=5] where "5.0", a decimal is not considered the same as 5, an integer. |
||
|
virginia INPATH (A[B AND @C = "foo"])... |
Predicates can be conjunctively combined. |
|
|
virginia INPATH (A[@B = "foo"]/C/D) virginia INPATH(A//B[@C]/D[E])... |
Use the HASPATH operator to find all XML documents that contain a specified section path. HASPATH is used when you want to test for path existence. It is also very useful for section equality testing. To find all XML documents where an order has an item within it:
select id from library_catalog where contains(text,'HASPATH(order/item)') > 0;
will return all documents where the top-level tag is a order element which has a item element as a direct child.
In Oracle9i, Oracle Text introduces a new section type and new query operators which support an XPath-like query language. Indexes of type context with XML path searching are able to perform very complex section searches on XML documents. Here are more examples of path querying using INPATH and HASPATH. Assuming the following XML document:
<?xml version="1.0"?> <order> <item type="book"> <title>Crypto</title> <author>Levi</author> </item> <item type="dvd"> <title> Bob the Builder</title> <author>Auerbach</author> </item> <item type="book"> <title>Don Quijote</title> <author>Cervantes</author> </item> </order>
In general, use INPATH and HASPATH operators only when your index has been created with PATH_SECTION_GROUP. Use of PATH_SECTION_GROUP enables path searching. Path searching extends the syntax of the WITHIN operator so that the section name operand (right-hand-side) is a path instead of a section name.
Only use the HASPATH operator when your index has been created with the PATH_SECTION_GROUP. The syntax for the HASPATH operator is:
HASPATH searches an XML document set and returns a score of 100 for all documents where path exists. Parent and child paths are separated with the / character, for example, A/B/C. For example, the query:
...WHERE CONTAINS (col,'HASPATH(A/B/C)')>0;
finds and returns a score of 100 for the document:
<A><B><C>Virginia</C></B></A>
without having to reference Virginia at all.
HASPATH clause searches an XML document set and returns a score of 100 for all documents that have element A with content value and only that value. HASPATH is used to test equality. This is the "Section Equality Testing" feature of the HASPATH operator. The query:
...WHERE CONTAINS virginia INPATH A
finds <A>virginia</A>, but it also finds <A>virginia state</A>. To limit the query to the term virginia and nothing else, you can use a section equality test with the HASPATH operator. For example:
... WHERE CONTAINS (col,'HASPATH(A="virginia")'
finds and returns a score of 100 only for the first document, and not the second.
You can do tag value equality test with HASPATH:
select id from library_catalog
where CONTAINS(text,'HASPATH (//author="Auerbach")') >0;
To build a Oracle Text query application carry out the following steps:
After creating and preparing your data, you are ready to perform the next step. See "Step 1. Create a Section Group Preference".
CONTAINS operator. Now you can finish building your query application. See "Building a Query Application with Oracle Text".First determine the role you need. See Oracle Text Reference and "Oracle Text Users and Roles" , and grant the appropriate privilege as follows:
CONNECT system/manager GRANT ctxapp to scott; CONNECT scott/tiger
The first thing you must do is create a preference. This section describes how to create section preferences using PATH_SECTION_GROUP, XML_SECTION_GROUP, and AUTO_SECTION_GROUP. Table 7-3 describes the groups and summarizes their features.
| Section Group | Description |
|---|---|
|
XML_SECTION_GROUP |
Use this group type for indexing XML documents and for defining sections in XML documents. |
|
AUTO_SECTION_GROUP |
Use this group type to automatically create a zone section for each start-tag/end-tag pair in an XML document. The section names derived from XML tags are case-sensitive as in XML. Attribute sections are created automatically for XML tags that have attributes. Attribute sections are named in the form attribute@tag. Stop sections, empty tags, processing instructions, and comments are not indexed. The following limitations apply to automatic section groups:
|
|
PATH_SECTION_GROUP |
Use this group type to index XML documents. Behaves like the How is PATH_SECTION_GROUP Similar to AUTO_SECTION_GROUP? Documents are assumed to be XML, Every tag and every attribute is indexed by default, Stop sections can be added to prevent certain tags from being indexed, Only stop sections can be added How Does PATH_SECTION_GROUP Differ From AUTO_SECTION_GROUP? Path Searching is allowed at query time (see "Case Study: Searching XML-Based Conference Proceedings" and "You can use INPATH operator to perform path searching in XML documents. Table 7-2 summarizes the ways you can use the INPATH operator for path searching." ) with the new |
|
Note: If you are using the |
How do you determine which section groups is best for your application? This depends on your application. Table 7-4 lists some general guidelines to help you decide which of the XML_, AUTO_, or PATH_ section groups to use when indexing your XML documents, and why.
The following command creates a section group called, xmlgroup, with the XML_SECTION_GROUP group type:
EXEC ctx_ddl.create_section_group('myxmlgroup', 'XML_SECTION_GROUP');
You can set up your indexing operation to automatically create sections from XML documents using the section group AUTO_SECTION_GROUP. Here, Oracle creates zone sections for XML tags. Attribute sections are created for those tags that have attributes, and these attribute sections are named in the form "tag@attribute."
The following command creates a section group called autogroup with the AUTO_SECTION_GROUP group type. This section group automatically creates sections from tags in XML documents.
EXEC ctx_ddl.create_section_group('autogroup', 'AUTO_SECTION_GROUP');
To enable path section searching, index your XML document with PATH_SECTION_GROUP. For example:
EXEC ctx_ddl.create_section_group('xmlpathgroup', 'PATH_SECTION_GROUP');
To set the preference's attributes for XML_SECTION_GROUP, use the following procedures:
To set the preference's attributes for AUTO_SECTION_GROUP and PATH_SECTION_GROUP, use the following procedures:
There are corresponding CTX_DDL.DROP sections and CTX_DDL.REMOVE section commands.
The syntax for CTX_DDL.add_zone_section follows:
CTX_DDL.Add_Zone_Section ( group_name => 'my_section_group' /* whatever you called it in the preceding section */ section_name => 'author' /* what you want to call this section */ tag => 'my_tag' /* what represents it in XML */ );
where 'my_tag' implies opening with <my_tag> and closing with </my_tag>.
add_zone_section guidelines are listed here:
The syntax for CTX_DDL.ADD_ATTR_SECTION follows:
CTX_DDL.Add_Attr_Section ( /* call this as many times as you need to describe the attribute sections */ group_name => 'my_section_group' /* whatever you called it in the preceding section */ section_name => 'author' /* what you want to call this section */ tag => 'my_tag' /* what represents it in XML */ );
where 'my_tag' implies opening with <my_tag> and closing with </my_tag>.
Add_Attr_Section guidelines are listed here:
ADD_ATTR_SECTION adds an attribute section to an XML section group. This procedure is useful for defining attributes in XML documents as sections. This enables searching XML attribute text with the WITHIN operator.
The section_name:
WITHIN queries on the attribute text.The tag specifies the name of the attribute in tag@attr format. This is case-sensitive.
The syntax for CTX_DDL.Add_Field_Section follows:
CTX_DDL.Add_Field_Section ( group_name => 'my_section_group' /* whatever you called it in the preceding section */ section_name => 'qq' /* what you want to call this section */ tag => 'my_tag' /* what represents it in XML */ ); visible => TRUE or FALSE );
Add_Field_Section guidelines are listed here:
VISIBLE is set to TRUE then the text within the Field section will be indexed as part of the enclosing document. For example:
<state> Virginia </state> CTX_DDL.Add_Field_Section (
group_name => 'my_section_group' section_name => 'state' tag => 'state' visible => TRUE or FALSE );
If visible is set to TRUE, then searching on Virginia without specifying the state Field section produces a hit.
If visible is set to FALSE, then searching on Virginia without specifying the state Field section does not produce a hit.
Attribute section differs from Field section in the following ways:
WHERE CONTAINS (..., '... jeeves',...)...
does NOT find the document. This is similar to when Field sections have visible set to FALSE. Unlike Field sections, however, Attribute section within searches can distinguish between occurrences. Consider the document:
<comment author="jeeves"> I really like Oracle Text </comment> <comment author="bertram"> Me too </comment>
the query:
WHERE CONTAINS (...,'(cryil and bertram) WITHIN author', ...)...
will NOT find the document, because "jeeves" and "bertram" do not occur within the SAME attribute text.
tag@attr to a single section name. Attribute sections do not support default values. Given the document:
<!DOCTYPE foo [ <!ELEMENT foo (bar)> <!ELEMENT bar (#PCDATA)>
<!ATTLIST bar rev CDATA "8i"> ]> <foo> <bar>whatever</bar> </foo>
and attribute section:
ctx_ddl.add_attr_section('mysg','barrev','bar@rev');
the query:
8i within barrev does not hit the document, although in XML semantics, the "bar" element has a default value for its "rev" attribute.
CtX_DDL.Add_Stop_Section ( group_name => 'my_section_group' /* whatever you called it in the preceding section */ section_name => 'qq' /* what you want to call this section */ );
Create an index depending on which section group you used to create a preference:
To index your XML document when you have used XML_SECTION_GROUP, you can use the following statement:
CREATE INDEX myindex ON docs(htmlfile) INDEXTYPE IS ctxsys.context parameters('section group xmlgroup');
The following statement creates the index, myindex, on a column containing XML files using the AUTO_SECTION_GROUP:
CREATE INDEX myindex ON xmldocs(xmlfile) INDEXTYPE IS ctxsys.context PARAMETERS ('section group autogroup');
To index your XML document when you have used PATH_SECTION_GROUP, you can use the following statement:
CREATE INDEX myindex ON xmldocs(xmlfile) INDEXTYPE IS ctxsys.context PARAMETERS ('section group xmlpathgroup');
| See Also:
Oracle Text Reference for detailed notes on |
EXEC ctx_ddl_create_section_group('myxmlgroup', 'XML_SECTION_GROUP'); /* ADDING A FIELD SECTION */ EXEC ctx_ddl.Add_Field_Section /* THIS IS KEY */ ( group_name =>'my_section_group', section_name =>'author',/* do this for EVERY tag used after "WITHIN" */ tag =>'author' ); EXEC ctx_ddl.Add_Field_Section /* THIS IS KEY */ ( group_name =>'my_section_group', section_name =>'document',/*do this for EVERY tag after "WITHIN" */ tag =>'document' ); ... / /* ADDING AN ATTRIBUTE SECTION */ EXEC ctx_ddl.add_attr_section('myxmlgroup', 'booktitle', 'book@title'); /* The more sections you add to your index, the longer your search will take.*/ /* Useful for defining attributes in XML documents as sections. This allows*/ /* you to search XML attribute text using the WITHIN operator.*/ /* The section name: /* ** Is used for WITHIN queries on the attribute text. ** Cannot contain the colon (:) or dot (.) characters. ** Must be unique within group_name. ** Is case-insensitive. ** Can be no more than 64 bytes. ** The tag specifies the name of the attribute in tag@attr format. This is case-sensitive. */ /* Names used as arguments of the keyword WITHIN can be different from the actual XML tag names. Many tags can be mapped to the same name at query time.*/ /* Call CTX_DDL.Add_Zone_Section for each tag in your XML document that you need to search on. */ EXEC ctx_ddl.add_zone_section('myxmlgroup', 'mybooksec', 'mydocname(book)'); CREATE INDEX my_index ON my_table ( my_column ) INDEXTYPE IS ctxsys.context PARAMETERS ( 'SECTION GROUP my_section_group' ); SELECT my_column FROM my_table WHERE CONTAINS(my_column, 'smith WITHIN author') > 0;
See the section, "Querying with the CONTAINS Operator" for information about how to use the CONTAINS operator in query statements.
You can query within attribute sections when you index with either XML_SECTION_GROUP or AUTO_SECTION_GROUP as your section group type.
Assume you have an XML document as follows:
<book title="Tale of Two Cities">It was the best of times.</book>
You can define the section title@book as the attribute section title. You can do so with the CTX_DLL.Add_Attr_Section procedure or dynamically after indexing with ALTER INDEX.
|
Note: When you use the |
If you use the XML_SECTION_GROUP, you can name attribute sections anything with CTX_DDL.ADD_ATTR_SECTION.
To search on Tale within the attribute section title, issue the following query:
WHERE CONTAINS (...,'Tale WITHIN title', ...)
When you define the TITLE attribute section as such and index the document set, you can query the XML attribute text as follows:
... WHERE CONTAINS (...,'Cities WITHIN booktitle', ....)...
When you define the AUTHOR attribute section as such and index the document set, you can query the XML attribute text as follows:
... WHERE 'England WITHIN authors'
This example does the following:
drop table res_xml; CREATE TABLE res_xml ( pk NUMBER PRIMARY KEY , text CLOB ) ; insert into res_xml values(111, 'ENTITY chap8 "Chapter 8, <q>Keeping it Tidy: the XML Rule Book </q>"> this is the document section'); commit; --- --- script to create index on res_xml --- --- cleanup, in case we have run this before DROP INDEX res_index ; EXEC CTX_DDL.DROP_SECTION_GROUP ( 'res_sections' ) ; --- create a section group BEGIN CTX_DDL.CREATE_SECTION_GROUP ( 'res_sections', 'XML_SECTION_GROUP' ) ; CTX_DDL.ADD_FIELD_SECTION ( 'res_sections', 'chap8', '<q>') ; END ; / begin ctx_ddl.create_preference ( preference_name => 'my_basic_lexer', object_name => 'basic_lexer' ); ctx_ddl.set_attribute ( preference_name => 'my_basic_lexer', attribute_name => 'index_text', attribute_value => 'true' ); ctx_ddl.set_attribute ( preference_name => 'my_basic_lexer', attribute_name => 'index_themes', attribute_value => 'false'); end; / CREATE INDEX res_index ON res_xml(text) INDEXTYPE IS ctxsys.context PARAMETERS ( 'lexer my_basic_lexer SECTION GROUP res_sections' ) ;
Test the preceding index with a test query, such as:
SELECT pk FROM res_xml WHERE CONTAINS( text, 'keeping WITHIN chap8' )>0 ;
drop table explain_ex; create table explain_ex ( id number primary key, text varchar(2000) ); insert into explain_ex ( id, text ) values ( 1, 'thinks thinking thought go going goes gone went' || chr(10) || 'oracle orackle oricle dog cat bird' || chr(10) || 'President Clinton' ); insert into explain_ex ( id, text ) values ( 2, 'Last summer I went to New England' || chr(10) || 'I hiked a lot.' || chr(10) || 'I camped a bit.' ); commit;
Set Define Off select text from explain_ex WHERE CONTAINS ( text, '( $( think & go ) , ?oracle ) & ( dog , ( cat & bird ) ) & about(mammal during Bill Clinton)' ) > 0; select text from explain_ex WHERE CONTAINS ( text, 'about ( camping and hiking in new england )' ) > 0;
ctx_ddl_create_section_group('auto', 'AUTO_SECTION_GROUP');CREATE INDEX myindex ON docs(xmlfile_column) INDEXTYPE IS ctxsys.context PARAMETERS ('filter ctxsys.null_filter SECTION GROUP auto'); SELECT xmlfile_column FROM docs WHERE CONTAINS (xmlfile_column, 'virginia WITHIN title')>0;
EXEC ctx_ddl.create_section_group('xmlpathgroup', 'PATH_SECTION_GROUP');CREATE INDEX myindex ON xmldocs(xmlfile_column) INDEXTYPE IS ctxsys.context PARAMETERS ('section group xmlpathgroup');
SELECT xmlfile_column FROM xmldocs ... WHERE CONTAINS (column, 'Tale WITHIN title@book')>0;
Consider an XML file that defines the BOOK tag with a TITLE attribute as follows:
<BOOK TITLE="Tale of Two Cities"> It was the best of times. </BOOK> <Author="Charles Dickens"> Born in England in the town, Stratford_Upon_Avon </Author>
Recall the CTX_DDL.ADD_ATTR_SECTION syntax is:
CTX_DDL.Add_Attr_Section ( group_name, section_name, tag );
To define the title attribute as an attribute section, create an XML_SECTION_GROUP and define the attribute section as follows:
ctx_ddl_create_section_group('myxmlgroup', 'XML_SECTION_GROUP'); ctx_ddl.add_attr_section('myxmlgroup', 'booktitle', 'book@title'); ctx_ddl.add_attr_section('myxmlgroup', 'authors', 'author'); end;
An Oracle Text query application enables viewing documents returned by a query. You typically select a document from the hit list and then your application presents the document in some form.
With Oracle Text, you can render a document in different ways. For example, with the query terms highlighted. Highlighted query terms can be either the words of a word query or the themes of an ABOUT query in English. This rendering uses the CTX_DOC.HIGHLIGHT or CTX_DOC.MARKUP procedures.
You can also obtain theme information from documents with the CTX_DOC.THEMES PL/SQL package. Besides these there are several other CTX_DOC procedures for presenting your query results.
INPATH does not support working with highlighting or themes.
| See Also:
Oracle Text Reference for more information on the |
The Oracle9i datatype for storing XML, XMLType, is a core database feature.
You can create an Oracle Text index on this type, but you need a few database privileges first:
GRANT QUERY REWRITE TO <user>
Without this privilege, the create index will fail with:
ORA-01031: insufficient privileges
<user> should be the user creating the index. The database schema that owns the index, if different, does not need the grant.
init.ora file:
query_rewrite_enabled=true query_rewrite_integrity=trusted
or turn it on for the session as follows:
ALTER SESSION SET query_rewrite_enabled=true; ALTER SESSION SET query_rewrite_integrity=trusted;
Without these, queries will fail with:
DRG-10599: column is not indexed
These privileges are needed because XMLType is really an object, and you access it through a function, hence an Oracle Text index on an XMLType column is actually a function-based index on the getclobval() method of the type. These are the standard grants you need to use function-based indexes, however, unlike function-based B-Tree indexes, you do not need to calculate statistics.
|
Note: Oracle9i SQL Reference under CREATE INDEX, states: To create a function-based index in your own schema on your own table, in addition to the prerequisites for creating a conventional index, you must have the QUERY REWRITE system privilege. To create the index in another schema or on another schema's table, you must have the GLOBAL QUERY REWRITE privilege. In both cases, the table owner must also have the EXECUTE object privilege on the function(s) used in the function-based index. In addition, in order for Oracle to use function-based indexes in queries, the QUERY_REWRITE_ENABLED parameter must be set to TRUE, and the QUERY_REWRITE_INTEGRITY parameter must be set to TRUSTED. |
When an XMLType column is detected, and no section group is specified in the parameters string, the default system examines the new system parameter DEFAULT_XML_SECTION, and uses the section group specified there. At install time this system parameter is set to CTXSYS.PATH_SECTION_GROUP, which is the default path sectioner.
The default filter system parameter for XMLType is DEFAULT_FILTER_TEXT, which means that the INSO filter is not engaged by default.
Other than the database privileges and the special default section group system parameter, indexes on XMLType columns work like any other Oracle Text index.
Here is a simple example:
connect ctxsys/ctxsys GRANT QUERY REWRITE TO xtest; connect xtest/xtest CREATE TABLE xtest(doc sys.xmltype); INSERT INTO xtest VALUES (sys.xmltype.createxml('<A>simple</A>')); CREATE INDEX xtestx ON xtest(doc) INDEXTYPE IS ctxsys.context; ALTER SESSION SET query_rewrite_enabled = true; ALTER SESSION SET query_rewrite_integrity = trusted; SELECT a.doc.getclobval() FROM xtest a WHERE CONTAINS (doc, 'simple INPATH(A)')>0;
UriType columns can be indexed natively in Oracle9i database using Oracle Text. No special datastore is needed.
For example:
CREATE TABLE table uri_tab ( url sys.httpuritype); INSERT INTO uri_tab VALUES (sys.httpuritype.createUri('http://www.oracle.com')); CREATE INDEX urlx ON uri_tab(url) INDEXTYPE IS ctxsys.context; SELECT url FROM uri_tab WHERE CONTAINS(url, 'Oracle')>0;
Table 7-5 lists system parameters used for default preference names for Oracle Text indexing, when the column type is UriType:
Oracle9i Release 1(9.0.1) introduced the Oracle Text PATH_SECTION_GROUP, INPATH(), and HASPATH() query operators. These allow you to do XPath-like text query searches on XML documents using the CONTAINS operator. CONTAINS, however, supports only a subset of XPath functionality. Also, there are important semantic differences between the CONTAINS operator and the existsNode() function.
The existsNode, extract() and extractValue() SQL functions (and the corresponding member functions of XMLType) provide full XPath support. This release of Oracle9i also introduces new extension functions to XPath to support full text searches.
|
Note: This release does not support theme querying for Oracle Text |
Table 7-6 lists and compares CONTAINS() and existsNode() features for searching XMLType data.
| Feature | CONTAINS() | existsNode() |
|---|---|---|
|
XPath Conformance |
-- |
-- |
|
Predicate Support |
-- |
-- |
|
Y |
Y |
|
|
N |
Y |
|
|
N |
Y |
|
|
N |
Y |
|
|
N |
Y |
|
|
N |
Y |
|