Skip Headers

Oracle9i XML Database Developer's Guide - Oracle XML DB
Release 2 (9.2)

Part Number A96620-02
Go to Documentation Home
Home
Go to Book List
Book List
Go to Table of Contents
Contents
Go to Index
Index
Go to Master Index
Master Index
Go to Feedback page
Feedback

Go to previous page Go to next page
View PDF

7
Searching XML Data with Oracle Text

This chapter explains the use of Oracle Text functionality in indexing and querying XML data. It contains the following sections:

Searching XML Data with Oracle Text

This chapter describes the following aspects of Oracle Text:

Introducing Oracle Text


Note:

Oracle Text is a strictly server-based implementation.


See Also:

http://otn.oracle.com/products/text

Oracle Text (aka interMedia Text) can be used to search XML documents. It extends Oracle9i by indexing any text or document stored in Oracle. It can also search documents in the file system and URLs.

Oracle Text enables the following:

You can query XML data stored in the database directly, without using Oracle Text. However, Oracle Text is useful for boosting query performance.

See Also :

Accessing Oracle Text

Oracle Text is a standard feature that comes with every Oracle9i Standard, Enterprise, and Personal edition license. It needs to be selected during installation. No special installation instructions are required.

Oracle Text is essentially a set of schema objects owned by CTXSYS. These objects are linked to the Oracle kernel. The schema objects are present when you perform an Oracle9i installation.

Oracle Text Now Supports XMLType

You can now perform Oracle Text searches on tables containing XMLType columns.

Further Oracle Text Examples

You can find more examples for Oracle Text and for creating section group indexes at the following site: http://otn.oracle.com/products/text

Assumptions Made in This Chapter's Examples

XML text is aVARCHAR2 or CLOB type in an Oracle9i database table with character semantics. Oracle Text can also deal with documents in a file system or in URLs, but we are not considering these document types in this chapter.

To simplify the examples included in this chapter they use a subset of the Oracle Text options and make the following assumptions:

Oracle Text Users and Roles

With Oracle Text you can use the following users/roles:

User CTXSYS

UserCTXSYS is created at install time. Administer Oracle Text users as this user. User CTXSYS has the following privileges:

Role CTXAPP

Any user can create an Oracle Text index and issue a Text query. For additional tasks, use the CTXAPP role. This is a system-defined role that enables you to perform the following tasks:

Querying with the CONTAINS Operator

Oracle Text's main purpose is to provide an implementation for the CONTAINS operator. The CONTAINS operator can be used in the WHERE clause of a SELECT statement to specify the query expression for a Text query.

CONTAINS Syntax

Here is the CONTAINS syntax:

...WHERE CONTAINS([schema.]column,text_query VARCHAR2,[label NUMBER])

where:

Table 7-1 CONTAINS Operator: Syntax Description
Syntax Description

[schema.] column

Specifies the text column to be searched on. This column must have a Text index associated with it.

text_query

Specifies the query expression that defines your search in column.

label

Optionally specifies the label that identifies the score generated by the CONTAINS operator.

For each row selected, CONTAINS returns a number between 0 and 100 that indicates how relevant the document row is to the query. The number 0 means that Oracle found no matches in the row. You can obtain this score with the SCORE operator.


Note:

You must use the SCORE operator with a label to obtain this number.


Example 7-1 Using a Simple SELECT Statement with CONTAINS

The following example illustrates how the CONTAINS operator is used in a SELECT statement:

SELECT id FROM my_table
   WHERE
   CONTAINS (my_column, 'receipts') > 0

The'receipts' parameter of the CONTAINS operator is called the "Text Query Expression".


Note:

The SQL statement with the CONTAINS operator requires an Oracle Text index in order to run.


Example 7-2 Using the Score Operator with a Label to Obtain the Relevance

The following example searches for all documents in the text column that contain the word Oracle. The score for each row is selected with the SCORE operator using a label of 1:

SELECT SCORE(1), title from newsindex 
           WHERE CONTAINS(text, 'oracle', 1) > 0 ORDER BY SCORE(1) DESC;

The CONTAINS operator must always be followed by the > 0 syntax. This specifies that the score value calculated by the CONTAINS operator must be greater than zero for the row selected.

When the SCORE operator is called, such as in a SELECT clause, the operator must reference the label value as shown in the example.

Using the WITHIN Operator to Narrow Query Down to Document Sections

When documents have internal structure such as in HTML and XML, you can define document sections using embedded tags before you index. This enables you to query within the sections using the WITHIN operator.


Note:

This is only true for XML_SECTION_GROUP, but not true for AUTO_ or PATH_SECTION_GROUP.


Introducing SECTION_GROUPS

You can query within attribute sections when you index with either XML_SECTION_GROUP, AUTO_SECTION_GROUP, or PATH_SECTION_GROUP your section group type. Consider the following XML document:

<book title="Tale of Two Cities">It was the best of times.</book>

XML_SECTION_GROUP

If you use XML_SECTION_GROUP, you can specify any of the following sections:

This chapter only focuses on Zone, Field, and Attribute sections. For more information on Special sections see Oracle Text Reference and Oracle Text Application Developer's Guide.

Zone Sections: CTX_DLL.ADD_ZONE_SECTION Procedure

The syntax for this is:

CTX_DDL.ADD_ZONE_SECTION(
  group_name     in    varchar2,
  section_name   in    varchar2,
  tag            in    varchar2);

To define a chapter as a Zone section, create an XML_SECTION_GROUP and define the Zone section as follows:

EXEC ctx_ddl_create_section_group('myxmlgroup', 'XML_SECTION_GROUP');
EXEC ctx_ddl.add_zone_section('myxmlgroup', 'chapter', 'chapter');

When you define Zone section as such and index the document set, you can query the XML chapter Zone section as follows:

'Cities within chapter' 

Field Sections: CTX_DLL.ADD_FIELD_SECTION Procedure

The syntax for this is:

CTX_DDL.ADD_FIELD_SECTION(
  group_name     in    varchar2,
  section_name   in    varchar2,
  tag            in    varchar2);

To define a abstract as a Field section, create an XML_SECTION_GROUP and define the Field section as follows:

EXEC ctx_ddl_create_section_group('myxmlgroup', 'XML_SECTION_GROUP');
EXEC ctx_ddl.add_field_section('myxmlgroup', 'abstract', 'abstract');

When you define Field section as such and index the document set, you can query the XML abstract Field section as follows:

'Cities within abstract' 

Attribute Section: CTX_DLL.ADD_ATTR_SECTION Procedure

The syntax for this is:

CTX_DDL.ADD_ATTR_SECTION(
  group_name     in    varchar2,
  section_name   in    varchar2,
  tag            in    varchar2);


To define the booktitle attribute as an Attribute section, create an XML_SECTION_GROUP and define the Attribute section as follows:

EXEC ctx_ddl_create_section_group('myxmlgroup', 'XML_SECTION_GROUP');
EXEC ctx_ddl.add_attr_section('myxmlgroup', 'booktitle', 'book@title');


When you define the Attribute section as such and index the document set, you can query the XML booktitle attribute text as follows:

'Cities within booktitle'

Constraints for Querying Attribute or Field Sections

The following constraints apply to querying within Attribute or Field sections:

AUTO_ SECTION_GROUP/ PATH_SECTION_GROUP for INPATH and HASPATH

When you use the AUTO_SECTION_GROUP or PATH_SECTION_GROUP to index XML documents, Oracle9i automatically creates sections.

To search on Tale within the Attribute section booktitle, include the following WITHIN clause in your SELECT statement:

Dynamically Adding Sections or Stop Section Using ALTER INDEX

The syntax for ALTER INDEX is:

ALTER INDEX [schema.]index REBUILD [ONLINE] [PARAMETERS (paramstring)];

where

paramstring = 'replace [datastore datastore_pref] 
                       [filter filter_pref] 
                       [lexer lexer_pref] 
                       [wordlist wordlist_pref] 
                       [storage storage_pref] 
                       [stoplist stoplist] 
                       [section group section_group]
                          [memory memsize]
|    ... 
|    add zone section section_name tag tag
|    add field section section_name tag tag [(VISIBLE | INVISIBLE)]
|    add attr section section_name tag tag@attr
|    add stop section tag'

The added section applies only to documents indexed after this operation. Thus for the change to take effect, you must manually re-index any existing documents that contain the tag. The index is not rebuilt by this statement.

WITHIN Syntax for Section Querying

Here is the WITHIN syntax for querying sections:

...WHERE CONTAINS(text,'XML WITHIN title') >0;... 

This searches for expression text within a section. If you are using XML_SECTION_GROUP the following restrictions apply to the pre-defined zone, field, or attribute section:

You can combine and nest WITHIN clauses. For finer grained searches of XML sections, you can use WITHIN clauses inside CONTAINS select statements.

WITHIN Operator Limitations

The WITHIN operator has the following limitations:

INPATH or HASPATH Operators Search Using XPath-Like Expressions

Path Indexing and Path Querying with Oracle Text

In Oracle9i Oracle Text introduced a new section type and new query operators which support an XPath-like query language. Indexes of type context with XML path searching are able to perform very complex section searches on XML documents. Here are the basic concepts of path indexing and path querying.

Path Indexing

Section searching is enabled by defining section groups. To use XML path searching, the Oracle Text index must be created with the new section group, PATH_SECTION_GROUP as follows:

begin
   ctx_ddl.create_section_group('mypathgroup','PATH_SECTION_GROUP'); 
end; 

To create the Oracle Text index use this command:

create index order_idx on library_catalog(text)
 indextype is ctxsys.context 
 parameters ('SECTION GROUP mypathgroup'); 

Path Querying

The Oracle Text path query language is based on W3C XPath. For Oracle9i Release 1 (9.0.1) and higher, you can use the INPATH and HASPATH operators to express path queries.

Using INPATH Operator for Path Searching in XML Documents

You can use INPATH operator to perform path searching in XML documents. Table 7-2 summarizes the ways you can use the INPATH operator for path searching.

Table 7-2 Path Searching XML Documents Using the INPATH Operator  
Path Search Feature Syntax Description

Simple Tag Searching

virginia INPATH (//STATE)

Finds all documents where the word "virginia" appears between <STATE> and </STATE>. The STATE element can appear at any level of the document structure.

Case-sensitivity

virginia INPATH (STATE)

virginia INPATH (State)

Tags and attribute names in path searching are case-sensitive. virginia INPATH STATE -- finds <STATE>virginia</STATE> but NOT <State>virginia</State>. To find the latter you must do virginia INPATH State.

Top-Level Tag Searching

virginia INPATH (Legal)

virginia INPATH (/Legal)

For example, the following query finds Quijote where it occurs between <order> and </order>:

select id from library_catalog where contains(text,'Quijote INPATH(order)') > 0;

Here <order> must be the top level tag.

Finds all documents where "virginia" appears in a Legal element which is the top-level tag.'Legal' MUST be the top-level tag of the document.'virginia' may appear anywhere in this tag regardless of other intervening tags. For example:

<?xml version="1.0" standalone="yes"?>

<!-- <?xml-stylesheet type="text/xsl" href="./xsl/vacourtfiling(html).xsl"?> -->

<Legal>

<CourtFiling>

<Filing ID="f001" FilingType="Civil">

<LeadDocument>

<CaseCaption>

<CourtInformation>

<Location>

<Address>

<AddressState>VIRGINIA</AddressState>

</Address> ... </Legal>

Any Level Tag Searching

virginia INPATH (//Address)

For example, a double slash indicates "any number of levels" down. The following query finds Quijote inside a <title> tag that occurs at the top level or any lower level:

select id from library_catalog

where contains(text,'Quijote INPATH(//title)') > 0;

'Virginia' can appear anywhere within an 'Address' tag, which may appear within any other tags. for example:

<?xml version="1.0" standalone="yes"?>

<!-- <?xml-stylesheet type="text/xsl" href="./xsl/vacourtfiling(html).xsl"?> -->

<Legal>

<CourtFiling>

<Filing ID="f001" FilingType="Civil">

<LeadDocument>

<CaseCaption>

<CourtInformation>

<Location>

<Address>

<AddressState> VIRGINIA </AddressState>... </Legal>

Direct Parentage Path Searching

virginia INPATH (//CourtInformation/Location)

for example:

select id from library_catalog where contains(text,'virginia INPATH(order/item)') > 0;

Finds all documents where "virginia" appears in a Location element which is a direct child of a CourtInformation element. For example:

<?xml version="1.0" standalone="yes"?>

<!-- <?xml-stylesheet type="text/xsl" href="./xsl/vacourtfiling(html).xsl"?> -->

<Legal>

<CourtFiling>

<Filing ID="f001" FilingType="Civil">

<LeadDocument>

<CaseCaption>

<CourtInformation>

<Location>

<Address>

<AddressState> VIRGINIA </AddressState>

</Address>... </CourtInformation>

Single-Level Wildcard Searching

virginia INPATH(A/*/B)

'virginia INPATH (//CaseCaption/*/Location)'

Finds all documents where "virginia" appears in a B element which is a grandchild of an A element. For instance, <A><D><B>virginia</B></D></A>. The intermediate element does not need to be an indexed XML tag. For example:

<?xml version="1.0" standalone="yes"?>

<!-- <?xml-stylesheet type="text/xsl" href="./xsl/vacourtfiling(html).xsl"?> -->

<Legal>

<CourtFiling>

<Filing ID="f001" FilingType="Civil">

<LeadDocument>

<CaseCaption>

<CourtInformation>

<Location>

<Address>

<AddressState>VIRGINIA</AddressState>...

</Legal>

Multi-level Wildcard Searching

'virginia INPATH (Legal/*/Filing/*/*/CourtInformation)'

'Legal' must be a top-level tag, and there must be exactly one tag-level between 'Legal' and 'Filing', and two between 'Filing' and 'CourtInformation'. 'Virginia' may then appear anywhere within 'CourtInformation'. For example:

<?xml version="1.0" standalone="yes"?>

<!-- <?xml-stylesheet type="text/xsl" href="./xsl/vacourtfiling(html).xsl"?> -->

<Legal>

<CourtFiling>

<Filing ID="f001" FilingType="Civil">

<LeadDocument>

<CaseCaption>

<CourtInformation>

<Location>

<Address>

<AddressState>VIRGINIA</AddressState>

</Address>

</Location>

<CourtName>

IN THE CIRCUIT COURT OF LOUDOUN COUNTY

</CourtName>

</CourtInformation>....

Descendant Searching

virginia INPATH(A//B)

Finds all documents where "virginia" appears in a B element which is some descendant (any level) of an A element.

Attribute Searching

virginia INPATH(A/@B)

Finds all documents where "virginia" appears in the B attribute of an A element. You can search within an attribute value using the syntax <tag>/@<attribute>:

select id from library_catalog where contains(text,'dvd INPATH(//item/@type)') > 0; AND and OR

You can use boolean AND and OR to combine existence or equality predicates in a test.

select id from library_catalog where contains(text,'Levy or Cervantes INPATH(//title)') >0;

Descendant/Attribute Existence Testing

virginia INPATH (A[B])

You can search for documents using the any-level tag searching:

select id from library_catalog where contains (text,'Quijote INPATH(/order/title)') > 0;

You can also use the "*" as a single level wildcard. The * matches exactly one level.:

select id from library_catalog where contains (text,'Cervantes INPATH(/order/*/author)') > 0;

Finds all documents where "virginia" appears in an A element which has a B element as a direct child.

  • virginia INPATH A[.//B] -- Finds all documents where "virginia" appears in an A element which has a B element as a descendant (any level).
  • virginia INPATH A[@B] -- Finds all documents where "virginia" appears in an A element which has a B attribute

Attribute Value Testing

Within Equality

virginia INPATH A[@B = "foo"]

That means that:

virginia INPATH (A[@B = "pot of gold"]), would, with the default lexer and stoplist, match any of the following:

<A B="POT OF GOLD">virginia</A>

By default, lexing is case-independent, so "pot" matches "POT", <A B="POT BLACK GOLD">virginia</A>

By default, "of" is a stopword, and, in a query, would match any word in that position, <A B=" Pot OF Gold ">virginia</A>

Finds all documents where "virginia" appears in an A element which has a B attribute whose value is "foo".

  • Only equality is supported as a test. Range operators and functions are not supported.
  • The left-hand-side of the equality MUST be an attribute or tag. Literals here are not allowed.
  • The right-hand-side must be a literal. Tags and attributes here are not allowed.

Within equality (See "Using INPATH Operator for Path Searching in XML Documents" ) is used to evaluate the test.

Whitespace is mainly ignored in text indexing. Again, lexing is case-independent:

<A B="pot_of_gold">virginia</A>

Underscore is a non-alphabetic character, and is not a join character by default. As a result, it is treated more or less as whitespace and breaks up that string into three words.

Example:

select id from library_catalog where contains(text,'(Bob the Builder) INPATH(//item[@type="dvd"])') > 0;

The following will not return rows:

select id from library_catalog where contains(text,'(Bob the Builder) INPATH(//item[@type="book"])') > 0;

Numeric Equality

virginia INPATH (A[@B = 5])

Numeric literals are allowed. But they are treated as text. The within equality is used to evaluate. This means that the query does NOT match. That is, <A B="5.0">virginia</A> does not match A[@B=5] where "5.0", a decimal is not considered the same as 5, an integer.

Conjunctive Testing

virginia INPATH (A[B AND C])

virginia INPATH (A[B AND @C = "foo"])...

Predicates can be conjunctively combined.

Combining Path and Node Tests

virginia INPATH (A[@B = "foo"]/C/D)

virginia INPATH(A//B[@C]/D[E])...

Node tests can be applied to any node in the path.

Using HASPATH Operator for Path Searching in XML Documents

Use the HASPATH operator to find all XML documents that contain a specified section path. HASPATH is used when you want to test for path existence. It is also very useful for section equality testing. To find all XML documents where an order has an item within it:

select id from library_catalog 
    where contains(text,'HASPATH(order/item)') > 0; 

will return all documents where the top-level tag is a order element which has a item element as a direct child.

In Oracle9i, Oracle Text introduces a new section type and new query operators which support an XPath-like query language. Indexes of type context with XML path searching are able to perform very complex section searches on XML documents. Here are more examples of path querying using INPATH and HASPATH. Assuming the following XML document:

<?xml version="1.0"?> 
  <order> 
    <item type="book">
       <title>Crypto</title>
       <author>Levi</author>
    </item> 
    <item type="dvd">
       <title> Bob the Builder</title>
       <author>Auerbach</author>
    </item> 
    <item type="book">
       <title>Don Quijote</title>
       <author>Cervantes</author>
    </item> 
  </order> 
 

In general, use INPATH and HASPATH operators only when your index has been created with PATH_SECTION_GROUP. Use of PATH_SECTION_GROUP enables path searching. Path searching extends the syntax of the WITHIN operator so that the section name operand (right-hand-side) is a path instead of a section name.

Using HASPATH Operator for Path Existence Searching


Note:

The HASPATH operator functions in a similar fashion to the existsNode() in XMLType.


Only use the HASPATH operator when your index has been created with the PATH_SECTION_GROUP. The syntax for the HASPATH operator is:

Tag Value Equality Testing

You can do tag value equality test with HASPATH:

select id from library_catalog 

where CONTAINS(text,'HASPATH (//author="Auerbach")') >0;

Building a Query Application with Oracle Text

To build a Oracle Text query application carry out the following steps:

  1. Create a section preference group. Before you create a section group and Oracle text index you must first determine the role you will need and grant the appropriate privilege. See "Oracle Text Users and Roles" , and grant the appropriate privilege.

    After creating and preparing your data, you are ready to perform the next step. See "Step 1. Create a Section Group Preference".

  2. Add sections or stop_sections
  3. Create an Oracle Text index based on the section group you created. Using the section preference created, you then create an Oracle Text index. See Building a Query Application with Oracle Text.
  4. Build your query application using the CONTAINS operator. Now you can finish building your query application. See "Building a Query Application with Oracle Text".

What Role Do You Need?

First determine the role you need. See Oracle Text Reference and "Oracle Text Users and Roles" , and grant the appropriate privilege as follows:

CONNECT system/manager
GRANT ctxapp to scott;
CONNECT scott/tiger

Step 1. Create a Section Group Preference

The first thing you must do is create a preference. This section describes how to create section preferences using PATH_SECTION_GROUP, XML_SECTION_GROUP, and AUTO_SECTION_GROUP. Table 7-3 describes the groups and summarizes their features.

Table 7-3 Comparing Oracle Text Section Groups
Section Group Description

XML_SECTION_GROUP

Use this group type for indexing XML documents and for defining sections in XML documents.

AUTO_SECTION_GROUP

Use this group type to automatically create a zone section for each start-tag/end-tag pair in an XML document. The section names derived from XML tags are case-sensitive as in XML. Attribute sections are created automatically for XML tags that have attributes. Attribute sections are named in the form attribute@tag. Stop sections, empty tags, processing instructions, and comments are not indexed. The following limitations apply to automatic section groups:

  • You cannot add zone, field or special sections to an automatic section group.
  • Automatic sectioning does not index XML document types (root elements.) However, you can define stop-sections with document type.
  • The length of the indexed tags including prefix and namespace cannot exceed 64 characters. Tags longer than this are not indexed.

PATH_SECTION_GROUP

Use this group type to index XML documents. Behaves like the AUTO_SECTION_GROUP. With this section group you can do path searching with the INPATH and HASPATH operators. Queries are case-sensitive for tag and attribute names.

How is PATH_SECTION_GROUP Similar to AUTO_SECTION_GROUP?

Documents are assumed to be XML, Every tag and every attribute is indexed by default, Stop sections can be added to prevent certain tags from being indexed, Only stop sections can be added -- ZONE, FIELD, and SPECIAL sections cannot be added, When indexing XML document collections, you do not need to explicitly define sections as Oracle automatically does this for you.

How Does PATH_SECTION_GROUP Differ From AUTO_SECTION_GROUP?

Path Searching is allowed at query time (see "Case Study: Searching XML-Based Conference Proceedings" and "You can use INPATH operator to perform path searching in XML documents. Table 7-2 summarizes the ways you can use the INPATH operator for path searching." ) with the new INPATH and HASPATH operators, Tag and attribute names are case-sensitive in queries.


Note:

If you are using the AUTO_SECTION_GROUP or PATH_SECTION_GROUP to index an XML document collection, you need not explicitly define sections since the system does this for you during indexing.


Deciding Which Section Group to Use

How do you determine which section groups is best for your application? This depends on your application. Table 7-4 lists some general guidelines to help you decide which of the XML_, AUTO_, or PATH_ section groups to use when indexing your XML documents, and why.

Table 7-4 Guidelines for Choosing XML_, AUTO_, or PATH_ Section Groups
Application Criteria XML_section_... AUTO_section_... PATH_section_...

You are using XPATH search features

--

--

Yes

You know the layout and structure of your XML documents, and you can predefine the sections on which users are most likely to search.

Yes

--

--

You do not know which tags users are most likely to search.

--

Yes

--

Query performance, in general

Fastest

Little slower than XML_section_...

Little slower than AUTO_section_...

Indexing performance, in general

Fastest

Little slower than XML_section_...

Little slower than AUTO_section_...

Index size

Smallest

Little larger than XML_section_...

Little larger than AUTO_section_...

Other features

Mappings can be defined so that tags in one or different DTDs can be mapped to one section. Good for DTD evolution and data aggregation.

Simplest. No need to define mapping, add_stop_section can be used to ignore some sections.

Designed for more sophisticated XPATH- like queries

Creating a Section Preference with XML_SECTION_GROUP

The following command creates a section group called, xmlgroup, with the XML_SECTION_GROUP group type:

EXEC ctx_ddl.create_section_group('myxmlgroup', 'XML_SECTION_GROUP');

Creating a Section Preference with AUTO_SECTION_GROUP

You can set up your indexing operation to automatically create sections from XML documents using the section group AUTO_SECTION_GROUP. Here, Oracle creates zone sections for XML tags. Attribute sections are created for those tags that have attributes, and these attribute sections are named in the form "tag@attribute."

The following command creates a section group called autogroup with the AUTO_SECTION_GROUP group type. This section group automatically creates sections from tags in XML documents.

EXEC ctx_ddl.create_section_group('autogroup', 'AUTO_SECTION_GROUP');


Note:

You can add attribute sections only to XML section groups. When you use AUTO_SECTION_GROUP, attribute sections are created automatically. Attribute sections created automatically are named in the form tag@attribute.


Creating a Section Preference with PATH_SECTION_GROUP

To enable path section searching, index your XML document with PATH_SECTION_GROUP. For example:

EXEC ctx_ddl.create_section_group('xmlpathgroup', 'PATH_SECTION_GROUP');

Step 2. Set the Preference's Attributes

To set the preference's attributes for XML_SECTION_GROUP, use the following procedures:

To set the preference's attributes for AUTO_SECTION_GROUP and PATH_SECTION_GROUP, use the following procedures:

There are corresponding CTX_DDL.DROP sections and CTX_DDL.REMOVE section commands.

2.1 XML_SECTION_GROUP: Using CTX_DDL.add_zone_section

The syntax for CTX_DDL.add_zone_section follows:

CTX_DDL.Add_Zone_Section (
 group_name      => 'my_section_group' /* whatever you called it in the 
preceding section */
 section_name    => 'author' /* what you want to call this section */
 tag             => 'my_tag' /* what represents it in XML */ );

where 'my_tag' implies opening with <my_tag> and closing with </my_tag>.

add_zone_section Guidelines

add_zone_section guidelines are listed here:

2.2 XML_SECTION_GROUP: Using CTX_DDL.Add_Attr_Section

The syntax for CTX_DDL.ADD_ATTR_SECTION follows:

CTX_DDL.Add_Attr_Section ( /* call this as many times as you need to describe
                             the attribute sections */
 group_name      => 'my_section_group' /* whatever you called it in the 
preceding section */
 section_name    => 'author' /* what you want to call this section */
 tag             => 'my_tag' /* what represents it in XML */ );

where 'my_tag' implies opening with <my_tag> and closing with </my_tag>.

Add_Attr_Section Guidelines

Add_Attr_Section guidelines are listed here:

ADD_ATTR_SECTION adds an attribute section to an XML section group. This procedure is useful for defining attributes in XML documents as sections. This enables searching XML attribute text with the WITHIN operator.

The section_name:

The tag specifies the name of the attribute in tag@attr format. This is case-sensitive.


Note:

In the ADD_ATTR_SECTION procedure, you can have many tags all represented by the same section name at query time. Explained in another way, the names used as the arguments of the keyword WITHIN can be different from the actual XML tag names. That is many tags can be mapped to the same name at query time. This feature enhances query usability.


2.3 XML_SECTION_GROUP: Using CTX_DDL.Add_Field_Section

The syntax for CTX_DDL.Add_Field_Section follows:

CTX_DDL.Add_Field_Section (
 group_name     => 'my_section_group' /* whatever you called it in the preceding 
section */
 section_name   => 'qq' /* what you want to call this section */
 tag            => 'my_tag' /* what represents it in XML */ );
 visible        => TRUE or FALSE );

Add_Field_Section Guidelines

Add_Field_Section guidelines are listed here:

How Attr_Section Differs from Field_Section

Attribute section differs from Field section in the following ways:

2.5 AUTO_SECTION_GROUP: Using CtX_DDL.Add_Stop_Section

CtX_DDL.Add_Stop_Section (
group_name     => 'my_section_group' /* whatever you called it in the preceding 
section */
section_name   => 'qq' /* what you want to call this section */ );

Step 3. Create an Index Using the Section Preference Created in Step 2

Create an index depending on which section group you used to create a preference:

Creating an Index Using XML_SECTION_GROUP

To index your XML document when you have used XML_SECTION_GROUP, you can use the following statement:

CREATE INDEX myindex ON docs(htmlfile) INDEXTYPE IS ctxsys.context 
     parameters('section group xmlgroup');
See Also:

"Creating an Index Using XML_SECTION_GROUP" .

Creating an Index Using AUTO_SECTION_GROUP

The following statement creates the index, myindex, on a column containing XML files using the AUTO_SECTION_GROUP:

CREATE INDEX myindex ON xmldocs(xmlfile) INDEXTYPE IS ctxsys.context PARAMETERS 
('section group autogroup');

Creating an Index Using PATH_SECTION_GROUP

To index your XML document when you have used PATH_SECTION_GROUP, you can use the following statement:

CREATE INDEX myindex ON xmldocs(xmlfile) INDEXTYPE IS ctxsys.context PARAMETERS 
('section group xmlpathgroup');

See Also:

Oracle Text Reference for detailed notes on CTX_DDL.

Example 7-3 Creating an Index Using XML_SECTION_GROUP

EXEC ctx_ddl_create_section_group('myxmlgroup', 'XML_SECTION_GROUP');

/* ADDING A FIELD SECTION */
EXEC ctx_ddl.Add_Field_Section /* THIS IS KEY */
  (  group_name   =>'my_section_group',
     section_name =>'author',/* do this for EVERY tag used after "WITHIN" */
     tag          =>'author'
  );

EXEC ctx_ddl.Add_Field_Section /* THIS IS KEY */
      (  group_name   =>'my_section_group',
         section_name =>'document',/*do this for EVERY tag after "WITHIN" */
         tag          =>'document'
      );

  ...
/
/* ADDING AN ATTRIBUTE SECTION */
EXEC ctx_ddl.add_attr_section('myxmlgroup', 'booktitle', 'book@title');

/* The more sections you add to your index, the longer your search will take.*/
/* Useful for defining attributes in XML documents as sections. This allows*/
/* you to search XML attribute text using the WITHIN operator.*/ 
/*  The section name:
/* ** Is used for WITHIN queries on the attribute text. 
   ** Cannot contain the colon (:) or dot (.) characters. 
   ** Must be unique within group_name. 
   ** Is case-insensitive. 
   ** Can be no more than 64 bytes. 
   **  The tag specifies the name of the attribute in tag@attr format. This is
       case-sensitive. */
/* Names used as arguments of the keyword WITHIN can be different from the
   actual XML tag names. Many tags can be mapped to the same name at query
   time.*/ 
/* Call CTX_DDL.Add_Zone_Section for each tag in your XML document that you need 
to search on. */

EXEC ctx_ddl.add_zone_section('myxmlgroup', 'mybooksec', 'mydocname(book)');

CREATE INDEX my_index ON my_table ( my_column )
  INDEXTYPE IS ctxsys.context
  PARAMETERS ( 'SECTION GROUP my_section_group' );

SELECT my_column FROM my_table
  WHERE CONTAINS(my_column, 'smith WITHIN author') > 0;

Step 4. Create Your Query Syntax

See the section, "Querying with the CONTAINS Operator" for information about how to use the CONTAINS operator in query statements.

Querying Within Attribute Sections

You can query within attribute sections when you index with either XML_SECTION_GROUP or AUTO_SECTION_GROUP as your section group type.

Assume you have an XML document as follows:

<book title="Tale of Two Cities">It was the best of times.</book>

You can define the section title@book as the attribute section title. You can do so with the CTX_DLL.Add_Attr_Section procedure or dynamically after indexing with ALTER INDEX.


Note:

When you use the AUTO_SECTION_GROUP to index XML documents, the system automatically creates attribute sections and names them in the form attribute@tag.


If you use the XML_SECTION_GROUP, you can name attribute sections anything with CTX_DDL.ADD_ATTR_SECTION.

To search on Tale within the attribute section title, issue the following query:

WHERE CONTAINS (...,'Tale WITHIN title', ...)

When you define the TITLE attribute section as such and index the document set, you can query the XML attribute text as follows:

... WHERE CONTAINS (...,'Cities WITHIN booktitle', ....)...

When you define the AUTHOR attribute section as such and index the document set, you can query the XML attribute text as follows:

... WHERE 'England WITHIN authors'

Example 7-4 Querying an XML Document

This example does the following:

  1. Creates and populates table res_xml
  2. Creates an index, section_group, and preferences
  3. Paramaterizes the preferences
  4. Runs a test query against res_xml
    drop table res_xml;
    
    CREATE TABLE res_xml (
      pk            NUMBER PRIMARY KEY ,
      text          CLOB
      ) ;
    
    insert into res_xml values(111,
      'ENTITY chap8 "Chapter 8, <q>Keeping it Tidy: the XML Rule Book </q>"> this is 
    the document section');
    commit;
    
    ---
    --- script to create index on res_xml
    ---
    
    --- cleanup, in case we have run this before
    DROP INDEX res_index ;
    EXEC CTX_DDL.DROP_SECTION_GROUP ( 'res_sections' ) ;
    
    --- create a section group
    BEGIN
      CTX_DDL.CREATE_SECTION_GROUP ( 'res_sections', 'XML_SECTION_GROUP' ) ;
      CTX_DDL.ADD_FIELD_SECTION ( 'res_sections', 'chap8', '<q>') ;
    END ;
    /
    
    begin
      ctx_ddl.create_preference
        (
          preference_name => 'my_basic_lexer',
          object_name     => 'basic_lexer'
        );
      ctx_ddl.set_attribute
        (
          preference_name => 'my_basic_lexer',
          attribute_name  => 'index_text',
          attribute_value => 'true'
        );
      ctx_ddl.set_attribute
        (
          preference_name => 'my_basic_lexer',
          attribute_name  => 'index_themes',
          attribute_value => 'false');
    end;
    /
    
    CREATE INDEX res_index
      ON res_xml(text)
      INDEXTYPE IS ctxsys.context 
      PARAMETERS ( 'lexer my_basic_lexer SECTION GROUP res_sections' ) ;
    
    

Test the preceding index with a test query, such as:

SELECT pk FROM res_xml WHERE CONTAINS( text, 'keeping WITHIN chap8' )>0 ;

Example 7-5 Creating an Index and Performing a Text Query

drop table explain_ex; 
 
create table explain_ex 
  ( 
    id        number primary key,
    text      varchar(2000)
  ); 
 
insert into explain_ex ( id, text )
  values ( 1, 'thinks thinking thought go going goes gone went' || chr(10) ||
              'oracle orackle oricle dog cat bird'              || chr(10) ||
              'President Clinton' ); 
insert into explain_ex ( id, text )
  values ( 2, 'Last summer I went to New England'               || chr(10) ||
              'I hiked a lot.'                                  || chr(10) ||
              'I camped a bit.' ); 
commit;

Example 7-6 Text Query Using "ABOUT" in the Text Query Expression

Set Define Off
select text 
  from explain_ex
  WHERE CONTAINS ( text,
  '( $( think & go ) , ?oracle ) & ( dog , ( cat & bird ) ) & about(mammal
                                                    during Bill Clinton)' ) > 0;
select text 
  from explain_ex
  WHERE CONTAINS ( text, 'about ( camping and hiking in new england )'  ) > 0;

Example 7-7 Creating an Index Using AUTO_SECTION_GROUP

ctx_ddl_create_section_group('auto', 'AUTO_SECTION_GROUP');


CREATE INDEX myindex ON docs(xmlfile_column) INDEXTYPE IS ctxsys.context PARAMETERS ('filter ctxsys.null_filter SECTION GROUP auto'); SELECT xmlfile_column FROM docs WHERE CONTAINS (xmlfile_column, 'virginia WITHIN title')>0;

Example 7-8 Creating an Index Using PATH_SECTION_GROUP

EXEC ctx_ddl.create_section_group('xmlpathgroup', 'PATH_SECTION_GROUP');


CREATE INDEX myindex ON xmldocs(xmlfile_column) INDEXTYPE IS ctxsys.context PARAMETERS ('section group xmlpathgroup');

SELECT xmlfile_column FROM xmldocs ... WHERE CONTAINS (column, 'Tale WITHIN title@book')>0;

Example 7-9 Using XML_SECTION_GROUP and add_attr_section to Aid Querying

Consider an XML file that defines the BOOK tag with a TITLE attribute as follows:

<BOOK TITLE="Tale of Two Cities"> 
It was the best of times. </BOOK> 
<Author="Charles Dickens">
Born in England in the town, Stratford_Upon_Avon </Author>

Recall the CTX_DDL.ADD_ATTR_SECTION syntax is:

CTX_DDL.Add_Attr_Section ( group_name, section_name, tag );

To define the title attribute as an attribute section, create an XML_SECTION_GROUP and define the attribute section as follows:

ctx_ddl_create_section_group('myxmlgroup', 'XML_SECTION_GROUP');
ctx_ddl.add_attr_section('myxmlgroup', 'booktitle', 'book@title');
ctx_ddl.add_attr_section('myxmlgroup', 'authors', 'author');
end;

Note:
  • Oracle knows what the end tags look like from the group_type parameter you specify when you create the section group. The start tag you specify must be unique within a section group.
  • Section names need not be unique across tags. You can assign the same section name to more than one tag, making details transparent to searches.

Presenting the Results of Your Query

An Oracle Text query application enables viewing documents returned by a query. You typically select a document from the hit list and then your application presents the document in some form.

With Oracle Text, you can render a document in different ways. For example, with the query terms highlighted. Highlighted query terms can be either the words of a word query or the themes of an ABOUT query in English. This rendering uses the CTX_DOC.HIGHLIGHT or CTX_DOC.MARKUP procedures.

You can also obtain theme information from documents with the CTX_DOC.THEMES PL/SQL package. Besides these there are several other CTX_DOC procedures for presenting your query results.

INPATH does not support working with highlighting or themes.

See Also:

Oracle Text Reference for more information on the CTX_DOC package.

XMLType Indexing

The Oracle9i datatype for storing XML, XMLType, is a core database feature.

You Need Query Rewrite Privileges


Note:

These privileges are only required for Oracle9i Release 1 (9.0.1).


You can create an Oracle Text index on this type, but you need a few database privileges first:

  1. The user creating the index must have Query Rewrite privileges:
    GRANT QUERY REWRITE TO <user>
    
    

    Without this privilege, the create index will fail with:

    ORA-01031: insufficient privileges
    
    

    <user> should be the user creating the index. The database schema that owns the index, if different, does not need the grant.

  2. query_rewrite_enabled should be true, and query_rewrite_integrity should be trusted. You can add them to the init.ora file:
    query_rewrite_enabled=true
    query_rewrite_integrity=trusted
    
    

    or turn it on for the session as follows:

    ALTER SESSION SET query_rewrite_enabled=true;
    ALTER SESSION SET query_rewrite_integrity=trusted;
    
    

    Without these, queries will fail with:

    DRG-10599: column is not indexed
    
    

These privileges are needed because XMLType is really an object, and you access it through a function, hence an Oracle Text index on an XMLType column is actually a function-based index on the getclobval() method of the type. These are the standard grants you need to use function-based indexes, however, unlike function-based B-Tree indexes, you do not need to calculate statistics.


Note:

Oracle9i SQL Reference under CREATE INDEX, states:

To create a function-based index in your own schema on your own table, in addition to the prerequisites for creating a conventional index, you must have the QUERY REWRITE system privilege.

To create the index in another schema or on another schema's table, you must have the GLOBAL QUERY REWRITE privilege. In both cases, the table owner must also have the EXECUTE object privilege on the function(s) used in the function-based index.

In addition, in order for Oracle to use function-based indexes in queries, the QUERY_REWRITE_ENABLED parameter must be set to TRUE, and the QUERY_REWRITE_INTEGRITY parameter must be set to TRUSTED.


System Parameter is Set to the Default, CTXSYS.PATH_SECTION_GROUP

When an XMLType column is detected, and no section group is specified in the parameters string, the default system examines the new system parameter DEFAULT_XML_SECTION, and uses the section group specified there. At install time this system parameter is set to CTXSYS.PATH_SECTION_GROUP, which is the default path sectioner.

The default filter system parameter for XMLType is DEFAULT_FILTER_TEXT, which means that the INSO filter is not engaged by default.

XMLType Indexes Work Like Other Oracle Text Indexes

Other than the database privileges and the special default section group system parameter, indexes on XMLType columns work like any other Oracle Text index.

Example 7-10 Creating a Text Index on XMLType Columns

Here is a simple example:

connect ctxsys/ctxsys
GRANT QUERY REWRITE TO xtest;
connect xtest/xtest

CREATE TABLE xtest(doc sys.xmltype);
INSERT INTO xtest VALUES (sys.xmltype.createxml('<A>simple</A>'));

CREATE INDEX xtestx ON xtest(doc)
   INDEXTYPE IS ctxsys.context;
ALTER SESSION SET query_rewrite_enabled = true;
ALTER SESSION SET query_rewrite_integrity = trusted;

SELECT a.doc.getclobval() FROM xtest a
   WHERE CONTAINS (doc, 'simple INPATH(A)')>0;

Using Oracle Text with Oracle XML DB

Creating an Oracle Text Index on an UriType Column

UriType columns can be indexed natively in Oracle9i database using Oracle Text. No special datastore is needed.

Example 7-11 Creating an Oracle Text Index on a UriType Column

For example:

CREATE TABLE table uri_tab ( url sys.httpuritype);

INSERT INTO uri_tab VALUES
    (sys.httpuritype.createUri('http://www.oracle.com'));

CREATE INDEX urlx ON uri_tab(url) INDEXTYPE IS ctxsys.context;

SELECT url FROM uri_tab WHERE CONTAINS(url, 'Oracle')>0;

Table 7-5 lists system parameters used for default preference names for Oracle Text indexing, when the column type is UriType:

Table 7-5 rUriType Column Default Preference Names for Oracle Text Indexing
URIType Column Default Preference Names

DATASTORE

DEFAULT_DATASTORE

FILTER

DEFAULT_FILTER_TEXT

SECTION GROUP

DEFAULT_SECTION_HTML

LEXER

DEFAULT_LEXER

STOPLIST

DEFAULT_STOPLIST

WORDLIST

DEFAULT_WORDLIST

STORAGE

DEFAULT_STORAGE

Querying XML Data: Use CONTAINS or existsNode()?

Oracle9i Release 1(9.0.1) introduced the Oracle Text PATH_SECTION_GROUP, INPATH(), and HASPATH() query operators. These allow you to do XPath-like text query searches on XML documents using the CONTAINS operator. CONTAINS, however, supports only a subset of XPath functionality. Also, there are important semantic differences between the CONTAINS operator and the existsNode() function.

The existsNode, extract() and extractValue() SQL functions (and the corresponding member functions of XMLType) provide full XPath support. This release of Oracle9i also introduces new extension functions to XPath to support full text searches.


Note:

This release does not support theme querying for Oracle Text CONTAINS() and existsNode() searching.


Table 7-6 lists and compares CONTAINS() and existsNode() features for searching XMLType data.

Table 7-6 Using CONTAINS() and existsNode() to Search XMLType Data  
Feature CONTAINS() existsNode()

XPath Conformance

--

--

Predicate Support

--

--

  • String equality

Y

Y

  • Numerical equality

N

Y

  • Range Predicates

N

Y

  • XPath functions

N

Y

  • Spaces

N

Y

  • Namespaces

N

Y

  • Value case sensitivity