Streamlining Data Search in Teamcenter: Indexing Types, Utilities, & Solr Integration
Indexing
Indexing is the process of organizing your objects, metadata, and file content so that users can efficiently search for relevant information in Active Workspace. This blog looks at the indexing components and how they work, with a quick walkthrough of the following topics:
Indexing Components
TcFTSIndexer
File Content Indexing
Indexing Types
Indexing Utilities
Indexing business objects and properties and Files
Indexing Components
Indexing Engine
Teamcenter uses the Solr enterprise search platform for indexing.
Solr (Searching on Lucene w/Replication) is an open-source, Java-based search platform built on the Lucene Java search library.
It stores indexed Teamcenter data for global search in Active Workspace.
Selected/Configured product data is indexed in Solr.
The master product data is not stored in Solr. It is always loaded from Teamcenter.
Indexer
Installs a four-tier SOA client that exports Teamcenter data for merging into Solr.
The indexer manages overall indexing processes.
The TcFTSIndexer (Teamcenter Full Text Search Indexer) is a SOA client that exports Teamcenter data for importing into the Solr database.
TcFTSIndexer comprises the Query, Export, Transform, and Load steps that run in sequential order.
TcFTSIndexer manages the initial indexing for object data.
You can then schedule synchronization to run periodically for subsequent updates to object data or structure data indexes.
There are two modes for installing the Indexer: Standalone for object data and Dispatcher-based for Active content structures.
TcFTSIndexer:
Is an SOA client that connects to Teamcenter to extract data and index the data into Solr.
Allows modification of any existing steps and flows to meet customer requirements.
Can be customized to extract external system data and index into Solr.
Provides utilities that can be used in step customization.
Types, Flows, and Steps
TcFTSIndexer can run types, flows, and steps.
Types: Represent the different integrations or customizations into TcFTSIndexer, for example, object data and structure. Types contain flows.
Flows: Represent the supported operations for a given type. For example, some flows for object data (ObjData) are clear, index, recover, and synchronization. Flows contain steps that are chained together.
Steps:
Contain methods that define a certain behavior.
Each step should have the input and outputs defined.
Steps are run in sequence as defined in a flow.
Output of one step becomes the input of the next step.
For example, the object data (ObjData) index flow has query, TIE export, transform, and load steps.
There are three types of steps:
Simple step: Runs a step based on the input and returns the output data for the next step to process.
Split step: Splits a list of input data into multiple simple steps based on the size of the list.
Aggregate step: Waits on all the split steps to finish and combines all the output data from the processing of split steps.
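The relationship between the three step kinds can be sketched in Python. This is an illustration only, not TcFTSIndexer code: the function names (`split_step`, `simple_step`, `aggregate_step`, `run_flow`), the batch size, and the sample UIDs are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_step(items: List[str], batch_size: int) -> List[List[str]]:
    """Split step: break one input list into batches, one per simple step."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def simple_step(batch: List[str]) -> List[str]:
    """Simple step: process one batch and return output for the next step.

    Hypothetical work: tag each UID as exported.
    """
    return [f"exported:{uid}" for uid in batch]


def aggregate_step(outputs: List[List[str]]) -> List[str]:
    """Aggregate step: combine the outputs of all split batches."""
    return [item for batch in outputs for item in batch]


def run_flow(uids: List[str], batch_size: int = 2) -> List[str]:
    batches = split_step(uids, batch_size)
    # Each batch runs as an independent simple step. list() consumes every
    # result, so the flow waits for all batches before aggregating.
    with ThreadPoolExecutor(max_workers=4) as pool:
        outputs = list(pool.map(simple_step, batches))
    return aggregate_step(outputs)


if __name__ == "__main__":
    print(run_flow(["uid1", "uid2", "uid3", "uid4", "uid5"]))
```

The output of the split step feeds the simple steps, and their outputs feed the aggregate step, mirroring the rule that the output of one step becomes the input of the next.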
What is file content indexing?
Indexing file contents allows your users to search quickly for included information. Objects without associated datasets are always indexed using a synchronous process. For any associated datasets, you can choose to use a synchronous or asynchronous indexing flow during installation.
Indexing Types
Synchronous Indexing Flow For File Contents And Objects
In synchronous indexing, the TcFTSIndexer indexes Teamcenter object metadata and file contents together in one sequential flow:
Query: The TcFTSIndexer connects to Teamcenter and does the initial validation. It creates a new thread that connects to Teamcenter and gets the list of UIDs.
TIE Export: The TcFTSIndexer reads the output UID file and chunks the data into a manageable size, and then calls TIE export by connecting to the pool manager. This step creates the Teamcenter XML file.
Transform: The TcFTSIndexer converts the Teamcenter XML file to a Solr input XML file with the security read expressions. This action also identifies dataset objects.
Solr Loader: The TcFTSIndexer loads the Solr input XML file into Solr.
The TcFTSIndexer connects to the server manager and calls to confirm export.
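As a rough illustration of the chunking and transform ideas in this flow, the sketch below splits a UID list into chunks and builds a Solr XML update message (`<add><doc><field name="...">`) for each chunk. It is not the TcFTSIndexer implementation: the field names (`id`, `object_name`), the chunk size, and the stand-in export data are assumptions, and the security read expressions are left out.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List


def chunk(uids: List[str], size: int) -> Iterator[List[str]]:
    """Yield successive fixed-size chunks of the UID list."""
    for start in range(0, len(uids), size):
        yield uids[start:start + size]


def to_solr_add_xml(objects: List[Dict[str, str]]) -> str:
    """Build a Solr XML update message from a list of exported objects."""
    add = ET.Element("add")
    for obj in objects:
        doc = ET.SubElement(add, "doc")
        for name, value in obj.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = value
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    uids = ["uidA", "uidB", "uidC"]
    for batch in chunk(uids, 2):
        # Stand-in for the export step: real data would come from TIE export.
        exported = [{"id": uid, "object_name": f"Item {uid}"} for uid in batch]
        print(to_solr_add_xml(exported))
```

Chunking keeps each export and load request at a manageable size, which is the reason the flow splits the UID list before calling TIE export.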
When to use Synchronous Indexing?
Synchronous indexing is preferred in the following scenarios:
File indexing is not involved.
Only a small number of files need to be indexed.
Hardware resources are limited.
Asynchronous Indexing Flow For File Contents And Objects
In Asynchronous Indexing file contents are indexed asynchronously. Dispatchers manage file content indexing tasks in parallel with object indexing tasks. Metadata from non-dataset objects is always indexed synchronously and is searchable by users while file content indexing is completing in the background.
In asynchronous indexing, if the metadata indexing process is successful, the TcFTSIndexer groups the datasets identified in the transform step by dataset type and metadata properties. It then submits the groups of datasets to Dispatcher using DispatcherRequest objects.
Extracts and indexes file contents associated with datasets:
Dispatcher client reads the DispatcherRequest objects and submits the requests to Scheduler.
Scheduler routes the request to file content indexers using Dispatcher Module.
Each file content indexer downloads the files associated with datasets and extracts the contents using their extractor.
Different indexers are invoked depending on the type of dataset.
Adds the extracted contents to the metadata in Solr.
Informs Teamcenter about indexing status for the dataset.
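The grouping that precedes submission to Dispatcher can be pictured with a short sketch. It is a simplification under stated assumptions: the `Dataset` class, its fields, and the dataset type values are hypothetical, and grouping is done by dataset type only, whereas the real flow also considers metadata properties.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Dataset:
    uid: str
    dataset_type: str  # illustrative values such as "PDF" or "Text"
    metadata_ok: bool  # True if metadata indexing succeeded for this dataset


def group_for_dispatch(datasets: List[Dataset]) -> Dict[str, List[str]]:
    """Group dataset UIDs by type, skipping datasets whose metadata failed."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for ds in datasets:
        if not ds.metadata_ok:
            continue  # file contents are only queued after metadata succeeds
        groups[ds.dataset_type].append(ds.uid)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        Dataset("uid1", "PDF", True),
        Dataset("uid2", "PDF", False),
        Dataset("uid3", "Text", True),
    ]
    # Each group would correspond to one request handed to Dispatcher.
    print(group_for_dispatch(sample))
```

Grouping by type lets each request be routed to the file content indexer that has the right extractor for that kind of file.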
When to use Asynchronous Indexing?
Asynchronous indexing can be used in the following scenarios:
Indexing involves CAD file contents.
Metadata indexing and file content indexing need to be separated.
The load generated by file content indexing needs to be balanced.
Long waits for indexing of larger files to finish need to be avoided.
Indexing Utilities
1. TcSchemaToSolrSchemaTransform
Converts the Teamcenter schema to the Solr schema.
When you mark business objects or properties for indexing, the TEM deployment runs this utility automatically.
Syntax: call %SOLR_HOME%\TcSchemaToSolrSchemaTransform %TC_DATA%\ftsi\solr_schema_files
Note: In a few older versions of Teamcenter, this utility must be run explicitly after BMIDE deployment.
2. awindexerutil
Refreshes already-indexed objects that have changed, so that the synchronization flow updates the index for those objects.
The awindexerutil utility marks those objects to be picked up during the next synchronization flow batch.
This lets you update already-indexed data without downtime or a full index flow.
Use it to refresh your indexed data for only the delta of changes since the last completed synchronization.
Use it to index changes related to types or properties that were added or removed, using the delta flow.
Running awindexerutil does not interfere with current synchronization flows.
Run the utility from the TC_ROOT\bin directory (TC_BIN if it's set).
Syntax: awindexerutil -u=user-id -p=password -g=group [-refresh] [-delta [-dryrun] [-daterange]] -h
3. runTcFTSIndexer
Indexes data into the Solr indexing engine
Run this command from the FTS_INDEXER_HOME directory, for example, TC_ROOT\TcFTSIndexer\bin.
Syntax: runTcFTSIndexer -debug -maxconnections -status -stop -service -shutdown -task=[objdata | multisite | structure | fourgd]:flow-action -h
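If you schedule index or synchronization runs from a script, it can help to assemble the task argument in one place and reject type names that are not in the syntax line above. The following is a minimal sketch that only builds the argument list and does not start the indexer. The helper name `build_indexer_command` is hypothetical, and the flow action is passed through without checking because the valid flows depend on the type.

```python
from typing import List

# Type names taken from the syntax line above.
KNOWN_TYPES = {"objdata", "multisite", "structure", "fourgd"}


def build_indexer_command(task_type: str, flow_action: str,
                          executable: str = "runTcFTSIndexer") -> List[str]:
    """Return the argument list for one runTcFTSIndexer task."""
    if task_type not in KNOWN_TYPES:
        raise ValueError(f"unknown type: {task_type!r}")
    if not flow_action:
        raise ValueError("flow action must not be empty")
    return [executable, f"-task={task_type}:{flow_action}"]


if __name__ == "__main__":
    # Prints the command line only; pass the list to subprocess.run()
    # from the FTS_INDEXER_HOME directory to actually execute it.
    print(" ".join(build_indexer_command("objdata", "index")))
```

Returning a list rather than a single string keeps the command safe to pass to `subprocess.run()` without shell quoting issues.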
4. aw_search_config_manager
Manages mapping configurations used when customizing asynchronous file content indexing.
Mapping configurations specify how data extracted from non-standard files, such as CAD files, are indexed into Solr.
Run the utility from the TC_ROOT\bin directory. (You can also use the TC_BIN directory if it is configured.)
Syntax: aw_search_config_manager -u=user-ID {-p=password | -pf=password-file} -g=group
[-import -dir= config-directory]
[-export -config_id= config-id -output= output-file-path]
[-remove -config_id= config-id]
[-list]
[-h]
5. solrCloudSetUp
If you installed or upgraded to SolrCloud using Deployment Center, you can migrate to SolrCloud using the SolrCloud utility.
Running this utility sets up ZooKeeper, creates configsets and collections, sets up basic authentication, and copies your data so that a full index is not required.
Syntax: solr-version.solrCloudSetUp.bat -migrate -u=user name -p=password -copy_data
Indexing business objects and properties and Files
Select the custom business object and set the ‘Awp0SearchIsIndexed’ business object constant to true.
Select the custom properties that you want to index and set the ‘Awp0SearchIsIndexed’ property constant to true.
Update the Solr schema: This happens automatically when the data model changes are deployed using TEM, which runs TcSchemaToSolrSchemaTransform to generate the Solr schema.
Run runTcFTSIndexer.bat with the option “-task=objdata:index” or “-task=objdata:sync”.
Log in to Active Workspace (AWC) and verify the results.
To enable file content indexing, update or configure the preferences listed below.
AWS_FullTextSearch_Index_Dataset_File_Content
Enables indexing of dataset file contents for full-text search, allowing users to search within attached files such as documents and CAD files. This preference should be set at Site level. Possible values are True/False.
AW_Indexable_File_Extensions
Specifies which file extensions should be indexed for search. This preference should be set at Site level. Adjust this list to include the file types most commonly used in your environment, for example .txt, .doc, .pdf, .prt, .dft.
AWS_Search_Enable_Snippets
Helps users quickly understand the context of a search term within a file by displaying relevant snippets in the search results. This preference should be set at Site level. Possible values are True/False.
After configuring these preferences, it is recommended to restart the pool server and the Solr process, and then perform a full index.