Lucene Index
Searching in the Mercury DB system is based on the [Lucene](https://lucene.apache.org) index. This article presents the principles for building field names, which search queries rely on. It also describes how fields are classified: by kind (fixed or dynamic), by category (according to the entity they originate from) and by search type (according to the Lucene field type). The Lucene index is very flexible and supports specialized queries that can be adapted to various user needs. Lucene is used not only in the Mercury DB system but also in many other database systems and Internet search engines.
As the system evolved and new requirements appeared, two naming models were developed for Lucene index field names: Model 2.0 and Model 3.0. The model in use is selected by the mercury.lucene.model.version parameter of the system configuration, stored in the mercury.properties file. The parameter takes one of two values: 2 or 3.
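To illustrate how an application might pick up the configured naming model, here is a minimal Python sketch that reads mercury.lucene.model.version from the contents of mercury.properties. It assumes simple key=value lines (full .properties syntax also allows ':' separators, escapes and line continuations, which this sketch ignores), and the function name read_model_version is hypothetical.

```python
def read_model_version(properties_text: str) -> int:
    """Return the naming model (2 or 3) configured in mercury.properties content.

    Simplified parser (assumption): only 'key=value' lines are recognized;
    blank lines and '#'/'!' comment lines are skipped.
    """
    key_name = "mercury.lucene.model.version"
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == key_name:
            version = int(value.strip())
            if version not in (2, 3):
                raise ValueError(f"{key_name} must be 2 or 3, got {version}")
            return version
    raise KeyError(f"{key_name} is not set")


# Example: a configuration selecting the 3.0 naming model.
config = """
# Lucene field naming model
mercury.lucene.model.version=3
"""
print(read_model_version(config))  # 3
```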
Index document structure
The Lucene index document in the Mercury DB system is based on data stored in a relational database.
The Lucene index model is defined during the installation of the Mercury DB system. This model is closely tied to the system architecture and its functionality. If the index model is changed, the Lucene index must be rebuilt and the applications that perform searches must be adapted accordingly.
The entity diagram below presents the individual elements of the document, which are stored in the relational database. Each entity is represented by corresponding fields in the Lucene index document (see Fixed/Predefined Field Names).
Storing a document in an index
Because of the way data is organized in the relational database, one Lucene index document corresponds to one entry (row) in the table representing the Case entity. The object-oriented nature of the stored data makes it necessary to represent relationships between indexed documents. To explain this in more detail, consider the following example of a complex case definition named EliAddress:
We have a parent case (EliClient, parent) and a child case (EliAddress, child). Their data is stored as separate rows in the table. The relationship that exists between them (the clientAddress field in the parent case) is stored in the relational database as a row of the Case2SubCase entity. To reflect this relationship in indexed documents, the HgDB indexing engine adds appropriate fields to both the parent and child case documents.
A field is added to the parent case document (EliClient, parent) with a name built according to the rule <field_name>_<case_id_field_name>. In our example, using the 3.0 naming model, the field is named clientAddress_mrc_Case_id and its value is the identifier of the child case. This field is the equivalent of a foreign key in a relational database: it allows the Mercury DB (HgDB) 3.0 engine to execute the appropriate join query, i.e. to find the child case with the given identifier.
Two multi-valued fields are added to the child case document (EliAddress, child):
- parentFields - contains the name of the binding field; in our example, the value clientAddress is added,
- parentTypes - contains the name of the parent case type; in our example, the value EliClient is added.
These fields make it possible to optimize queries that look for the parent case.
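The relationship fields described above can be sketched in Python by modelling an index document as a dictionary that maps field names to lists of values, so that multi-valued fields fit naturally. The sketch uses the 3.0 naming model; the mrc_ prefix on parentFields and parentTypes follows the rule for unclassified permanent fields described later in this article. The function link_cases and the sample identifiers are hypothetical.

```python
Document = dict[str, list[str]]  # field name -> values (fields may be multi-valued)

CASE_ID_FIELD = "mrc_Case_id"  # case identifier field in the 3.0 model


def link_cases(parent: Document, child: Document, binding_field: str, parent_type: str) -> None:
    """Add the fields that reflect a Case2SubCase relationship (3.0 model)."""
    child_id = child[CASE_ID_FIELD][0]
    # Parent: foreign-key-like field <field_name>_mrc_Case_id holding the child id.
    parent.setdefault(f"{binding_field}_{CASE_ID_FIELD}", []).append(child_id)
    # Child: multi-valued fields recording the binding field and the parent type.
    # Each value is stored only once (a choice made by this sketch).
    for field, value in (("mrc_parentFields", binding_field), ("mrc_parentTypes", parent_type)):
        values = child.setdefault(field, [])
        if value not in values:
            values.append(value)


client = {CASE_ID_FIELD: ["101"], "mrc_Case_type": ["EliClient"]}
address = {CASE_ID_FIELD: ["202"], "mrc_Case_type": ["EliAddress"]}
link_cases(client, address, "clientAddress", "EliClient")
print(client["clientAddress_mrc_Case_id"])  # ['202']
print(address["mrc_parentFields"], address["mrc_parentTypes"])  # ['clientAddress'] ['EliClient']
```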
Rules for building field names
A Lucene index document contains two main types of fields: fixed/predefined fields and dynamic fields. Each type has its own rules for building field names.
Fixed/predefined fields
Fixed/predefined fields are defined in the Lucene index model and cannot be modified by users. They are required for the system to function properly and store basic information about documents. Examples of such fields are: mrc_Case_id, mrc_Case_type, mrc_Case_status.
Fixed fields are grouped into categories according to their origin. Each category corresponds to an entity, and the category name is the name of the entity in which the fields appear.
To distinguish fixed fields from case object fields (dynamic fields), Model 3.0 adds the prefix mrc_ to the names of all fixed fields.
The naming rules for each category are as follows:
- TypeCode - fields representing the type code of the case type definition or of the type of the document associated with the case. Depending on the naming model used in the system, a prefix is added to the basic names of the entity fields:
  - for case object types: typeTypeCode in the 2.0 model and typeCode in the 3.0 model. Example field name for the 3.0 model: mrc_typeCodeValue.
  - for document object types: c2docTypeTypeCode in the 2.0 model and c2docTypeCode in the 3.0 model.
- TypeKind - fields representing the kind of the case type or of the type of the document associated with the case. Depending on the naming model used in the system, a prefix is added to the basic names of the entity fields:
  - for case types: typeTypeKind in the 2.0 model and typeKind in the 3.0 model. Example field name for the 3.0 model: mrc_typeKindValue.
  - for document types: c2docTypeTypeKind in the 2.0 model and c2docTypeKind in the 3.0 model.
- TypeCase - fields representing the type definition (type version) of the case or of the document associated with the case. A prefix is added to the basic names of the entity fields:
  - for case object types: type. Example field names for the 3.0 model: mrc_typeRootVersionContextID, mrc_typeTypeName.
  - for document object types: c2docType.
- Source - fields representing the source of the case or of the document associated with the case. A prefix is added to the basic names of the entity fields:
  - for case objects: grSrc.
  - for document objects: c2docSrc. Example field name for the 3.0 model: mrc_c2docSrcName.
- KtmNumber - fields representing the KTM index symbol related to the case (KTM, from the Polish Kod Towarowo-Materiałowy¹, is a commodity/material code describing a fixed asset or resource).
  - Field name prefix: grKtm.
  - Example field name for the 3.0 model: mrc_grKtmDescription.
- GroupCase - fields representing data of the case group. Field name prefix: gr.
- Participant - fields representing case participants (participant data and the clients affected by the case). These are multi-valued fields.
  - Field name prefixes: grParticipant, grClient, grApplicant.
  - Example field name for the 3.0 model: mrc_grClientIdentity.
- Case - predefined fields of the case itself, also called "case header fields"; see the description of the [CaseHeader](/docs/API/CaseHeader) object. No additional prefix is added to the field names. Example status field name for the 3.0 model: mrc_status.
- QuickTask - fields representing a quick task² associated with the case. Field name prefix: qt. Example field name for the 3.0 model: mrc_qtReplyText.
- Comment - fields representing comments associated with the case. Field name prefix: comm. Example field name for the 3.0 model: mrc_commContent.
- CaseDocument - fields representing documents associated with the case. Field name prefix: c2doc. Example field name for the 3.0 model: mrc_c2docSubject.
- InitStatus - fields representing the initial status of a document associated with the case. Field name prefix: c2docInitStat. Example field name for the 3.0 model: mrc_c2docInitStatName.
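The category rules above can be condensed into a lookup table. The following Python sketch builds fixed field names for both models. The (category, variant) keys and the function fixed_field_name are hypothetical, and capitalizing the first letter of the base name after a non-empty prefix is an assumption inferred from the examples (for instance mrc_commContent), not a rule stated explicitly.

```python
# Prefixes per (category, variant); only TypeCode and TypeKind differ between models.
MODEL_SPECIFIC_PREFIXES = {
    ("TypeCode", "case"): {2: "typeTypeCode", 3: "typeCode"},
    ("TypeCode", "document"): {2: "c2docTypeTypeCode", 3: "c2docTypeCode"},
    ("TypeKind", "case"): {2: "typeTypeKind", 3: "typeKind"},
    ("TypeKind", "document"): {2: "c2docTypeTypeKind", 3: "c2docTypeKind"},
}
COMMON_PREFIXES = {
    ("TypeCase", "case"): "type",
    ("TypeCase", "document"): "c2docType",
    ("Source", "case"): "grSrc",
    ("Source", "document"): "c2docSrc",
    ("KtmNumber", "case"): "grKtm",
    ("GroupCase", "case"): "gr",
    ("Participant", "participant"): "grParticipant",
    ("Participant", "client"): "grClient",
    ("Participant", "applicant"): "grApplicant",
    ("Case", "case"): "",  # case header fields: no category prefix
    ("QuickTask", "case"): "qt",
    ("Comment", "case"): "comm",
    ("CaseDocument", "document"): "c2doc",
    ("InitStatus", "document"): "c2docInitStat",
}


def fixed_field_name(category: str, variant: str, base_name: str, model: int) -> str:
    """Build the index name of a fixed/predefined field for naming model 2 or 3."""
    if model not in (2, 3):
        raise ValueError(f"unsupported naming model: {model}")
    key = (category, variant)
    if key in MODEL_SPECIFIC_PREFIXES:
        prefix = MODEL_SPECIFIC_PREFIXES[key][model]
    else:
        prefix = COMMON_PREFIXES[key]
    # Assumption: after a non-empty prefix the base name starts with a capital letter.
    name = prefix + base_name[:1].upper() + base_name[1:] if prefix else base_name
    # The 3.0 model adds the mrc_ prefix to every fixed field.
    return "mrc_" + name if model == 3 else name


print(fixed_field_name("TypeCode", "case", "value", 3))  # mrc_typeCodeValue
print(fixed_field_name("TypeCode", "case", "value", 2))  # typeTypeCodeValue
print(fixed_field_name("Comment", "case", "content", 3))  # mrc_commContent
```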
Dynamic fields - object/case fields
Dynamic fields are defined by users and can be modified while the system is running. They are specific to a given case and can hold different information depending on user needs. Examples of such fields are: clientName, caseDescription, documentDate.
All dynamic fields belong to the Case category.
There are three types of dynamic fields, each with its own naming rules:
- Basic fields - fields representing simple fields of the case object. Their names are used exactly as defined in the object.
- Conflicted fields - simple fields of the case object whose names conflict with the names of fixed/predefined fields. In the 2.0 naming model, fixed field names have no mrc_ prefix, so such conflicts occurred very often. In the 3.0 model, the mrc_ prefix reduces naming conflicts to a minimum. Even so, object field names that appear in the set of fixed/predefined fields should be avoided. Depending on the naming model used, a conflicting name is changed as follows:
  - for the 2.0 model, the prefix custom_ is added to the field name. Example: an object field named status becomes custom_status.
  - for the 3.0 model, the suffix _custom is added to the field name. Example: an object field named mrc_status becomes mrc_status_custom.
- Foreign keys associated with the Case2SubCase entity - fields representing the links between main (parent) and dependent (sub) cases. Their names are built according to the following rule:
  - for the 2.0 model: <field_name>_luceneDocId. Example: address_luceneDocId.
  - for the 3.0 model: <field_name>_mrc_Case_id. Example: address_mrc_Case_id.
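The naming rules for dynamic fields can be sketched as two small Python functions. The set of fixed field names passed in is an assumption of the sketch (in a real system it would come from the index model), and the function names are hypothetical.

```python
FOREIGN_KEY_SUFFIX = {2: "_luceneDocId", 3: "_mrc_Case_id"}


def dynamic_field_name(name: str, fixed_names: set[str], model: int) -> str:
    """Return the index name of a simple object field, resolving name conflicts."""
    if name not in fixed_names:
        return name  # basic field: the name is used as defined in the object
    # Conflicted field: 2.0 adds the custom_ prefix, 3.0 adds the _custom suffix.
    return f"custom_{name}" if model == 2 else f"{name}_custom"


def foreign_key_field_name(field_name: str, model: int) -> str:
    """Return the name of a Case2SubCase foreign-key field for the given model."""
    return field_name + FOREIGN_KEY_SUFFIX[model]


print(dynamic_field_name("status", {"status"}, 2))  # custom_status
print(dynamic_field_name("mrc_status", {"mrc_status"}, 3))  # mrc_status_custom
print(foreign_key_field_name("address", 3))  # address_mrc_Case_id
```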
Unclassified permanent fields
These fields are used by the internal mechanisms of the Mercury DB (HgDB) 3.0 database, but they can also be used to search for cases in Lucene index queries.
Depending on the naming model used, field names are built as follows:
- for model 2.0, the field name is used as is.
- for model 3.0, the prefix mrc_ is added to the field name. Example: mrc_parentTypes.
Field list:
Field name | Description |
---|---|
parentFields | String field, multi-valued, added to the child case document. Contains the names of the parent case fields through which the child case is linked. Supports links between indexed documents, i.e. between main (parent) and dependent (subordinate) cases. Example value (a field name): address |
parentTypes | String field, multi-valued, added to the child case document. Contains the names of the parent case types to which the child case is linked. Supports links between indexed documents, i.e. between main (parent) and dependent (subordinate) cases. Example value (a case type name): FsmService |
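To show how parentTypes and parentFields can narrow the search for related cases, here is an in-memory Python sketch over dictionary-based documents. In Lucene itself, the equivalent would typically be a BooleanQuery with two required (BooleanClause.Occur.MUST) TermQuery clauses. The function children_linked_from and the sample documents are hypothetical, and 3.0 field names (with the mrc_ prefix) are assumed.

```python
Document = dict[str, list[str]]  # field name -> values


def children_linked_from(docs: list[Document], parent_type: str, binding_field: str) -> list[Document]:
    """Return child documents whose parentTypes and parentFields contain the given values."""
    return [
        doc
        for doc in docs
        if parent_type in doc.get("mrc_parentTypes", [])
        and binding_field in doc.get("mrc_parentFields", [])
    ]


docs = [
    {"mrc_Case_id": ["202"], "mrc_parentTypes": ["EliClient"], "mrc_parentFields": ["clientAddress"]},
    {"mrc_Case_id": ["303"], "mrc_parentTypes": ["FsmService"], "mrc_parentFields": ["address"]},
]
matches = children_linked_from(docs, "EliClient", "clientAddress")
print([doc["mrc_Case_id"][0] for doc in matches])  # ['202']
```

Because the two fields are independent multi-valued lists, a child linked to several parents could match a type from one relationship and a field name from another. The filter therefore narrows the candidates, while the parent's foreign-key field (for example clientAddress_mrc_Case_id) remains the authoritative link.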
Lucene Index Field Types
Because the field type determines which search mechanism the index applies to a field, the field type is also referred to as the search type.
Basic field types (search types):
Field Type | Description |
---|---|
TextField | Text field, full-text search, case insensitive. |
StringField | Simple string field holding a single expression or word. Most often used for values such as codes, acronyms or identifiers. Case-sensitive search. |
LongField | Numeric field, integer (long). Supports searching for numbers and ranges of numbers. |
DateField | Date field. During indexing, the date value is converted to milliseconds and stored as a LongField. Supports range searches. |
IntField | Numeric field, integer (int, shorter than long). |
FloatField | Numeric field, floating point (float). |
DoubleField | Numeric field, floating point (double). |
CompositeIdField | Field representing the values of an entity's composite key. The value of such a field is converted to the StringField type. Example: the id field of the CaseDocument entity is a composite key of type CaseDocumentPK; its value is indexed in the form: "{\"caseId\":\"" + getCaseId() + "\", \"objectId\":\"" + objectId + "\", \"versionSeriesId\":\"" + versionSeriesId + "\"}" |
SubQuery | Composite field to which a subcase is assigned. To use it in a search, use its name as a prefix followed by a dot and the subcase field name, e.g. address.mrc_Case_id. |
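Some of these conversions can be sketched in Python. The sketch assumes that DateField values are milliseconds since the Unix epoch (UTC), mirrors the CompositeIdField string concatenation from the table (without JSON escaping, as in the original expression), and builds a SubQuery field reference. All function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date_to_index_value(value: datetime) -> int:
    """DateField: convert a timezone-aware datetime to epoch milliseconds (assumed UTC epoch)."""
    if value.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return (value - UNIX_EPOCH) // timedelta(milliseconds=1)


def composite_id_value(case_id: str, object_id: str, version_series_id: str) -> str:
    """CompositeIdField: same string layout as the CaseDocumentPK example (no escaping)."""
    return (
        '{"caseId":"' + case_id + '", "objectId":"' + object_id
        + '", "versionSeriesId":"' + version_series_id + '"}'
    )


def subquery_field(link_field: str, subcase_field: str) -> str:
    """SubQuery: prefix a subcase field with the linking field name and a dot."""
    return f"{link_field}.{subcase_field}"


print(date_to_index_value(datetime(2024, 1, 1, tzinfo=timezone.utc)))  # 1704067200000
print(composite_id_value("42", "obj-1", "vs-1"))  # {"caseId":"42", "objectId":"obj-1", "versionSeriesId":"vs-1"}
print(subquery_field("address", "mrc_Case_id"))  # address.mrc_Case_id
```

A date range search would convert both bounds with the same function, so that they can be compared with the stored long values.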
Fixed/Predefined Field Names
Fixed fields from entities representing data stored in a relational database.