Skip to main content

Lucene Index Queries

Info

The basic search engine in the HgDB database is the Apache Lucene index implementation. Lucene Core is a Java library that provides advanced indexing and search features, as well as spell checking, hit highlighting, and advanced analysis capabilities. This article will describe the rules for creating search and aggregate queries that apply in Mercury DB (HgDb) 3.0.

Search queries​

Search queries are subject to the general rules for creating queries described on the Apache Lucene - Query Parser Syntax page.

Info

In order to make it easier for developers to move from SQL standard query clauses to index-specific nomenclature, the mechanism of AND and OR operators has been expanded, allowing potentially analogous creation of complex conditions in queries.

Search services

To perform case search tasks, it is recommended to use the REST service CaseSearchExtRest#sarchByQuery(POST) and its equivalents in the SOAP implementation and Spring Remoting RMI.

AND operator​

Syntax
<condition1> AND <condition2>

The AND operator transforms the search criterion into two MUST conditions: +<condition1> +<condition2>

Example: The query conditions mrc_Case_id:12345 AND mrc_status:A will be transformed to the Lucene clause: +mrc_Case_id:12345 +mrc_status:A.

OR operator​

Syntax
<condition1> OR <condition2>

The OR operator transforms the search criterion into two SHOULD conditions: <condition1> <condition2>.

Example: The query conditions mrc_Case_id:12345 OR mrc_status:A will be transformed to the Lucene clause: mrc_Case_id:12345 mrc_status:A.

Complex queries​

Queries return a list of cases that were found as a result of the search.

Info

The examples used in the following query building instructions were prepared based on the Lucene index field naming in the 3.0 model (see Lucene Index).

We will describe the creation of complex queries based on an example case definition:

Example object of a complex case in JSON
{
"mrcCaseHeader": {
"typeCode": "ElixClient",
"dirty": false,
"pkPropertyName": "clientID"
},
"clientID": "22377",
"clientName": "Dobry Klient",
"clientLogo": "n/a",
"clientAddress": {
"mrcCaseHeader": {
"typeCode": "ElixAddress",
"dirty": false,
"pkPropertyName": "correlationId"
},
"correlationId": "22377",
"county": "pleszewski",
"community": "Pleszew",
"city": "Brzezie",
"street": "ul. Zawidowicka",
"buildingNumber": "5",
"status": "W budowie",
"isContract": "false",
"coordinationMeetingPlanedDate": "17-10-2018",
"orders": [
{
"mrcCaseHeader": {
"typeCode": "ElixOrder",
"dirty": false,
"pkPropertyName": null
},
"orderNr": "SC/012121/XB",
"orderType": "Umowa",
"orderDate": "17-10-2018"
},
{
"mrcCaseHeader": {
"typeCode": "ElixOrder",
"dirty": false,
"pkPropertyName": null
},
"orderNr": "SC/012122/SP",
"orderType": "Sprzedaż",
"orderDate": "18-10-2018"
}
],
"source": null
}
}

The example shows a complex case with the third level of nesting:

  • (1) the main case of the ElixClient type.
    • (2) clientAddress - a field pointing to a subordinate case of the ElixAddress type.
      • (3) orders - a field pointing to a list of subordinate cases of the ElixOrder type.
The principle of the 2nd level of nesting

In the case where we are dealing with complex cases, in which there are parent and subordinate cases, constructing queries is limited to the 2 level of nesting. Based on the analyzed example, the ElixClient complex case, as a rule, will not be found by building search criteria resulting from the orders (ElixOrder) reference.

To create a compound query, you should add a prefix to the child case field names based on the following rule:

<parent_case_field_name>.<child_case_field_name> e.g. clientAddress.status (see the example case).

To sum up, to find the example ElixClient parent case based on the ElixAddress child case field conditions, you should ask the following example queries.

Example 1
mrc_typeCodeValue:ElixAddress AND orders.orderType:Agreement

Example description: Find all address cases (cases of type ElixAddress) that have orders of type "Agreement" associated with them.

Example 2
clientAddress.status:"Under construction" AND clientAddress.city:"Brzezie"

Example description: Find all cases whose client address has the status "Under construction" and is located in the city "Brzezie".

Attention

The above query, although correctly constructed and will return correct results, is completely suboptimal. The query should be supplemented with conditions that narrow down the criteria of the parent case. There may be other types of complex cases in the database that have the clientAddress field, which are references to the ElixAddress type case. Therefore, to make the query more optimal, the conditions mrc_typeCodeValue:ElixClient and clientAddress.mrc_typeCodeValue:ElixAddress should be added. So the full query should look like:

mrc_typeCodeValue:ElixClient AND clientAddress.mrc_typeCodeValue:ElixAddress AND clientAddress.status:"Under construction" AND clientAddress.city:"Brzezie"

Generally, the more conditions we add to the search criterion, the more optimal the query will be.

Attention

A non-optimal construction of complex queries may result in the data we are searching not being found. One of the remedies is to set the maxResults parameter to a sufficiently large value in the context of the operation being performed, see the definition of the [Context] object (/docs/API/Context/).

PARENT clause​

A special clause for compound queries. Queries return a list of parent cases (parents) of cases that were found in the search result.

Syntax:
(<compound_query_searching_child_cases>) PARENT <parent_case_field_name>([additional_criteria_for_parent_case])

where:

  • <compound_query_searching_child_cases> - (required) this will be a query searching for a case of type ElixAddress.
  • <parent_case_field_name> - (required) this will be the name of the clientAddress field in a case of type ElixClient.
  • [additional_criteria_for_parent_case] - (optional) this will be an additional query searching for cases of type ElixClient.
Example 1
(mrc_typeCodeValue:ElixAddress) PARENT clientAddress()

Example description: Find all cases of type ElixAddress, then return parents (parent cases).

Example 2
(mrc_typeCodeValue:ElixAddress) PARENT clientAddress(mrc_typeCodeValue:ElixClient AND mrc_status:A)

Example description: Find all cases of type ElixAddress, then return parents (parent cases) that are of type ElixClient and have an active status.

Example 3
(mrc_typeCodeValue:ElixAddress AND orders.orderType:Umowa) PARENT clientAddress(mrc_typeCodeValue:ElixClient AND mrc_status:A)

Example description: Find all address cases (cases of type ElixAddress) that have orders of type "Umowa" associated with them, and then return parents (parent cases) that are of type ElixClient and have an active status.

Tip

Note that the PARENT clause practically breaks the rule of limiting the 2nd level of nesting of complex queries and gives the possibility of adding conditions at the 3 level.

Using an additional field with a date range

In some services performing case search tasks, it is possible to define an additional criterion for a date range as the value of the additionalDateRange field (see the description of the CaseSearchExtRest service). This query will be appended to the main query <complex_query_searching_for_children_cases>.

CHILD clause​

A special clause for complex queries. Queries return a list of child cases (children) of cases that were found as a result of the search.

Syntax:
(<compound_query_searching_for_parent_case>) CHILD <parent_case_field_name>()

where:

  • <compound_query_searching_for_parent_case> - (required) this will be a query searching for a case of type ElixClient.
  • <parent_case_field_name> - (required) this will be the name of the clientAddress field in the case, which is a reference to the child case.
Example 1
(mrc_typeCodeValue:ElixClient AND clientAddress.status:\"Under construction\" AND clientAddress.city:Brzezie) CHILD clientAddress()

Example description: Find all cases whose client address has the status "Under construction" and is located in the city "Brzezie" and return the child case object assigned to the clientAddress field. As a result of the query, we will receive a list of cases of type ElixAddress.

"Group By" clause​

Services using the Group By clause return aggregated data results. The set of data that will be aggregated is narrowed down with an appropriate query to the Lucene index, which we provide in a separate service argument.

Tip

To perform data aggregation tasks, it is recommended to use the REST service CaseSearchExtRest#groupByQuery(POST) and its equivalents in the SOAP and RMI implementation. See also the page with examples of using the "Group By" clause Data aggregation examples.

Note

The described aggregation functions are available in software with a minimum version of 3.0.2.0.2. To check the software version, see the article What is the version of my system?.

Note

The implementation of the Group By clause requires that the values of the fields to which we apply it must be stored in the Lucene index. Information about whether the value of a given field is stored in the index can be obtained using the methods of the CaseIndexerFieldsManagerRest service. In response, the IndexField object describing the Lucene index search field contains data informing us whether the field value is stored in the index or not.

The clause allows the use of functions that process and aggregate data, which can be divided into two categories:

  • Transforming functions - functions that result in a new field value in the row.
  • Counting functions - functions that result in the aggregation of data stored in different rows.

Transformation functions​

The transformation functions include implementations of the following described functions.

MATH​

The MATH function returns the result of a mathematical expression, it has the following syntax:

Syntax:
MATH(<expression>)

Parameters (arguments):

  • <expression> - mathematical expression. The mathematical expression is handled by the implementation of the Java class pl.slawas.math.MathExpression (library implemented as part of the sccommon-utils project.
MathExpression

MathExpression - simple handling of arithmetic expressions.

 Grammar:
expression = term | expression `+` term | expression `-` term
term = factor | term `*` factor | term `/` factor
factor = `+` factor | `-` factor | `(` expression `)`
| number | functionName factor | factor `^` factor

Supported mathematical functions and operators. Space separators between expression elements are important except for exponentiation!

  • + addition e.g. 1 + 2, a + b
  • - subtraction e.g. 2 - 1, a - b
  • * multiplication e.g. 1 * 2, a * b
  • / division e.g. 1 / 2, a / b
  • sqrt square root e.g. sqrt(16), sqrt(a)
  • sin sine, base is the number of degrees, e.g., sin(16), sin(a)
  • cos cosine, base is the number of degrees, e.g., cos(16), cos(a)
  • tan tangent, base is the number of degrees, e.g., tan(16), tan(a)
  • ^ exponentiation e.g. 1^2, a^b
Example of using the MATH function:
"groupByClause": "id, MATH(mrc_endDate - mrc_createDate) as howLong",

CONCAT​

The CONCAT function returns the result (string) of concatenating two or more string values. It has the following syntax:

Syntax:
CONCAT( <expression1>, <expression2> [, <expression>]... )

Parameters (arguments):

  • <expression1>, <expression2>, <expression> - string value to concatenate with other values.
Example of using CONCAT function:
"groupByClause": "concat(clientID, \": \", clientName) as acronym"

DECODE​

The DECODE function has the functionality of an IF-THEN-ELSE statement. It has the following syntax:

Syntax:
DECODE( <expression>, <condition> , <result> [, <condition> , <result>]... , <default> )

Parameters (arguments):

  • <expression> - the value to compare. It is automatically converted to the data type of the first search value before the comparison. The expression can also be another conversion function.
  • <condition> - the value to compare with the expression. All search values ​​are automatically converted to the data type of the first search value before the comparison. The condition as a search criterion can also take a data range, e.g. [* TO 125000] - the expression should take values ​​less than or equal to 125000.
  • <result> - the value returned if the expression meets the condition.
  • <default> - if no matches are found, the DECODE function will return the default value. The default value is required.
Example 1: Using the DECODE function where the expression is a field value
"groupByClause": "id, decode(months.costs, \"[* TO 125000]\", \"low\", \"[125001 TO 140000]\", \"medium\", \"high\") as cost, count(*) as count"
Example 2: Using the DECODE function where the expression is a MATH function
"groupByClause": "id, decode(MATH(mrc_endDate - mrc_createDate), \"[* TO -1]\", \"still open\", \"[0 TO 86400000]\", \"1 day\", \"[86400001 TO 172800000]\", \"2 days\", \"very long\") as cost, count(*) as count"

TRUNC​

The TRUNC(date) function is used to retrieve a date with a part truncated to a specified unit of measure. It works according to the Gregorian calendar rules. It has the following syntax:

Syntax:
TRUNC ( date [, format ] )

Parameters (arguments):

  • date - date value to truncate.
  • format - optional, specifying the unit of measure to which the date should be truncated. The format field value can take the following values:
    • YY - with year precision
    • MO - with month precision
    • WE - with week number precision in the year
    • DD - with day precision
    • HD - with half day precision
    • HH - with hour precision
    • MI - with minute precision
    • SS - with second precision
Example of using the TRUNC function:
"groupByClause": "trunc(mrc_createDate, DD) as createDatePerMonth, count(1) as count"

Counting (aggregating) functions​

Counting (aggregating) functions include implementations of the following functions.

SUM​

The SUM function is an aggregate function that returns the sum of all values ​​in a set of values. It has the following syntax:

Syntax:
SUM ( <expression> )

Parameters (arguments):

  • <expression> - the name of the field whose values ​​are to be summed.
Example of using the SUM function:
"groupByClause": "id, SUM(months.costs) as sum"

MIN​

The MIN function is an aggregate function that returns the minimum value of an expression. It has the following syntax:

Syntax:
MIN ( <expression> )

Parameters (arguments):

  • <expression> - the name of the field on the basis of which the minimum value is to be determined.
Example of using the MIN function:
"groupByClause": "id, MIN(months.costs) as min"

MAX​

The MAX function is an aggregate function that returns the maximum value of an expression. It has the following syntax:

Syntax:
MAX( <expression> )

Parameters (arguments):

  • <expression> - the name of the field based on which the maximum value is to be determined.
Example of using the MAX function:
"groupByClause": "id, MAX(months.costs) as max"

COUNT​

The COUNT function is an aggregate function that returns a number expression. It has the following syntax:

Syntax:
COUNT ( <expression> )

Parameters (arguments):

  • <expression> - values ​​to be counted.
Example of using the COUNT function:
"groupByClause": "trunc(mrc_createDate, DD) as createDatePerMonth, COUNT(1) as count"

AVG​

The AVG function is an aggregate function that returns the arithmetic mean of the expression value. It has the following syntax:

Syntax:
AVG ( <expression> )

Parameters (arguments):

  • <expression> - name of the field on which the arithmetic mean is to be calculated.
MAX function usage example:
"groupByClause": "id, AVG(months.costs) as avg"