Subsections


Query Expressions

The select operator is used to perform queries in a database. The general syntax of this operator is as follows:

select [distinct] projection [from fromList [where predicat] [order by orderExprList [asc|desc]]
  1. distinct means that duplicates must be eliminated,
  2. projection is an expression using the variables defined in the fromList,
  3. fromList is a sequence of comma-separated items under one of the following forms:
    var in expr
    expr as var
    expr var
    where expr is an expression of type collection or is a class name, and var is the name of a variable.
  4. predicat is a boolean expression using the variables defined in the fromList,
  5. orderExprList is a comma-separated list of sortable expressions (i.e. atomic type).
  6. asc means that the order should be performed in an ascendant way (the default) and desc means the inverse.

ODMG vs. EYEDB OQL Query Expressions

As explained in the Section OQL vs. ODMG 3 OQL, there are a few differences between ODMG OQL and EYEDB OQL query expressions:

The general select syntax

Rather than introducing the select operator in a formal way by using a lot mathematical symbols, we introduce it first, in an unformal way, and then, through query examples.

The unformal description of the query process is as follows:
  1. The from clause determine the sets of objects on which the query will be applied. For instance, in the from clause, ``x in Person, y in x.children'', the sets on which the query will be applied are all the Person instances bound to the variable x and the children of these instances, bound to the variable y.
  2. These sets of objects are filtered by retaining only the objects that satisfy the predicat in the where clause. These result objects are gathered into a bag. If no where clause is there, all objects are retained.
  3. If an order by clause is present, a sort is performed using to the following process:
    1. each order expression must be of a sortable type: number (int, char or float) or string. If not, an error is raised.
    2. the bag is ordered into a list according to the first order expression. Then identical atoms are ordered again using the second order expression and so on.
    3. there is a restriction in the current implementation: each expression in the orderExprList must be present in the projection expression. If not present, an error is raised.
  4. The projection expression is evaluated for each object in the collection and the results of these evaluations are gathered into a bag.
  5. If the keyword distinct is there, the eventual duplicates are eliminated.
  6. Finally, if the order by clause is present, the result bag is converted to a list; if the distinct keyword is there without an order by, the bag is converted to a set; and if neither order by nor distinct are used, we get a bag.
The following table presents several examples:
Simple select/from Examples
select x from Person x
returns a bag containing all the Person oids in the database
select x.name from Person x
returns a bag containing the name of every Person instance in the database
select struct(name: x.name, age: x.age) from Person x
returns a bag containing struct elements including the name and age of every Person instance in the database
select list(x, x.name) from Person x where x.name ~ "h"
returns a bag of list elements containing the oid and the name of every Person instances whose name matches the regular expression "h"
select list(x, x.name) from Person x where x.name ~ "h" order by x.name
same as previous example, but the result is a list ordered by the name of the persons
select x from Person x where x.spouse.name = "john" or x.age < 10
returns a bag of Person instances whose spouse name is equal to "john" or the age is less than 10
select x from Person x order by x.name
current implementation restriction: raises an error: x.name not found in projection


Arrays and collections in query expressions

OQL provide supports for performing direct queries through non-collection array and collection attributes without explicit joins. To perform such queries, one must use the operators [], [?] or [:]. All the operators may be used for non-collection arrays. For collections (list, set, bag and array), only the operator [?] is valid.
The operator [] denotes an element in a non-collection array.
The operator [:] denotes a range of elements in a non-collection array.
The operator [?] denotes all elements in a non-collection array or in a collection. The following table presents several examples:
Array and Collection based Query Examples
select x.name from Person x where x.name[0] = 'j'
returns all the Person instances whose name begins with a 'j'
select x from Person x where x.other_addrs[0].street ~~ "par."
returns all the Person instances whose first other_addrs street matches the regular expression "par."
select x from Person x where x.other_addrs[?].street ~~ "par.."
returns all the Person instances whose any other_addrs street matches the regular expression "par.."
select x from Person x where x.other_addrs[1:3].street ~~ "par.."
returns all the Person instances whose the first, second or third other_addrs street matches the regular expression "par.."
select x from Person x where x.children[?].name = "johnny"
returns all the persons whose one of its children is called "johnny"
select x from Person x where x.cars[?].num < 100 or x.children[?].name = "mary"
returns all the persons whose one of its cars has a number less than 100 or a child called "mary"
select x from Person x where x.children[1].name = "johnny"
although the children is a collection array, an error is raised. This is a current limitation of the implementation that will disapear soon

The Implicit select syntax

An implicit select expression is a select expression with neither a from nor an explicit where clause. In fact, the where clause may be included in the projection expression.
This particular syntax has the advantage to be more compact and more simple than the general syntax, but some queries cannot be performed:
  1. queries performing an explicit join,
  2. queries having or or and in their where clause.
The following table presents several examples:
Simple Implicit select Examples
select 1
returns 1
select Person
returns a bag containing all Person instances in the database
select Person.name
returns a bag containing the name of every Person instances in the database
select Person.name = "john"
returns a bag containing the oids of every Person instances whose name is equal to "john"
(select distinct Person.name = "john").age
returns a set containing the age of every Person instances whose name is equal to "john"
select Person.name = "john" or Person.age = 10
raises an error: use select/from/where clause
select Person.name order by Person.name
returns a list containing of the sorted names of all Person instances


Querying the schema

As every abstraction is an object, one can perform queries on the schema and classes. For instance, to get the oids of all the existing classes in a schema:
select schema which is equivalent to select class.
In the EYEDB object model, the class class has the following native attributes (some are inherited from the object class):
  1. class (inherited from object) is the class of this class,
  2. protection (inherited from object) is the protection object of this class,
  3. type is the type of the class: "system" or "user".
  4. name is the name of this class,
  5. parent is the parent class of this class,
  6. extent is the extent collection of this class: contains all the object instances of this class,
  7. components is the collection of components of this class: contains the constraints, the methods, the triggers, the index.
Queries can be performed according to one or more of the previous attributes.
Query Schema Examples
select class.name = "Person"
returns a bag containing the class whose name is "Person"
select class.type = "user"
returns all the user classes
select x from class x where x.name ~ "P" and x.type = "user"
returns the user classes whose name matches the given regular expression
select class.parent.name = "Person"
returns a bag containing the sub-classes of the class Person


How queries are optimized?

EYEDB queries implementation make an heavy use of index. Index may be used as terminal index or as non-terminal index: terminal index are those used at the end of a path expression, and the other are those used inside a path expression.

For instance, assuming that each attribute in our schema is indexed, the query select Person.name = "john" uses the index on the attribute name as a terminal index.

The query select Person.spouse.age < 23 uses the index on the attribute age as a terminal index, and the index on the indirect attribute spouse as a non-terminal index, while the query select Person.spouse = 6252.3.48474:oid uses the index on the spouse attribute as a terminal index.

The query select Person.children[?].spouse.name = "mary" uses the index of the attributes name, spouse and of the literal collection attribute children.

The queries with a where clause containing logical and constructs are optimized so to take advantage of the index. For instance, assuming that there is an index on the name attribute and no index on the age attribute, the query select x from Person x where x.age = 100 and x.name = "john" will perform first the query on the name attribute and then filter the result according to the given predicat on the age attribute.
If the two expressions around the logical and operator have an index or if none of the two expression has a index, the interpreter performs first the left part and then the second part. So, in this case, the order of the two expressions around the and is important.

Finally, the query select Person.name reads directly the index of the name attribute, instead of reading the name of each Person instance.

Nevertheless, currently, there are some constructs that the OQL interpreter does not interpret cleverly and where index are not used. This is unfortanely the case of the join constructs such as:
select x from Person x, x.spouse y where y.name = "mary".
This construct will not make use of the index on the spouse attribute (but it will use the index of the name attribute). Fortunately, in a lot of cases, join queries may be replaced by a path expression query, for instance:
select x from Person x where x.spouse.name = "mary" is the alternate form the previous join query.

This alternate form (path-expression oriented) is the preferred one, because:
- it is more intuitive,
- it is more object oriented (please forget relationnal!),
- it is more compact,
- it uses index properly.

Of course, the last reason is not a good reason as a proper query implementation should use index whatever the used syntax. The implementation will be improved in a next version.

Note that query optimisations are fragile and do not believe that the OQL interpreter knows your data better than you. So, the two following simple rules should be applied:
  1. when path expression oriented queries are enough, do not use join constructs,
  2. in a and expression within a where clause, put the expression which denotes the most little set of instances on the left side of the and. In some particular cases, the OQL interpreter knows how to optimize the and, but in most of the cases, it does not.
Some experimental hints can be added to an and expression in the where clause to help the interpreter to do the things better, but there are currentlty too much experimental to be documented here.

EyeDB manual