- Basic Concepts
- ===============
- The two main kinds of objects of the `Task Manager <index.html>`__ are
- `configurations <#configurations>`__ and `batches <#batches>`__. Task
- Manager also allows the creation of `Templates <#templates>`__ for
- configurations.
- Configurations
- --------------
- The configuration is the central object in the Task Manager. A
- configuration is typically linked to a data object, such as a GeoServer
- layer, and serves as an entry point to the tasks and batches related to
- this data object.
- A configuration has a unique name, a description and a workspace. It
- contains three groups of objects:
- - ``Attributes``: The attributes contain information about this
- configuration that can be shared between the different tasks of this
- configuration. An attribute has a name and a value. Each attribute is
- associated with at least one task parameter (see below). Attributes
- inherit their validation properties from their associated parameters,
- such as their accepted values and whether they are required.
- - ``Tasks``: Each task configures an operation that can be executed on
- this configuration. Each task has a name that is unique within the
- configuration, a type and a list of parameters, each with a name and a
- value. The full name of a task is denoted as
- *configuration-name/task-name* (which serves as a unique identifier
- for the task). The task's type is chosen from a `list of available
- task types <user.html#task-types>`__ which define different kinds of
- operations (for example: copy a database table, publish a layer, ...).
- Each task type expects a list of parameters that each have a name and
- a type. A parameter may or may not be required. The parameter type
- defines the accepted values of the parameter. A parameter type is a
- dependent type when its list of accepted values depends on the value
- of another parameter (for example: tables inside a database). A
- parameter value is either a literal or a reference to an attribute of
- the form ``${attribute-name}``.
- - ``Batches``.
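- The distinction between literal parameter values and attribute
- references can be sketched as follows. This is only an illustration:
- the helper ``resolve_parameter`` and the attribute names are
- hypothetical, and the sketch assumes a value is either entirely a
- literal or entirely one ``${attribute-name}`` reference.

```python
import re

# Hypothetical helper pattern: the whole value is one ${attribute-name} reference.
_REFERENCE = re.compile(r"^\$\{([^}]+)\}$")


def resolve_parameter(value, attributes):
    """Return the literal itself, or the attribute value if `value` is a reference."""
    match = _REFERENCE.match(value)
    if match is None:
        return value  # plain literal
    name = match.group(1)
    if name not in attributes:
        raise KeyError(f"unknown attribute: {name}")
    return attributes[name]


# Illustrative attribute names, not taken from the Task Manager itself.
attributes = {"table-name": "roads", "target-db": "publication"}
print(resolve_parameter("${table-name}", attributes))  # roads
print(resolve_parameter("public", attributes))         # public
```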
- Batches
- -------
- A batch is made of an ordered sequence of tasks that can either be run
- on demand or be scheduled to run repeatedly at specific times. There are
- two kinds of batches:
- - ``Configuration batches``: these are batches that belong to a
- configuration. All of the tasks inside this batch are tasks that
- belong to that same configuration.
- - ``Independent batches``: these are batches that do not belong to a
- configuration. They may contain tasks from any existing
- configuration.
- A batch has a name, a description and a workspace. The name of a
- configuration batch must be unique within its configuration; the name
- of an independent batch must be unique among all independent batches.
- The full name of a batch is denoted as
- *[configuration-name:]batch-name*, which serves as a unique identifier
- for the batch.
- Configuration batches whose name starts with ``@`` are hidden
- from the general batch overview and are only accessible from their
- configuration. Hidden batch names may be reserved for special functions.
- At this point, there is only one such case (see `Initializing
- templates <#templates>`__).
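- The naming rules for batches can be illustrated with a small sketch.
- The function names are hypothetical; only the
- *[configuration-name:]batch-name* format and the ``@`` prefix rule come
- from the text.

```python
def batch_full_name(batch_name, configuration_name=None):
    """Build [configuration-name:]batch-name; no prefix for independent batches."""
    if configuration_name is None:
        return batch_name
    return f"{configuration_name}:{batch_name}"


def is_hidden(batch_name, configuration_name=None):
    """Only configuration batches whose name starts with '@' are hidden."""
    return configuration_name is not None and batch_name.startswith("@")


print(batch_full_name("nightly", "roads"))   # roads:nightly
print(batch_full_name("nightly"))            # nightly
print(is_hidden("@Initialize", "roads"))     # True
```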
- A batch can be run manually if the following conditions are met:
- - the list of tasks is non-empty;
- - the operating user has the security rights to do so (see
- `Security <user.html#security>`__).
- A batch will be run automatically at its scheduled time if the
- following conditions are met:
- - the list of tasks is non-empty;
- - the batch is enabled;
- - the batch has a frequency configured other than ``NEVER``;
- - the batch is independent or its configuration has been completed,
- i.e. validated without errors (in some cases a configuration may be
- saved before it is validated, see `Initializing
- templates <#templates>`__).
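- The two sets of conditions can be written down as predicates. This is
- a sketch under assumptions: the ``Batch`` class and its field names
- are hypothetical, the frequency value ``"DAILY"`` is made up for the
- example (the text only names ``NEVER``), and security is reduced to a
- boolean passed in by the caller.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Batch:
    tasks: List[str] = field(default_factory=list)
    enabled: bool = True
    frequency: str = "NEVER"
    # None means an independent batch; otherwise whether the owning
    # configuration has been validated without errors.
    configuration_validated: Optional[bool] = None


def can_run_manually(batch, user_may_run):
    """Non-empty task list and sufficient security rights."""
    return bool(batch.tasks) and bool(user_may_run)


def will_run_on_schedule(batch):
    """All four conditions for an automatic run at the scheduled time."""
    independent = batch.configuration_validated is None
    return (
        bool(batch.tasks)
        and batch.enabled
        and batch.frequency != "NEVER"
        and (independent or bool(batch.configuration_validated))
    )


print(will_run_on_schedule(Batch(tasks=["copy"], frequency="DAILY")))  # True
print(will_run_on_schedule(Batch(tasks=["copy"])))                     # False
```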
- Running a batch
- ~~~~~~~~~~~~~~~
- The batch is executed in two phases:
- - ``RUN`` phase: tasks are executed in the defined order. If an error
- occurs or the run is manually interrupted, execution ceases and the
- batch goes to the ``ROLLBACK`` phase. If all tasks finish
- successfully, the batch goes to the ``COMMIT`` phase.
- - ``COMMIT/ROLLBACK`` phase: tasks are committed or rolled back in the
- *opposite* order.
- Consider a batch with three tasks
- *B = T1 -> T2 -> T3*.
- A normal run would then be
- *run T1 -> run T2 -> run T3 -> commit T3 -> commit T2 -> commit T1*.
- However, if T2 fails, the run would be
- *run T1 -> run T2 (failure) -> rollback T1*.
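- The ordering of both scenarios can be reproduced with a short
- simulation. The function ``batch_steps`` is a hypothetical
- illustration that only computes the sequence of steps; it does not
- execute anything.

```python
def batch_steps(tasks, failing=None):
    """List the steps of a batch run; `failing` names a task whose run fails."""
    steps, completed = [], []
    failed = False
    for task in tasks:
        if task == failing:
            steps.append(f"run {task} (failure)")
            failed = True
            break
        steps.append(f"run {task}")
        completed.append(task)
    # Successfully run tasks are committed or rolled back in opposite order.
    phase = "rollback" if failed else "commit"
    steps.extend(f"{phase} {task}" for task in reversed(completed))
    return steps


print(" -> ".join(batch_steps(["T1", "T2", "T3"])))
# run T1 -> run T2 -> run T3 -> commit T3 -> commit T2 -> commit T1
print(" -> ".join(batch_steps(["T1", "T2", "T3"], failing="T2")))
# run T1 -> run T2 (failure) -> rollback T1
```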
- Most tasks support ``COMMIT/ROLLBACK`` by creating temporary objects
- that only become definite objects after a ``COMMIT``. The ``ROLLBACK``
- phase then simply cleans up those temporary objects. However, some
- particular `task types <user.html#task-types>`__ may not support the
- ``COMMIT/ROLLBACK`` mechanism (in which case running them is definite).
- The commit phase happens in opposite order because dependencies in the
- old version of the data often require this. A concrete example may
- clear things up. Imagine that *T1* copies a database table *R* from one
- database to another, while *T2* creates a view *V* based on that table,
- so *V* depends on *R*. If the table and view already exist in older
- versions (*R\_old* and *V\_old*), they must not be removed until the
- ``COMMIT`` phase, so that their original state remains in the case of a
- ``ROLLBACK``. During the ``COMMIT`` phase, *R\_old* and *V\_old* are
- removed, but it is not possible to remove *R\_old* until *V\_old* is
- removed. Therefore it is necessary to commit *T2* before *T1*.
- The ``COMMIT`` phase typically replaces old objects with the new objects
- that have a temporary name. Since tasks often create objects that depend
- on objects of the previous tasks, these objects contain references to
- temporary names. This means that when the temporary object is committed
- and becomes the real object, references in depending objects must also
- be updated. For this purpose, a task that uses a temporary object from
- a previous task registers a *dependency*, which is essentially an update
- added to the commit phase of that previous task.
- If *T3* has a dependency on task *T1* that we call *D1*, the following
- happens:
- *run T1 -> run T2 -> run T3, register D1 -> commit T3 -> commit T2 ->
- commit T1, update D1*.
- An example makes this clearer. During the ``RUN`` phase, *T1* creates
- table *R1\_temp* and *T2* creates view *V1\_temp*, which depends on
- *R1\_temp*; this dependency is registered. During the commit phase,
- *T2* replaces *V1* by *V1\_temp*. Then, *T1* replaces *R1* by
- *R1\_temp*. However, *V1* may still reference *R1\_temp*, which no
- longer exists. Therefore, *T1* uses the registered dependency to update
- *V1* to refer to *R1* instead of *R1\_temp*.
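- The dependency mechanism can be imitated with a toy model in which a
- dictionary stands in for the database. Everything here is a
- simplified assumption for illustration: a table *R1* produced by task
- *T1*, a view *V1* produced by task *T2*, and a hypothetical ``commit``
- function that renames the temporary object and then applies the
- registered updates.

```python
# Toy "database": object name -> name of the object it references (None for tables).
objects = {"R1": None, "V1": "R1"}    # old versions already exist
dependencies = {"T1": [], "T2": []}   # task -> objects to update when it commits

# RUN phase
objects["R1_temp"] = None             # T1 creates the new table under a temporary name
objects["V1_temp"] = "R1_temp"        # T2 creates a view on the temporary table
dependencies["T1"].append("V1")       # T2 registers a dependency on T1


def commit(task, final_name):
    """Replace the old object by the temporary one, then apply registered updates."""
    temp_name = final_name + "_temp"
    objects[final_name] = objects.pop(temp_name)
    for dependent in dependencies[task]:
        if objects[dependent] == temp_name:
            objects[dependent] = final_name


# COMMIT phase, in opposite order
commit("T2", "V1")   # V1 now references R1_temp
commit("T1", "R1")   # R1 is replaced and V1 is updated to reference R1
print(objects)       # {'R1': None, 'V1': 'R1'}
```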
- Within a batch run, each task that has already started has a status.
- These are the possible statuses:
- - ``RUNNING``: the task is currently running.
- - ``WAITING_TO_COMMIT``: the task has finished running, but is waiting
- to commit (or rollback) while other tasks are running or committing
- (or rolling back).
- - ``COMMITTING``: the task is currently committing.
- - ``ROLLING_BACK``: the task is currently rolling back.
- - ``COMMITTED``: the task was successfully committed.
- - ``ROLLED_BACK``: the task was successfully rolled back.
- - ``NOT_COMMITTED``: the task was supposed to commit but failed during
- the commit phase.
- - ``NOT_ROLLED_BACK``: the task was supposed to roll back but failed
- during the rollback phase.
- A task is considered finished if its status is not ``RUNNING``,
- ``WAITING_TO_COMMIT``, ``ROLLING_BACK`` or ``COMMITTING``. A batch run
- does not have its own status, but it takes on the status of the last
- task that has started but is not ``COMMITTED`` or ``ROLLED_BACK``. A
- batch run is considered finished if its status is not ``RUNNING``,
- ``WAITING_TO_COMMIT`` or ``COMMITTING``.
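- These rules can be expressed compactly in code. The enum mirrors the
- statuses listed above; the function names are hypothetical. One detail
- is an assumption, because the text does not cover it: when every
- started task is ``COMMITTED`` or ``ROLLED_BACK``, the sketch reports
- the status of the last started task.

```python
from enum import Enum, auto


class Status(Enum):
    RUNNING = auto()
    WAITING_TO_COMMIT = auto()
    COMMITTING = auto()
    ROLLING_BACK = auto()
    COMMITTED = auto()
    ROLLED_BACK = auto()
    NOT_COMMITTED = auto()
    NOT_ROLLED_BACK = auto()


TASK_UNFINISHED = {Status.RUNNING, Status.WAITING_TO_COMMIT,
                   Status.ROLLING_BACK, Status.COMMITTING}
BATCH_UNFINISHED = {Status.RUNNING, Status.WAITING_TO_COMMIT, Status.COMMITTING}
SETTLED = {Status.COMMITTED, Status.ROLLED_BACK}


def task_finished(status):
    return status not in TASK_UNFINISHED


def batch_run_status(started):
    """Status of the last started task that is not COMMITTED or ROLLED_BACK."""
    for status in reversed(started):
        if status not in SETTLED:
            return status
    # Assumption (not stated in the text): fall back to the last started task.
    return started[-1] if started else None


def batch_run_finished(started):
    return batch_run_status(started) not in BATCH_UNFINISHED


# Commit phase in progress: T3 committed, T2 committing, T1 still waiting.
run = [Status.WAITING_TO_COMMIT, Status.COMMITTING, Status.COMMITTED]
print(batch_run_status(run))    # Status.COMMITTING
print(batch_run_finished(run))  # False
```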
- There is concurrency protection both on the level of tasks and batches.
- A single batch can never run simultaneously in multiple runs (the second
- run will wait for the first one to finish). A single task can never run
- simultaneously in multiple runs, even if part of a different batch. A
- single task can also not commit simultaneously in multiple runs.
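- The text does not describe how this protection is implemented. Purely
- as an illustration of the "second run waits for the first" behaviour
- at the batch level, the following sketch uses one lock per batch name;
- ``run_batch_exclusively`` and the lock registry are hypothetical.

```python
import threading
from collections import defaultdict

# Hypothetical registry: one lock per batch name, created on first use.
_batch_locks = defaultdict(threading.Lock)
_registry_guard = threading.Lock()


def run_batch_exclusively(batch_name, body):
    """Run `body()` while holding the lock of `batch_name`.

    A second run of the same batch blocks until the first one has finished;
    runs of different batches do not block each other.
    """
    with _registry_guard:
        lock = _batch_locks[batch_name]
    with lock:
        return body()


print(run_batch_exclusively("roads:nightly", lambda: "done"))  # done
```

- The same idea would apply per task, with a lock keyed on the full task
- name, so that a task shared by several batches never runs twice at the
- same time.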
- Templates
- ---------
- Templates are in every way identical to configurations, with the
- exception of:
- - they are never validated when saved (their attributes need not be
- filled in) and
- - their tasks and batches can never be executed.
- A template is used as a blueprint for the creation of configurations
- that are very similar to each other. Typically, the tasks are all the
- same but the attribute values are different. However, a template may
- also have attribute values filled in that serve as defaults.
- Once a configuration is created from a template, it is independent from
- that template (changes to the template do not affect it). The
- configuration can then be modified like any other configuration,
- including the removal, addition and manipulation of tasks.
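- The independence of a configuration from its template amounts to a
- deep copy at creation time. The sketch below models a template as a
- plain dictionary; this structure, the function
- ``create_from_template`` and all names in the example are assumptions
- for illustration only.

```python
import copy

# Hypothetical template: one attribute left open, one filled in as a default.
template = {
    "name": "vector-layer-template",
    "attributes": {"table-name": None, "target-db": "publication"},
    "tasks": ["copy-table", "publish-layer"],
}


def create_from_template(template, name, attribute_values):
    """Create a configuration that shares no state with its template."""
    configuration = copy.deepcopy(template)
    configuration["name"] = name
    configuration["attributes"].update(attribute_values)
    return configuration


roads = create_from_template(template, "roads", {"table-name": "roads"})
template["tasks"].append("extra-task")  # later change to the template
print(roads["tasks"])                   # ['copy-table', 'publish-layer']
print(roads["attributes"])              # {'table-name': 'roads', 'target-db': 'publication'}
```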
- Initializing templates
- ~~~~~~~~~~~~~~~~~~~~~~
- An initializing template is any template that has a batch named
- ``@Initialize`` (case sensitive), which configures special behaviour.
- The purpose of this batch is to execute tasks that must have been run
- at least once before some other tasks can be configured. For example,
- you may want to create a vector layer based on a table copied from a
- source database, then synchronize this layer to a target GeoServer.
- The task that synchronizes a layer to the external GeoServer expects
- an existing configured layer, which you cannot create until you have
- copied the table. The ``@Initialize`` batch would in this case copy
- the table from the source and create a layer in the local GeoServer.
- When creating a configuration from this template, configuration happens
- in two phases:
- - (1) Initially, only attributes related to tasks in the
- ``@Initialize`` batch must be configured. When the configuration
- is saved, the ``@Initialize`` batch is automatically executed.
- - (2) Now, all other attributes and tasks must be configured and the
- configuration must be saved again.
- This is the only case in which a configuration can be saved before all
- the required attributes are filled in. Note that batches will not be
- scheduled or visible in the general overview until the configuration
- has been saved again (and the attributes have thus been validated).