Basic Concepts
==============

The two main kinds of objects of the `Task Manager <index.html>`__ are
`configurations <#configurations>`__ and `batches <#batches>`__. Task
Manager also allows the creation of `Templates <#templates>`__ for
configurations.

Configurations
--------------

The configuration is the central object in the Task Manager. A
configuration is typically linked to a data object, such as a GeoServer
layer, and serves as an entry point to the tasks and batches related to
this data object.

A configuration has a unique name, a description and a workspace. It
contains three groups of objects:

- ``Attributes``: The attributes contain information about this
  configuration that can be shared between the different tasks of this
  configuration. An attribute has a name and a value. Each attribute is
  associated with at least one task parameter (see below). Attributes
  inherit their validation properties from their associated parameters,
  such as their accepted values and whether they are required.

- ``Tasks``: Each task configures an operation that can be executed on
  this configuration. Each task has a name that is unique within the
  configuration, a type and a list of parameters, each with a name and a
  value. The full name of a task is denoted as
  *configuration-name/task-name* (which serves as a unique identifier
  for the task). The task's type is chosen from a `list of available
  task types <user.html#task-types>`__, which define different kinds of
  operations (for example: copy a database table, publish a layer, etc.).
  A task type expects a list of parameters that each have a name and a
  type. A parameter may or may not be required. The parameter type
  defines the accepted values of the parameter. Parameter types are
  dependent types when the list of accepted values depends on the value
  of another parameter (for example: tables inside a database). A
  parameter value is either a literal or a reference to an attribute of
  the form ``${attribute-name}``.

- ``Batches``: described in the `Batches <#batches>`__ section.

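
The relationship between attributes and parameter values can be
illustrated with a small sketch. It is not Task Manager code: the
function ``resolve_parameter`` and the sample attribute and parameter
names are hypothetical, and the sketch assumes that a reference always
makes up the whole parameter value.

```python
import re

# Matches a value that consists solely of one reference, e.g. "${table-name}".
# Assumption: a reference is never embedded inside a longer literal.
REFERENCE = re.compile(r"^\$\{([^}]+)\}$")


def resolve_parameter(value, attributes):
    """Return the literal itself, or the attribute value if `value` is a reference."""
    match = REFERENCE.match(value)
    if match is None:
        return value  # plain literal, used as-is
    name = match.group(1)
    if name not in attributes:
        raise KeyError(f"unknown attribute: {name}")
    return attributes[name]


# Hypothetical configuration attributes shared by several tasks.
attributes = {"table-name": "roads", "source-db": "staging"}

# Hypothetical parameters of one task: two references and one literal.
task_parameters = {
    "database": "${source-db}",
    "table": "${table-name}",
    "overwrite": "true",
}

resolved = {
    key: resolve_parameter(value, attributes)
    for key, value in task_parameters.items()
}
print(resolved)
```

Because several tasks can reference the same attribute, changing the
attribute once changes the effective parameter value of all of them.
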
Batches
-------

A batch is made of an ordered sequence of tasks that can either be run
on demand or be scheduled to run repeatedly at specific times. There are
two kinds of batches:

- ``Configuration batches``: these are batches that belong to a
  configuration. All of the tasks inside such a batch are tasks that
  belong to that same configuration.

- ``Independent batches``: these are batches that do not belong to a
  configuration. They may contain tasks from any existing
  configuration.

A batch has a name, a description and a workspace. The name of a batch
must be unique within its configuration or, for independent batches,
among all independent batches. The full name of a batch is denoted as
*[configuration-name:]batch-name*, which serves as a unique identifier
for the batch.

Configuration batches that have a name starting with ``@`` are hidden
from the general batch overview and are only accessible from their
configuration. Hidden batch names may be reserved for special functions.
At this point, there is only one such case (see `Initializing
templates <#templates>`__).

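
The naming rules above can be sketched as two small helpers. The
function names ``batch_full_name`` and ``is_hidden`` are hypothetical
and only mirror the rules stated in the text; they are not part of any
Task Manager API.

```python
from typing import Optional


def batch_full_name(batch_name: str, configuration_name: Optional[str] = None) -> str:
    """Build the full name in the form [configuration-name:]batch-name."""
    if configuration_name is None:
        return batch_name  # independent batch: no prefix
    return f"{configuration_name}:{batch_name}"


def is_hidden(batch_name: str, configuration_name: Optional[str] = None) -> bool:
    """A configuration batch whose name starts with '@' is hidden."""
    return configuration_name is not None and batch_name.startswith("@")


print(batch_full_name("nightly"))               # independent batch
print(batch_full_name("@Initialize", "roads"))  # configuration batch
print(is_hidden("@Initialize", "roads"))
```
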
A batch can be run manually if the following conditions are met:

- the list of tasks is non-empty;
- the operating user has the security rights to do so (see
  `Security <user.html#security>`__).

A batch will be run automatically at its scheduled time if the following
conditions are met:

- the list of tasks is non-empty;
- the batch is enabled;
- the batch has a frequency configured other than ``NEVER``;
- the batch is independent or its configuration has been completed,
  i.e. validated without errors (in some cases a configuration may be
  saved before it is validated, see `Initializing
  templates <#templates>`__).

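
The two sets of conditions can be written down as predicates. This is
an illustrative sketch only: the ``Batch`` and ``Configuration``
classes, their field names and the frequency value ``"DAILY"`` are
assumptions made for the example, and only ``NEVER`` is taken from the
text above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Configuration:
    name: str
    validated: bool = False  # True once validated without errors


@dataclass
class Batch:
    name: str
    tasks: List[str] = field(default_factory=list)
    enabled: bool = True
    frequency: str = "NEVER"
    configuration: Optional[Configuration] = None  # None: independent batch


def can_run_manually(batch: Batch, user_has_rights: bool) -> bool:
    """Manual run: non-empty task list and sufficient security rights."""
    return bool(batch.tasks) and user_has_rights


def runs_on_schedule(batch: Batch) -> bool:
    """Scheduled run: all four conditions from the list must hold."""
    return (
        bool(batch.tasks)
        and batch.enabled
        and batch.frequency != "NEVER"
        and (batch.configuration is None or batch.configuration.validated)
    )


config = Configuration("roads", validated=False)
batch = Batch("sync", tasks=["roads/copy"], frequency="DAILY", configuration=config)
print(runs_on_schedule(batch))  # configuration not yet validated
config.validated = True
print(runs_on_schedule(batch))  # all conditions met
```
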
Running a batch
~~~~~~~~~~~~~~~

The batch is executed in two phases:

- ``RUN`` phase: tasks are executed in the defined order. If an error
  occurs or the run is manually interrupted, execution ceases and the
  batch goes to the ``ROLLBACK`` phase. If all tasks finish successfully,
  it goes to the ``COMMIT`` phase.

- ``COMMIT/ROLLBACK`` phase: tasks are committed or rolled back in the
  *opposite* order.

Consider a batch with three tasks

*B = T1 -> T2 -> T3*.

A normal run would then be

*run T1 -> run T2 -> run T3 -> commit T3 -> commit T2 -> commit T1*.

However, if T2 fails, the run would be

*run T1 -> run T2 (failure) -> rollback T1*.

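
The ordering of the two phases can be reproduced with a short
simulation. The function ``run_batch`` is a hypothetical sketch that
only produces the sequence of steps; it does not execute real tasks,
and it follows the example above in that only the tasks that finished
running are rolled back.

```python
from typing import List, Optional


def run_batch(tasks: List[str], fails: Optional[str] = None) -> List[str]:
    """Return the sequence of steps for a batch run.

    `fails` names the task whose RUN step fails (None: no failure).
    """
    trace = []
    finished = []  # tasks whose RUN step completed, in run order
    failed = False
    for task in tasks:
        if task == fails:
            trace.append(f"run {task} (failure)")
            failed = True
            break  # cease execution, go to the ROLLBACK phase
        trace.append(f"run {task}")
        finished.append(task)
    # COMMIT or ROLLBACK happens in the opposite order of the RUN phase.
    step = "rollback" if failed else "commit"
    trace.extend(f"{step} {task}" for task in reversed(finished))
    return trace


print(" -> ".join(run_batch(["T1", "T2", "T3"])))
print(" -> ".join(run_batch(["T1", "T2", "T3"], fails="T2")))
```
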
Most tasks support ``COMMIT/ROLLBACK`` by creating temporary objects
that only become permanent objects after a ``COMMIT``. The ``ROLLBACK``
phase then simply cleans up those temporary objects. However, some
particular `task types <user.html#task-types>`__ may not support the
``COMMIT/ROLLBACK`` mechanism (in which case running them is final).

The commit phase happens in the opposite order because dependencies in
the old version of the data often require this. A concrete example may
clear things up. Imagine that *T1* copies a database table *R* from one
database to another, while *T2* creates a view *V* based on that table,
so *V* depends on *R*. If the table and view already exist in older
versions (*R\_old* and *V\_old*), they must not be removed until the
``COMMIT`` phase, so that their original state remains in the case of a
``ROLLBACK``. During the ``COMMIT`` phase, *R\_old* and *V\_old* are
removed, but it is not possible to remove *R\_old* until *V\_old* is
removed. Therefore it is necessary to commit *T2* before *T1*.

The ``COMMIT`` phase typically replaces old objects with the new objects
that have a temporary name. Since tasks often create objects that depend
on objects of the previous tasks, these objects contain references to
temporary names. This means that when the temporary object is committed
and becomes the real object, references in depending objects must also
be updated. For this purpose, a task that uses a temporary object from
a previous task registers a *dependency*, which is essentially an update
added to the commit phase of that previous task.

If *T3* has a dependency on task *T1* that we call *D1*, the following
happens:

*run T1 -> run T2 -> run T3, register D1 -> commit T3 -> commit T2 ->
commit T1, update D1*.

Let's make this clearer with an example. During the ``RUN`` phase,
*T1* creates table *R1\_temp* and *T2* creates *V1\_temp*, which depends
on *R1\_temp*; this dependency is registered. During the commit
phase, *T2* will replace *V1* by *V1\_temp*. Then, *T1* will replace
*R1* by *R1\_temp*. However, *V1* may still reference *R1\_temp*, which
no longer exists. Therefore, *T1* will use the registered dependency to
update *V1* to refer to *R1* instead of *R1\_temp*.

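
The example can be traced with a toy model in which tables are a set of
names and each view simply records the name of the table it references.
Everything here is a simplified sketch under those assumptions: the
function names and the ``commit_updates`` registry are hypothetical and
do not reflect how Task Manager or a real database implements this.

```python
from collections import defaultdict

tables = {"R1"}       # existing table from a previous run
views = {"V1": "R1"}  # view name -> name of the table it references

# task name -> updates to apply at the end of that task's commit
commit_updates = defaultdict(list)


def run_t1():
    tables.add("R1_temp")  # T1 creates the temporary table


def run_t2():
    views["V1_temp"] = "R1_temp"  # T2 creates a view on the temporary table

    def update():
        # Dependency: once T1 has committed, V1 must refer to R1.
        views["V1"] = "R1"

    commit_updates["T1"].append(update)  # register the dependency on T1


def commit_t2():
    views["V1"] = views.pop("V1_temp")  # V1 is replaced by V1_temp


def commit_t1():
    tables.discard("R1")      # drop the old table
    tables.remove("R1_temp")  # the temporary table becomes R1
    tables.add("R1")
    for update in commit_updates["T1"]:
        update()              # fix references to the temporary name


# RUN phase in order, COMMIT phase in the opposite order.
run_t1()
run_t2()
commit_t2()
after_t2_commit = dict(views)  # V1 still references R1_temp here
commit_t1()

print(after_t2_commit)
print(views, tables)
```
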
Within a batch run, each task that has already started has a status.
These are the possible statuses:

- ``RUNNING``: the task is currently running.
- ``WAITING_TO_COMMIT``: the task has finished running, but is waiting
  to commit (or roll back) while other tasks are running or committing
  (or rolling back).
- ``COMMITTING``: the task is currently committing.
- ``ROLLING_BACK``: the task is currently rolling back.
- ``COMMITTED``: the task was successfully committed.
- ``ROLLED_BACK``: the task was successfully rolled back.
- ``NOT_COMMITTED``: the task was supposed to commit but failed during
  the commit phase.
- ``NOT_ROLLED_BACK``: the task was supposed to roll back but failed
  during the rollback phase.

A task is considered finished if its status is not ``RUNNING``,
``WAITING_TO_COMMIT``, ``ROLLING_BACK`` or ``COMMITTING``. A batch run
does not have its own status, but it takes on the status of the last
task that has started but is not ``COMMITTED`` or ``ROLLED_BACK``. A
batch run is considered finished if its status is not ``RUNNING``,
``WAITING_TO_COMMIT`` or ``COMMITTING``.

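
The "finished" rules can be expressed as two predicates over the status
values. This is a sketch that follows the wording of the text literally;
the ``Status`` enum and the function names are illustrative and are not
taken from the Task Manager source.

```python
from enum import Enum


class Status(Enum):
    RUNNING = "RUNNING"
    WAITING_TO_COMMIT = "WAITING_TO_COMMIT"
    COMMITTING = "COMMITTING"
    ROLLING_BACK = "ROLLING_BACK"
    COMMITTED = "COMMITTED"
    ROLLED_BACK = "ROLLED_BACK"
    NOT_COMMITTED = "NOT_COMMITTED"
    NOT_ROLLED_BACK = "NOT_ROLLED_BACK"


# Statuses that mean "not finished", exactly as listed in the text.
UNFINISHED_TASK = {
    Status.RUNNING,
    Status.WAITING_TO_COMMIT,
    Status.ROLLING_BACK,
    Status.COMMITTING,
}
UNFINISHED_BATCH_RUN = {
    Status.RUNNING,
    Status.WAITING_TO_COMMIT,
    Status.COMMITTING,
}


def task_finished(status: Status) -> bool:
    return status not in UNFINISHED_TASK


def batch_run_finished(status: Status) -> bool:
    return status not in UNFINISHED_BATCH_RUN


print(task_finished(Status.WAITING_TO_COMMIT))
print(task_finished(Status.NOT_COMMITTED))
print(batch_run_finished(Status.COMMITTING))
```

Note that a failed commit or rollback (``NOT_COMMITTED``,
``NOT_ROLLED_BACK``) still counts as finished: the task is no longer
doing any work, even though the outcome was not successful.
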
There is concurrency protection both on the level of tasks and batches.
A single batch can never run simultaneously in multiple runs (the second
run will wait for the first one to finish). A single task can never run
simultaneously in multiple runs, even if part of a different batch. A
single task can also not commit simultaneously in multiple runs.

Templates
---------

Templates are in every way identical to configurations, with the
exception of:

- they are never validated when saved (their attributes need not be
  filled in) and
- their tasks and batches can never be executed.

A template is used as a blueprint for the creation of configurations
that are very similar to each other. Typically, the tasks are all the
same but the attribute values are different. However, a template may
also have attribute values filled in that serve as defaults.

Once a configuration is created from a template, it is independent from
that template (changes to the template do not affect it). The
configuration can then be modified like any other configuration,
including the removal, addition and manipulation of tasks.

Initializing templates
~~~~~~~~~~~~~~~~~~~~~~

An initializing template is any template that has a batch named
``@Initialize`` (case sensitive), which configures special behaviour.
The purpose of this batch is to execute some tasks that must have been
done at least once before some other tasks can actually be configured.

For example, you may want to create a vector layer based on a table
copied from a source database, then synchronize this layer to a target
GeoServer. The task that synchronizes a layer to the external GeoServer
will expect an existing configured layer, which you cannot create until
you have copied the table first. The ``@Initialize`` batch would in this
case copy the table from the source and create a layer in the local
GeoServer.

When creating a configuration from this template, configuration happens
in two phases:

(1) Initially, only attributes related to tasks in the
    ``@Initialize`` batch must be configured. When the configuration
    is saved, the ``@Initialize`` batch is automatically executed.

(2) Now, all other attributes and tasks must be configured and the
    configuration must be saved again.

This is the only case in which a configuration can be saved before all
the required attributes are filled in. Mind that batches will not be
scheduled or visible in the general overview until the configuration has
been saved again (and the attributes have thus been validated).