Plan for the integration of CWL in WMS beyond DPPS rel 0.1

Reminder of the job submission/execution flow in DIRAC (v8)

On the client:

  1. The user submits a job described in JDL using the DIRAC Job API (or CLI)
  2. The JDL is sent to the Job Manager

On the server:

  1. The Job Manager inserts the Job in the JobDB (JDL, job parameters, etc.)
  2. The Job Manager calls the Executors passing the JDL
  3. The Executors perform scheduling decisions and insert the Job in the TaskQueueDB (JobID)
  4. The Site Director checks the TaskQueueDB and submits pilots

On the worker node:

  1. Pilot gets executed and contacts the Matcher to pull the Job to be executed
  2. The Job executable is retrieved from JDL and corresponds to the user application

dirac-cwl prototype

https://github.com/aldbr/dirac-cwl-proto

  • It follows the transition plan JDL->CWL https://github.com/DIRACGrid/diracx/discussions/175

  • It is a standalone demonstrator which implements a User Interface to submit Jobs, Transformations and Productions described in CWL

  • Eventually, it will be integrated into DiracX

dirac-cwl prototype: Job submission

The CWL task is executed within a single DIRAC Job using cwltool. The CWL task can be a CommandLineTool or a Workflow.

  • It introduces a new Job Model, whose key elements are:

    • CWL Task: a standard, unmodified CWL definition that can be directly used or tested locally
    • Job Parameters: CWL inputs
    • Job Description: specific DIRAC metadata (computing sites, priority, output specifications, ...)
  • The user specifies:

    • CWL task
    • CWL inputs
    • DIRAC parameters (site, priority): Dirac-specific attributes related to scheduling
  • From these inputs, it builds a JSON object that corresponds to the new Job Model that will be sent to the Job Manager

  • It simulates the Job Manager and the Job Wrapper by simply executing jobs locally
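
As an illustration of the Job Model described above, here is a minimal Python sketch of how the three elements (CWL task, CWL inputs, DIRAC metadata) could be assembled into one JSON object. The class names, field names (`task`, `parameters`, `description`, `sites`, `priority`), the default priority and the site name are assumptions made for this example; they are not taken from the prototype.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class JobDescription:
    """DIRAC-specific scheduling metadata (hypothetical field names)."""

    sites: list[str] = field(default_factory=list)
    priority: int = 5  # arbitrary default chosen for this sketch


@dataclass
class JobModel:
    """The three key elements of the new Job Model."""

    task: dict[str, Any]         # unmodified CWL definition
    parameters: dict[str, Any]   # CWL inputs
    description: JobDescription  # DIRAC metadata


def build_job_payload(task: dict[str, Any],
                      parameters: dict[str, Any],
                      description: JobDescription) -> str:
    """Serialise the Job Model into a JSON string for the Job Manager."""
    return json.dumps(asdict(JobModel(task, parameters, description)),
                      sort_keys=True)


# A tiny CWL CommandLineTool, written as a plain dict for the example.
cwl_task = {
    "cwlVersion": "v1.2",
    "class": "CommandLineTool",
    "baseCommand": "echo",
    "inputs": {"message": {"type": "string", "inputBinding": {"position": 1}}},
    "outputs": [],
}

payload = build_job_payload(
    cwl_task,
    {"message": "hello"},
    JobDescription(sites=["LCG.Example.org"], priority=3),
)
```

Keeping the CWL task as an untouched sub-object means the same definition can still be tested locally with cwltool, which is the property the Job Model is meant to preserve.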

dirac-cwl prototype: Production submission (i.e. multi-stage workflows)

The CWL task is executed as a DIRAC Production. The CWL task must be a Workflow. Each Step of the Workflow can be a CommandLineTool or a Workflow.

  • It creates a Production which corresponds to the whole Workflow

  • Each Step of the Workflow corresponds to a Transformation which belongs to the Production

  • The user specifies:

    • CWL task (inputs already described within it)
    • Step Metadata (per step):
      • Dirac description (site, priority)
      • Metadata (job type, group size, query parameters)
  • It builds a JSON object corresponding to the Productions Model (CWL task, steps metadata)

  • It simulates the ProductionManager and the TransformationManager. It also simulates the TS Agents which create the jobs to be submitted to the fake JobManager
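
The mapping from Workflow steps to Transformations can be sketched as follows. This is a simplified illustration, not the prototype's code: it assumes the CWL `steps` section uses the map form (step name to step body), and the output keys (`name`, `task`, `metadata`) and the sample metadata keys are hypothetical.

```python
from typing import Any


def split_into_transformations(
    workflow: dict[str, Any],
    steps_metadata: dict[str, dict[str, Any]],
) -> list[dict[str, Any]]:
    """Return one transformation description per step of a CWL Workflow."""
    if workflow.get("class") != "Workflow":
        raise ValueError("a Production requires a CWL Workflow")

    transformations = []
    for step_name, step in workflow["steps"].items():
        transformations.append(
            {
                "name": step_name,
                # The step body: a CommandLineTool or a nested Workflow.
                "task": step["run"],
                # Per-step DIRAC description and metadata, if provided.
                "metadata": steps_metadata.get(step_name, {}),
            }
        )
    return transformations


workflow = {
    "cwlVersion": "v1.2",
    "class": "Workflow",
    "inputs": {},
    "outputs": {},
    "steps": {
        "simulate": {"run": {"class": "CommandLineTool"}, "in": {}, "out": []},
        "process": {"run": {"class": "CommandLineTool"}, "in": {}, "out": []},
    },
}
steps_metadata = {"simulate": {"site": "LCG.Example.org", "group_size": 10}}

transformations = split_into_transformations(workflow, steps_metadata)
```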

dirac-cwl prototype: Usage examples

The same CWL task can be submitted as a Job, a Transformation or a Production.

The user specifies inputs and metadata differently depending on the submission mode.

For instance, when submitting it as a Job, the user specifies the inputs of the CWL task and, optionally, metadata for job scheduling (site, priority, ...).

When submitting it as a Production, the user specifies step metadata, which contain the File Catalog queries used to select input data, the transformation group size, etc.

dirac-cwl job submit <workflow_path> [--parameter-path <input_path>] [--metadata-path <metadata_path>]

dirac-cwl transformation submit <workflow_path> [--metadata-path <metadata_path>]

dirac-cwl production submit <workflow_path> [--steps-metadata-path <steps_metadata_path>]

dirac-cwl prototype: Implement DIRAC metadata via hints for CWL

At the DiracX hackathon in Jan 2025, Mykhailo proposed to implement DIRAC metadata as CWL hints so that there would be no need to handle them separately. See:

https://github.com/aldbr/dirac-cwl-proto/pull/11
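
To make the idea concrete, the sketch below shows how DIRAC metadata carried as a CWL hint could be read back from a parsed CWL document. The hint class name `dirac:Scheduling` and its fields are purely hypothetical and are not taken from the pull request; only the two layouts of the `hints` section (a list of objects with a `class` key, or a map keyed by class name) come from the CWL standard.

```python
from typing import Any


def extract_dirac_hint(cwl_doc: dict[str, Any],
                       hint_class: str = "dirac:Scheduling") -> dict[str, Any]:
    """Return the fields of the given hint, or {} if the hint is absent."""
    hints = cwl_doc.get("hints", [])
    if isinstance(hints, dict):
        # Map form: {"ClassName": {...fields...}} -> normalise to list form.
        hints = [{"class": name, **body} for name, body in hints.items()]
    for hint in hints:
        if hint.get("class") == hint_class:
            return {key: value for key, value in hint.items() if key != "class"}
    return {}


cwl_doc = {
    "cwlVersion": "v1.2",
    "class": "CommandLineTool",
    "baseCommand": "echo",
    "hints": [
        {"class": "dirac:Scheduling", "site": "LCG.Example.org", "priority": 3},
    ],
    "inputs": {},
    "outputs": [],
}

scheduling = extract_dirac_hint(cwl_doc)
```

Because CWL runners are allowed to ignore hints they do not understand, a document annotated this way would remain a valid CWL definition.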

Job submission/execution flow with CWL in DiracX

On the client:

  1. The user submits a job using an interface derived from the dirac-cwl prototype, i.e. passing CWL workflow, CWL inputs and DIRAC specific parameters. The interface builds a JSON object according to the new Job Model
  2. The JSON object is sent to the Job Manager

On the server:

  1. The Job Manager inserts the Job parameters in the JobDB and inserts the CWL workflow and CWL inputs in a dedicated Workflow table, retrieving the WorkflowID and InputID. This avoids the need to add the CWL workflow and CWL inputs to the input sandbox (ISB).

  2. The Job Manager calls the Executors passing the JSON object

  3. The Executors perform scheduling decisions and insert the Job in the TaskQueueDB (JobID)

  4. The Site Director checks the TaskQueueDB and submits pilots

On the worker node:

  1. Pilot gets executed and contacts the Matcher to pull the Job to be executed
  2. The Job executable is cwltool. The CWL workflow and CWL inputs are retrieved from the Workflow table.
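
The role of the Workflow table in this flow can be sketched with an in-memory SQLite database. The layout is an assumption made for the example (two tables named `Workflows` and `Inputs`, each storing a JSON body); the actual schema is not defined in this plan.

```python
import json
import sqlite3
from typing import Any


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the hypothetical workflow and input tables."""
    conn.executescript(
        """
        CREATE TABLE Workflows (WorkflowID INTEGER PRIMARY KEY, Body TEXT NOT NULL);
        CREATE TABLE Inputs (InputID INTEGER PRIMARY KEY, Body TEXT NOT NULL);
        """
    )


def store_task(conn: sqlite3.Connection,
               workflow: dict[str, Any],
               inputs: dict[str, Any]) -> tuple[int, int]:
    """Server side: store workflow and inputs, return (WorkflowID, InputID)."""
    workflow_id = conn.execute(
        "INSERT INTO Workflows (Body) VALUES (?)", (json.dumps(workflow),)
    ).lastrowid
    input_id = conn.execute(
        "INSERT INTO Inputs (Body) VALUES (?)", (json.dumps(inputs),)
    ).lastrowid
    return workflow_id, input_id


def load_task(conn: sqlite3.Connection,
              workflow_id: int,
              input_id: int) -> tuple[dict[str, Any], dict[str, Any]]:
    """Worker side: retrieve the workflow and inputs by their IDs."""
    wf_row = conn.execute(
        "SELECT Body FROM Workflows WHERE WorkflowID = ?", (workflow_id,)
    ).fetchone()
    in_row = conn.execute(
        "SELECT Body FROM Inputs WHERE InputID = ?", (input_id,)
    ).fetchone()
    if wf_row is None or in_row is None:
        raise KeyError("unknown WorkflowID or InputID")
    return json.loads(wf_row[0]), json.loads(in_row[0])


conn = sqlite3.connect(":memory:")
create_schema(conn)
workflow_id, input_id = store_task(
    conn, {"class": "CommandLineTool", "baseCommand": "echo"}, {"message": "hello"}
)
workflow, inputs = load_task(conn, workflow_id, input_id)
```

Only the two identifiers need to travel with the job; the workflow and its inputs are fetched on the worker node when cwltool is about to run.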

Transition (JDL->CWL) plan for DiracX

In DIRAC v9, there is already a DiracX JobManager but it handles only JDL jobs. To implement the JDL->CWL transition plan in DiracX we need to:

  • Modify the DiracX JobManager to make it compliant with the new JSON Job Model
  • Implement DiracX Executors compliant with the new JSON Job Model
  • Implement DiracX Matcher compliant with the new JSON Job Model
  • Add the Workflow Task Table

Intermediate transition plan (JDL->CWL)

It can be implemented in DIRAC v9.

  • Extract from the dirac-cwl prototype the parts that concern the User Interface, i.e. the generation of the JSON object from the inputs given by the user (CWL workflow, CWL inputs, DIRAC parameters)

  • Modify the DiracX JobManager to make it compliant with the new Job Model

  • Implement a JSON->JDL converter

  • Keep using the legacy Executors and Matcher (dealing with JDL)

  • The JSON->JDL converter will generate a JDL.

    • Executable = cwltool job.cwl inputs.cwl where:

      • job.cwl is the CWL workflow extracted from the Workflow Table
      • inputs.cwl are the CWL inputs extracted from the Workflow Table
    • Site = extracted from the Job Description

    • Other DIRAC metadata = ...
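
A JSON->JDL converter along these lines could look like the sketch below. It is only an illustration under stated assumptions: the JSON keys (`description`, `sites`, `priority`) are hypothetical, the command is split into the JDL `Executable` and `Arguments` attributes, and `WorkflowID` and `InputID` are emitted as extra JDL parameters so the worker node can look up the CWL documents.

```python
from typing import Any


def to_jdl_value(value: Any) -> str:
    """Render a Python value using JDL syntax (numbers, lists, strings)."""
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(to_jdl_value(item) for item in value) + "}"
    return '"' + str(value).replace('"', '\\"') + '"'


def json_to_jdl(job: dict[str, Any], workflow_id: int, input_id: int) -> str:
    """Convert the JSON Job Model (hypothetical keys) into a JDL string."""
    description = job.get("description", {})
    attributes: dict[str, Any] = {
        "Executable": "cwltool",
        "Arguments": "job.cwl inputs.cwl",
        "WorkflowID": workflow_id,
        "InputID": input_id,
    }
    # Only emit scheduling attributes the user actually provided.
    if description.get("sites"):
        attributes["Site"] = description["sites"]
    if "priority" in description:
        attributes["Priority"] = description["priority"]

    lines = [f"  {name} = {to_jdl_value(value)};"
             for name, value in attributes.items()]
    return "[\n" + "\n".join(lines) + "\n]"


jdl = json_to_jdl(
    {"description": {"sites": ["LCG.Example.org"], "priority": 3}},
    workflow_id=12,
    input_id=34,
)
```

Because the function is pure (JSON in, JDL text out), the same converter could run on the server or on the client.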

The submission/execution flow would be:

On the client:

(the two client steps are the same as in the DiracX flow)

  1. The user submits a job using an interface derived from the dirac-cwl prototype
  2. The JSON object is sent to the Job Manager

On the server:

(the first server step is the same as in the DiracX flow)

  1. The Job Manager inserts the Job parameters in the JobDB and inserts the CWL workflow and CWL inputs in a dedicated Workflow table, retrieving the WorkflowID and InputID. This avoids the need to add the CWL workflow and CWL inputs to the input sandbox (ISB).

  2. The JobManager calls the JSON->JDL converter, passing it the JSON object, the WorkflowID and the InputID

Note that the JDL should have two additional parameters: WorkflowID and InputID.

(the remaining server steps are the same as in DIRAC v8, applied to the generated JDL):

  1. The Job Manager inserts the Job in the JobDB (JDL, job parameters, etc.)
  2. The Job Manager calls the Executors passing the JDL
  3. The Executors perform scheduling decisions and insert the Job in the TaskQueueDB (JobID)
  4. The Site Director checks the TaskQueueDB and submits pilots

On the worker node:

  1. Pilot gets executed and contacts the Matcher to pull the Job to be executed
  2. The Job executable is cwltool. The CWL workflow and CWL inputs corresponding to the WorkflowID and InputID specified in the JDL are retrieved from the Workflow table.
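
On the worker node, the last step amounts to writing the retrieved documents to disk and invoking cwltool on them. The sketch below only prepares the command line and does not run it; the helper name is hypothetical, the file names follow those used above, and the documents are written as JSON on the assumption that cwltool accepts them in that form (JSON being a subset of YAML).

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def prepare_cwltool_command(workflow: dict[str, Any],
                            inputs: dict[str, Any],
                            workdir: Path) -> list[str]:
    """Write the CWL workflow and inputs to workdir and build the argv."""
    workflow_path = workdir / "job.cwl"
    inputs_path = workdir / "inputs.cwl"
    workflow_path.write_text(json.dumps(workflow, indent=2))
    inputs_path.write_text(json.dumps(inputs, indent=2))
    # cwltool takes the workflow document followed by the job (inputs) file.
    return ["cwltool", str(workflow_path), str(inputs_path)]


with tempfile.TemporaryDirectory() as tmp:
    command = prepare_cwltool_command(
        {"cwlVersion": "v1.2", "class": "CommandLineTool",
         "baseCommand": "echo", "inputs": {}, "outputs": []},
        {},
        Path(tmp),
    )
    # A real job wrapper would now execute the command,
    # e.g. with subprocess.run(command, check=True).
```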

Very preliminary transition plan (JDL->CWL)

It can be implemented in DIRAC v8.

  • Extract from the dirac-cwl prototype the parts that concern the User Interface, i.e. the generation of the JSON object from the inputs given by the user (CWL workflow, CWL inputs, DIRAC parameters)
  • Implement a JSON->JDL converter that will be executed on the client
  • Keep using the legacy JobManager, Executors and Matcher

The JSON->JDL converter could be the same as the one used in the Intermediate transition plan.

Plans for DPPS releases

Rel 0.1

It is based on DIRAC v8. The use case is to submit a single-job workflow described in CWL.

We could have used the dirac-cwl prototype, extracting from it the parts that generate the JSON object from the user inputs (CWL workflow, CWL inputs, DIRAC parameters).

Since in v8 (and also in v9) the JobManager only deals with JDL, we would then have had to convert the generated JSON object into JDL and submit it to DIRAC as usual.

Given the simplicity of the use case for rel 0.1, we instead wrote our own interface. It works similarly to the dirac-cwl prototype (although it is much simpler), but instead of generating a JSON object it directly generates a JDL that can be submitted to DIRAC.

Beyond Rel 0.1

WMS will be based on DIRAC v9.

The goal is to reuse parts of the dirac-cwl prototype for the User Interface to submit jobs, transformations and productions described in CWL.

Before starting any development work, the User Interface in the dirac-cwl prototype should be stable, in particular regarding the use of hints. See PR:

https://github.com/aldbr/dirac-cwl-proto/pull/11

Job submission

Since the framework to implement Executors in DiracX is not yet ready, we cannot yet start on the final plan ("Transition (JDL->CWL) plan for DiracX"), but we could start with the "Intermediate transition plan (JDL->CWL)" or the "Very preliminary transition plan (JDL->CWL)".

Transformation/Production submission

Handling transformations and productions as sketched in the dirac-cwl prototype requires substantial development work:

  • Implement the whole Transformation System in DiracX
  • Implement the Production System in DiracX compliant with the new Production Model

Since the framework to implement Agents in DiracX (Tasks in DiracX jargon) is not yet ready, we cannot start the development of the Transformation System.

See also the DiracX roadmap:

https://github.com/DIRACGrid/diracx/blob/main/docs/ROADMAP.MD

We could have an intermediate transition plan:

  • Implement a TransformationManager in DiracX compliant with the new Transformation Model
  • Keep the legacy TS Agents
  • Implement a ProductionManager in DiracX compliant with the new Production Model

or still more preliminary:

  • Implement a converter that takes the JSON description of a Transformation generated by the dirac-cwl prototype and converts it into a Transformation object of legacy DIRAC
  • Implement a converter that takes the JSON description of a Production generated by the dirac-cwl prototype and converts it into a Production object of legacy DIRAC

These converters would run on the client prior to submission.