Manual Distributed Task Execution on GitLab

Using Nx Agents is the easiest way to distribute task execution, but your organization may not be able to use hosted Nx Agents. In that case, you can set up distributed task execution (DTE) on your own CI provider using the recipe below.

Run Agents on GitLab

Run agents directly on GitLab with the workflow below:

.gitlab-ci.yml
image: node:18

# Declare the custom stage used by the jobs below
stages:
  - affected

# Creating template for DTE agents
.dte-agent:
  interruptible: true
  cache:
    key:
      files:
        - yarn.lock
    paths:
      - '.yarn-cache/'
  script:
    - yarn install --cache-folder .yarn-cache --prefer-offline --frozen-lockfile
    - yarn nx-cloud start-agent

# Creating template for a job running DTE (orchestrator)
.base-pipeline:
  interruptible: true
  only:
    - main
    - merge_requests
  cache:
    key:
      files:
        - yarn.lock
    paths:
      - '.yarn-cache/'
  before_script:
    - yarn install --cache-folder .yarn-cache --prefer-offline --frozen-lockfile
    - NX_HEAD=$CI_COMMIT_SHA
    - NX_BASE=${CI_MERGE_REQUEST_DIFF_BASE_SHA:-$CI_COMMIT_BEFORE_SHA}

  artifacts:
    expire_in: 5 days
    paths:
      - dist

# Main job running DTE
nx-dte:
  stage: affected
  extends: .base-pipeline
  script:
    - yarn nx-cloud start-ci-run --distribute-on="manual" --stop-agents-after=e2e-ci
    - yarn nx-cloud record -- nx format:check --base=$NX_BASE --head=$NX_HEAD
    - yarn nx affected --base=$NX_BASE --head=$NX_HEAD -t lint,test,build,e2e-ci --parallel=2

# Create as many agents as you want
nx-dte-agent1:
  extends: .dte-agent
  stage: affected
nx-dte-agent2:
  extends: .dte-agent
  stage: affected
nx-dte-agent3:
  extends: .dte-agent
  stage: affected

This configuration sets up two types of jobs: a main job and three agent jobs.

The main job tells Nx Cloud to use DTE and then runs normal Nx commands as if this were a single-pipeline setup. Once the commands are done, it notifies Nx Cloud to stop the agent jobs.

The agent jobs set up the repo and then wait for Nx Cloud to assign them tasks.

Two Types of Parallelization

The agents and the --parallel flag both parallelize tasks, but in different ways. The way this workflow is written, there will be 3 agents running tasks and each agent will try to run 2 tasks at once. If a particular CI run only has 2 tasks, only one agent will be used.
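
The arithmetic behind these two levels can be sketched in shell. This is only a rough model for reasoning about capacity: the function names `max_concurrent_tasks` and `agents_needed` are hypothetical, and the estimate ignores task dependencies and however Nx Cloud actually assigns tasks to agents.

```shell
# Upper bound on tasks running at the same time:
# number of agents multiplied by the --parallel value.
max_concurrent_tasks() {
  agents="$1"
  parallel="$2"
  echo $((agents * parallel))
}

# Rough estimate of how many agents receive work, assuming each agent
# is filled up to its --parallel limit before another agent is used.
# This is a simplification, not Nx Cloud's scheduling algorithm.
agents_needed() {
  tasks="$1"
  parallel="$2"
  agents="$3"
  # Ceiling division: tasks / parallel, rounded up.
  needed=$(( (tasks + parallel - 1) / parallel ))
  if [ "$needed" -gt "$agents" ]; then
    needed="$agents"
  fi
  echo "$needed"
}

max_concurrent_tasks 3 2   # prints 6
agents_needed 2 2 3        # prints 1
```

With the workflow above (3 agents, `--parallel=2`), at most 6 tasks run at once, and a run with only 2 tasks fits on a single agent.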

Rerunning jobs with DTE

Rerunning only failed jobs results in agent jobs not running, which causes the CI pipeline to hang and eventually time out. This is a common pitfall when using a CI provider's "rerun failed jobs" (or equivalent) feature, since agent jobs always complete successfully.

To enforce rerunning all jobs, you can set up your CI pipeline to exit early with a helpful error. For example:

You reran only failed jobs, but CI requires rerunning all jobs. Rerun all jobs in the pipeline to prevent this error.

At a high level:

  1. Create a job that always succeeds and uploads an artifact containing the pipeline's run attempt number.
  2. When the main and agent jobs start, they read the artifact file and assert that they are on the same retry attempt.
  3. If the attempt numbers do not match, fail with a message telling the user to rerun all jobs. Otherwise, the jobs are on the same rerun and can proceed as normal.
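
The comparison in steps 2 and 3 could look roughly like the shell helper below. This is a minimal sketch under stated assumptions: the function name `check_attempt`, the artifact file path, and the way the current attempt number is obtained are all hypothetical and depend on your CI provider. Only the error message comes from the example above.

```shell
# Hypothetical helper for the main and agent jobs.
# $1: path to the artifact file written by the always-succeeding job
#     (file name is an assumption)
# $2: attempt number of the current job (its source depends on your CI provider)
check_attempt() {
  recorded_file="$1"
  current_attempt="$2"

  # Without the artifact there is nothing to compare against, so fail.
  if [ ! -f "$recorded_file" ]; then
    echo "Attempt file '$recorded_file' not found." >&2
    return 1
  fi

  recorded_attempt="$(cat "$recorded_file")"

  # A mismatch means this job was rerun without the rest of the pipeline.
  if [ "$recorded_attempt" != "$current_attempt" ]; then
    echo "You reran only failed jobs, but CI requires rerunning all jobs. Rerun all jobs in the pipeline to prevent this error." >&2
    return 1
  fi

  return 0
}
```

A job would call the helper before doing any other work, for example `check_attempt attempt.txt "$CURRENT_ATTEMPT" || exit 1`, where `attempt.txt` and `CURRENT_ATTEMPT` are placeholder names for the downloaded artifact and the attempt number your pipeline exposes.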