Diagnosing and Fixing a Stuck MR

Diagnosis

When an MR is merged, its request folder is added to master. The DPA bot should revert this within a few minutes, after which standard production can proceed.

If the request folder is still in master after ~10 minutes, something has failed along the way. Possible causes include:

  • webhook failure: the webhook was not received, so the bot was never triggered.

  • OpenShift resource overload: OpenShift is quite aggressive about killing jobs that exceed their resource limits. Depending on when this happens, some or even all of the requests may not be submitted. It is usually also accompanied by the issue not being created, even when all requests were successfully submitted.

  • invalid convener proxy: if the MR was approved by a convener, the convener might not have a valid proxy to sign with.
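The ~10 minute rule of thumb can be written down as a small check. This is a minimal sketch, assuming you already know when the MR was merged and whether the request folder is still present in master; the function name is_stuck and the STUCK_AFTER constant are hypothetical and not part of the DPA bot.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed threshold taken from the ~10 minute rule of thumb; not a bot setting.
STUCK_AFTER = timedelta(minutes=10)


def is_stuck(merged_at: datetime, folder_in_master: bool,
             now: Optional[datetime] = None) -> bool:
    """Return True if the request folder is still in master after STUCK_AFTER has elapsed."""
    if not folder_in_master:
        # The bot has already reverted the folder, so production can proceed.
        return False
    if now is None:
        now = datetime.now(timezone.utc)
    return now - merged_at >= STUCK_AFTER


# Usage: a folder still present 12 minutes after the merge counts as stuck.
merged = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_stuck(merged, True, now=merged + timedelta(minutes=12)))  # True
print(is_stuck(merged, True, now=merged + timedelta(minutes=3)))   # False
```

The check only tells you that something is stuck; working out which of the causes listed above applies still needs a manual look.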

Fix

A bot feature exists on MC Requests that accepts several commands (cf. /ci-test in Core Software). To fix this particular issue, use the command /submit-failed. The bot then checks whether the database is up to date, determines which requests have not been submitted, and replies with advice on the next step. The suggested action usually takes one of two forms.

  1. Run /submit-failed --do-for-real

In this case, the system has determined that all the requests were successfully submitted to DIRAC, so the bot only needs to update the database, clean the repository, and create the issue. Running this command triggers these steps.

  2. Run /submit-failed --do-for-real --allow-submit

In this case, the system has determined that some of the requests were not submitted to DIRAC. Using the information printed by the bot, carefully check in DIRAC which jobs have been submitted and which have not. Once you are satisfied, run the command to trigger DIRAC submission together with the usual database update, repository cleaning, and issue creation.
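The choice between the two follow-up commands can be illustrated with a short sketch. It is not the bot's actual implementation: the Request record and suggest_next_command are hypothetical, and the sketch leaves out the bot's database-consistency check, looking only at whether each request reached DIRAC.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Request:
    # Hypothetical record: the real bot's data model is not described here.
    name: str
    submitted_to_dirac: bool


def suggest_next_command(requests: List[Request]) -> Tuple[str, List[str]]:
    """Return the follow-up command and the names of requests missing from DIRAC."""
    missing = [r.name for r in requests if not r.submitted_to_dirac]
    if not missing:
        # Everything is already in DIRAC: only the database update,
        # repository cleaning and issue creation remain.
        return "/submit-failed --do-for-real", missing
    # Some requests still need submitting; check these in DIRAC before running the command.
    return "/submit-failed --do-for-real --allow-submit", missing


# Usage with made-up request names.
reqs = [Request("req-a", True), Request("req-b", False)]
print(suggest_next_command(reqs))
# ('/submit-failed --do-for-real --allow-submit', ['req-b'])
```

Returning the list of missing requests alongside the command mirrors the manual step of checking those specific jobs in DIRAC before allowing submission.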

Note:

This feature is only available to members of the Expert Group, as defined in the Simulation GitLab repository. Anyone else is not authorised, and the bot will reply accordingly.