This document describes the syntax used to define workflows in BountyHub. Workflows are defined using YAML syntax, which is a human-readable data serialization format.
We will go through each field in the workflow definition, describing its purpose, type, and examples of how to use it.
## scans

Each workflow is made of one or more scans. Each scan has a unique name, which is a key in the `scans` object. The name will be presented as [ID] for the rest of the document, since it is unique for each scan and therefore serves as an identifier.
You can have an unlimited number of scans, as long as their names are unique. The names must only contain alphanumeric characters and the `_` character. This constraint is in place to ensure that names are compatible with expressions and other parts of the system. Names cannot contain spaces or special characters, such as `-`, `.`, etc.
| name | valid |
|---|---|
| example | true |
| test_1 | true |
| test-one | false |
## scans.[ID].on

The on field serves as a trigger for scan execution. This object contains the specification of the events that will trigger a scan.
To start the pipeline, there must exist some event to trigger it. Events are raised automatically (on cron) or manually (on dispatch). From there, you build the next stages by triggering scans on expr.
To put things into perspective: something needs to trigger the scan. After the scan is done, each expression scan is evaluated. If the expression evaluates to true, the scan is scheduled. This means that a single scan can trigger one or more scans.
During evaluation, the scan checks against its latest job time. More about
expressions used to trigger expr scans can be found
here.
## scans.[ID].on.cron

Cron defines a schedule based on which the workflows are executed.
Cron is described in the form of:
| minute | hour | day of month | month | day of week |
|---|---|---|---|---|
| required | required | required | required | required |
Times are based on UTC timezone, so please take that into account when writing your schedules.
If you need help specifying or testing your cron schedule, you can use crontab guru. This is a great site that will help you write and understand the spec you write for the scheduler.
As a limitation, the cron parser does not allow non-standard modifiers, such as @hourly etc. Currently, supported specifications include:
- `*` (any value)
- `,` (value list separator)
- `-` (range of values)
- `/` (step values)

Since the platform has its own parser, adding non-standard modifiers can be considered in the future, based on your feedback.
Example (every day at 15:00/3PM UTC):
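A minimal sketch of what this could look like; the scan name and step are placeholders, and only the cron field is the documented part:

```yaml
scans:
  nightly_scan:
    on:
      cron: "0 15 * * *"   # every day at 15:00 UTC
    steps:
      - run: echo "running the nightly scan"
```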
## scans.[ID].on.expr

This field allows you to specify an expression that describes when the scan should be scheduled.
You can learn more about expressions by visiting the Workflow Expressions page.
Expressions describe a scan that should run only if its expression evaluates to true. After each successful scan, the server goes through expression-triggered scans and evaluates whether each of them should be scheduled.
Let's build an example workflow to show how it can be used.
Goals:
- Discover subdomains periodically, using subfinder and assetfinder.
- httprobe can execute once the subdomain discovery stage is over and there is a diff.

We can combine the power of the needs field, and use expr to further describe when our probe should be executed:
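Below is a rough sketch of how these scans could be wired together. The tool invocations, the way artifacts from other scans are pulled in (elided as comments), and the exact argument form of is_available() are illustrative assumptions; the documented pieces are the cron/expr triggers, the needs field, the step-level if, and the artifacts layout:

```yaml
scans:
  assetfinder:
    on:
      cron: "0 3 * * *"
    steps:
      - run: |
          assetfinder --subs-only example.com > assetfinder.txt
    artifacts:
      - name: assetfinder.zip
        paths:
          - assetfinder.txt

  subfinder:
    on:
      cron: "0 3 * * *"
    steps:
      - run: |
          subfinder -d example.com -o subfinder.txt
    artifacts:
      - name: subfinder.zip
        paths:
          - subfinder.txt

  subdomaindiscovery:
    needs:
      - assetfinder
      - subfinder
    on:
      expr: is_available(assetfinder) || is_available(subfinder)
    steps:
      - if: is_available(assetfinder)
        run: |
          # download and unpack assetfinder.zip here, then merge it
          cat assetfinder.txt >> all.txt
      - if: is_available(subfinder)
        run: |
          # download and unpack subfinder.zip here, then merge it
          cat subfinder.txt >> all.txt
      - run: |
          sort -u all.txt > subdomains.txt
    artifacts:
      - name: subdomains.zip
        paths:
          - subdomains.txt
        notify_on: diff
```

The probe stage is omitted from the sketch; it would hang off subdomaindiscovery in the same way, with an expression that additionally checks for a diff (see the Workflow Expressions page for the available functions).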
Let's summarize why and how this works. Periodically, we want to check if new domains are discovered. In order to do that, we run two subdomain discovery tools: subfinder and assetfinder.
The purpose of those scans is to find as many subdomains as we possibly can. But since these are just two tools accomplishing the same thing, we create a "meta scan" that only combines their results. At the end of the day, we care about subdomains, not about a single tool's result. This could be done in a single scan to save space, but at the expense of one tool's failure failing the whole scan. Use your judgment to decide what is best for you.
To schedule a subdomaindiscovery scan, we need to have assetfinder or
subfinder done and ready to be used. To accomplish that, we can leverage the
expression engine, particularly, is_available() function. You can read more
about this function on the
Workflow Expressions page, but in
summary, is_available() checks if there is at least one job that executed
successfully after the last execution of the current scan.
But since the subdomaindiscovery can be triggered when only one of these scans
is done, we need to conditionally download the results. This is done using the
if field of the step. If the scan is not available, the step is skipped.
Next, for the sake of example, we want to run httpx on newly found subdomains.
Since we only care about the latest result, we would download the latest version
of the subdomaindiscovery scan, and run httpx on it.
## scans.[ID].on.dispatch

The dispatch trigger is the type of trigger that allows only manual invocations.
Examples where dispatch might be useful include:
- Running a tool on-demand against a specific asset, using dispatch.inputs to pass the asset information.

For example:
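A minimal sketch; the scan name, input key, and command are placeholders:

```yaml
scans:
  probe_asset:
    on:
      dispatch:
        inputs:
          domain:
            type: string
            required: true
    steps:
      - run: |
          echo "probing ${{ inputs.domain }}"
```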
## scans.[ID].on.dispatch.inputs

Inputs are arguments, added to the evaluation context, that exist only during the execution of the scan. They allow you to parametrize the scan, so if you want to invoke a tool on-demand and you need to pass some parameters to it, you can use inputs.
There are two primary reasons to use dispatch: to invoke a scan manually, on demand, and to create a dispatch scan and parametrize it with inputs to invoke the tool with the required data.

## scans.[ID].on.dispatch.inputs.[KEY]

Key is a unique identifier for an input. It conforms to constraints related to project variables (i.e. alphanumeric or _).
The key is used by the expression engine to reference the input value, so if you
have an input with key domain, you can reference it using
${{ inputs.domain }}.
There are a few constraints you should be aware of:
## scans.[ID].on.dispatch.inputs.[KEY].type

The input type can either be a string or a bool.
By default, if you don't specify the type, it is assumed to be a "string".
The default value for the string type is an empty string (""). For the boolean type, the default value is false.
The type of this field is a string, so a valid way to specify it is either type: string or type: "string".
## scans.[ID].on.dispatch.inputs.[KEY].required

The required field specifies whether the input is required or not. The field does not assume any default value if not specified.
Keep in mind, an empty string is still a valid input. Once dispatched, every value is submitted to the server, and if the input is required, it must be present.
This limitation is mostly implemented to ensure that calls to bh job dispatch use correct parameters.
## scans.[ID].on.dispatch.inputs.[KEY].default

Sets the default value for the input if the input is not specified.
The type of this field in YAML is "string", so the only valid values for boolean input fields are "true" and "false".
In other words:
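As a sketch, assuming a boolean input named notify:

```yaml
inputs:
  notify:
    type: bool
    default: "true"    # valid: the value is the YAML string "true"
    # default: true    # invalid: this would be a YAML boolean, not a string
```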
## scans.[ID].if

This field allows you to specify a condition for when the scan should be scheduled, even if it would normally be scheduled by some event.
Let's say you want to run a scan on cron, but only if some condition is met.
Let's say that the condition is that some other scan has produced some output.
You can use the if field to specify the condition:
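A sketch of what that could look like; the scan names are placeholders, and the exact argument form of is_available() may differ from what the expression engine expects:

```yaml
scans:
  conditional_scan:
    on:
      cron: "0 6 * * *"
    if: is_available(subdomaindiscovery)
    steps:
      - run: echo "runs only when the condition is met"
```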
In this example, conditional_scan should not be evaluated on an expression. Expressions are evaluated every time a job is done; for a cron scan, the trigger is the cron itself.
However, our job should not run if some other job did not produce the output.
The best example I'm using (and why this field is introduced) is related to the
liveness probe. I'm running httpx on subdomains periodically. However, it
doesn't make sense to schedule it if no subdomains are discovered yet.
Therefore, I can create a cron scan for httpx, and skip it if the if
condition is not met yet.
## scans.[ID].artifacts[]

The artifacts field specifies which files or directories should be uploaded to the server when the scan is successful.
If the scan fails, no files will be uploaded to the server.
If a single upload fails, the artifacts that were uploaded successfully are still persisted, so you can inspect them regardless of the result. However, the expression context does not include scans whose state is not succeeded, so this partial upload only helps when other scans don't depend on those artifacts. Some artifacts might not be uploaded, but the ones that were might still contain useful information.
All artifacts are zipped and uploaded to the server under the artifacts[].name.
The compression level is set to 9 to ensure that the size of the artifact is
minimized. The size of the compressed file is counted against your storage
quota, since that is the amount of data it occupies on the server.
Even though you don't have to name your artifacts with the .zip suffix, it is highly recommended to do so, since it indicates that the artifact is a zip file. I have had issues downloading the output files without the .zip suffix on macOS, since the OS didn't recognize the file as a zip file.
::alert{type="warning"} All artifact paths must be present in the root directory of your scan.
There are multiple reasons for this:
cp files from your local machine to the working directory of
your job, but you would have to do it explicitly.cp command inside the run step to copy files to anywhere on your
filesystem. However, this is not recommended, since it breaks the portability
of the workflow.::
Let's take a look at the simple artifact example:
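The sketch below uses placeholder names; only the field layout is the point:

```yaml
scans:
  example:
    on:
      cron: "0 15 * * *"
    steps:
      - run: |
          echo "hello" > output.txt
    artifacts:
      - name: output.zip
        paths:
          - output.txt
```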
## scans.[ID].artifacts[].name

The name field specifies the name of the artifact. This is the name under which the artifact will be stored on the server.
The name must be unique for each scan, and it must not contain any special characters.
The reason why artifacts is not an object (or map) is that it would be ugly having keys named "artifact.zip" or "output.zip". Therefore, to avoid having to quote keys, using an array feels more appropriate.
In order to avoid issues downloading the output directly from the platform, it
would be best if you add the .zip suffix to the artifact name. However, you
don't have to do it.
The name of the artifact is the only reference used for downloading the artifact.
## scans.[ID].artifacts[].paths[]

Paths specifies a list of paths to files and/or directories that will be included in the artifact.
Currently, you cannot use glob paths, but rather prefix paths for directories (e.g. outdir/), or direct file names.
Each path must start from the runner's working directory where the scan is executed. For example, if you have a run step that does:
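(a minimal sketch; the command itself is illustrative)

```yaml
steps:
  - run: |
      echo "some output" > file.txt
```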
The file will be created under the directory of the scan, and the correct
reference for the path field would be file.txt.
## scans.[ID].artifacts[].if

This field allows you to specify a condition for when the artifact should be included in the scan.
The condition is expressed as a CEL expression, just like the if or expr fields. It must evaluate to a boolean value. String values are not treated as valid expressions (there are no "truthy" or "falsy" values).
This is especially important in situations where a conditional step produces the file. The condition to run the step should be the same as the condition to upload the artifact. Therefore, if the step is skipped, the artifact will not be uploaded.
## scans.[ID].artifacts[].expires_in

Expires in allows you to specify a duration after which the artifact should be removed.
This field is especially useful in case you run scans that produce large artifacts, such as screenshots. You most likely don't want to keep all those artifacts forever, especially since they consume your storage quota.
The duration is expressed as a string, and it must be a valid duration format,
such as 1h, 30m, or 1d. You would specify it as a string, such as
expires_in: "1d".
Valid time units are:
| Suffix | Description |
|---|---|
| m | minutes |
| h | hours |
| d | days |
You can combine them together, such as 1d12h30m for 1 day, 12 hours, and 30
minutes.
It is worth noting that the execution of the cleanups is not exact, i.e. there might be a delay between the exact time you want your artifact to expire and the time the background process picks it up.
Therefore, please do not rely on the exact timing of the expiration, but rather on the fact that it will be cleaned up eventually.
## scans.[ID].artifacts[].notify_on

A field that specifies when a notification (if configured) should be issued after the job finishes.
If no project notifications are configured, you will not receive any notifications, regardless of the value of this field.
The type of the notify_on field is string, and the valid values are:
- "diff": Notify when the job has a diff compared to the previous job. The value of the diff is calculated by hashing the contents of every file uploaded in the artifact. If the hash is different from the previous job's artifact hash, the notification is issued. This is important to mention because if the order of the lines is different, it will produce a different hash. Keep that in mind. The platform does not assume anything about your file contents, and that includes the format.
- "always": Notify when the job is executed, regardless of the diff.
- "never": Notification will never be issued.

Sometimes, you cannot rely on nicely ordering your output. For example, nmap with XML output can sometimes be difficult to sort nicely.
You have two options then:
- Produce a second, normalized artifact (e.g. sorted or extracted data) and set notify_on on that artifact.
- Handle notifications yourself, using tools like notify.

Using two artifacts allows you to leverage the native notification mechanism of the platform, but at the expense of increased storage (although you can expire the normalized artifact after an hour or so). On the other hand, using tools like notify is a bit harder to set up, and requires you to pass the required data through the workflow.
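A sketch of the two-artifact approach; the scan, the commands, and the normalization step are illustrative assumptions:

```yaml
scans:
  portscan:
    on:
      cron: "0 4 * * 0"
    steps:
      - run: |
          # targets.txt is assumed to already exist in the working directory
          nmap -iL targets.txt -oX nmap.xml
          # normalize the output so the diff hash is stable across runs
          grep -o 'name="[^"]*"' nmap.xml | sort -u > hosts-sorted.txt
    artifacts:
      - name: raw.zip
        paths:
          - nmap.xml
        notify_on: never
      - name: normalized.zip
        paths:
          - hosts-sorted.txt
        notify_on: diff
        expires_in: "1h"
```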
## scans.[ID].env

The env field on the scan specifies environment variables that will be available during every step of the scan. These environment variables override the environment variables set on the runner.
You can expose environment variables using three main mechanisms:
- On the runner itself, during the run call or during the service install call.
- Inside a step, e.g. export VAR_NAME=${{ secrets.BOUNTYHUB_TOKEN }}.
- On the scan level, using this env field.

The best way in almost all cases is to use the scan-level environment variables. This ensures that the environment variable is available during every step of the scan, and it is not tied to a specific step. It is also a top-level field, so it is easy to find and manage.
In this example, we will use expression to set the environment variable on the scan:
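A sketch; the secret name comes from the earlier example, while the tool and flag are placeholders:

```yaml
scans:
  example:
    on:
      cron: "0 15 * * *"
    env:
      API_TOKEN: ${{ secrets.BOUNTYHUB_TOKEN }}
    steps:
      - run: |
          # API_TOKEN is available here and in every other step of this scan
          ./my-tool --token "$API_TOKEN"
```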
## scans.[ID].steps[]

Every job is composed of one or more steps, executed sequentially. A job can be as simple as a single step running a script, or it can be a script completely written in the workflow.
For every step, stdout and stderr streams are captured and sent to the
server. These streams are stored for later inspection.
Do not rely on the stdout and stderr streams. The output will expire after
2 weeks, and you will be left with no data. Use artifacts to store important data,
and use logs to troubleshoot the workflow if necessary.
If the step fails, the job fails.
Failure of the step is based on the exit code of the script.
Currently, the stream is propagated to the UI while the job is being executed.
## scans.[ID].steps[].if

Steps may be conditionally skipped. This is what I refer to as conditional steps, since they are executed only if a certain condition is met.
Let's try to illustrate this idea with an example.
This idea has already been demonstrated in the expr section,
particularly in the subdomaindiscovery scan.
Since the subdomaindiscovery scan can be scheduled when either assetfinder or subfinder is done, we need to conditionally download the results. Otherwise, our scan would fail if one of the scans is not available. Therefore, we use the if field to conditionally execute the step.
On the other hand, if you want to run a step regardless of whether the job will fail, you can use the if: always expression to run the step.
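A sketch of such a step; the commands are placeholders:

```yaml
steps:
  - run: ./scan-that-might-fail.sh
  - if: always
    run: |
      # executed even if the previous step failed the job
      echo "cleanup or notification goes here"
```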
## scans.[ID].steps[].run

Run specifies a script to be executed. The content of this field is first written to a temporary file, and then executed by the shell specified in the shell field.
The script is written to a temporary file with random name, inside the working directory of the scan.
The working directory of a scan is created inside the workdir of the runner,
and is unique per job.
There may be situations where workflow syntax
looks valid, but the server fails to parse it. The most surprising YAML parsing
issue I have found is related to a : character inside a single line run
field.
Unless quoted, the parser would throw an error. Therefore, the following is invalid:
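For instance, something like the following trips the parser, because the unquoted scalar contains ": " (the command itself is only an illustration):

```yaml
steps:
  - run: echo "message: hello"
```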
What I do in most cases is to use multi-line string syntax, even for single line scripts:
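The same command written with a block scalar parses fine:

```yaml
steps:
  - run: |
      echo "message: hello"
```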
Keep in mind, this has nothing to do with the workflow syntax itself, but rather with the YAML parser used on the server side. I don't have time to re-implement a new parser, so please be aware of this limitation.
## scans.[ID].steps[].shell

Shell specifies the executable that is going to execute the script.
The runner evaluates the shell by running a shlex split. Shlex separates the shell's arguments the same way a shell would evaluate positional arguments. Then, the first argument is used as the entrypoint command, the rest are used as arguments to the command, and the script file is appended at the end of the arguments.
Let's demonstrate this with an example:
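A sketch using a Python interpreter as the shell; the interpreter and flag are illustrative:

```yaml
steps:
  - shell: python3 -u
    run: |
      print("hello from the step")
```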
This should be equivalent as the following:
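Roughly the following, where the last argument is the temporary file the runner wrote the script to (its name is random):

```sh
python3 -u /path/to/temporary-script-file
```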
## scans.[ID].steps[].allow_failure

Each step contains information about its state and outcome. The state records whether the step executed successfully or not. The outcome records whether the step is considered successful or not.
By default, outcome is derived from the state. If the step executed
successfully, the outcome is succeeded. If the step failed to execute (i.e.
non-zero exit code), the outcome is failed.
However, by setting allow_failure: true, you can change this behavior.
When allow_failure: true is set, the outcome of the step will always be
succeeded, regardless of the state.
Using allow_failure: true is useful in situations where you want to run a
step, but you don't want the job to fail if the step fails. For example, you
might want to notify an external service when the job is completed, but you
don't want the job to fail if the notification fails.
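A sketch of the notification case; the webhook URL and commands are placeholders:

```yaml
steps:
  - run: ./run-the-scan.sh
  - allow_failure: true
    run: |
      # the job still succeeds even if this notification call fails
      curl -fsS -X POST "$WEBHOOK_URL" -d '{"status": "done"}'
```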
Learn more about specific workflow components:
- Workflow Syntax Reference (currently reading)