Workflows are how you describe automation in BountyHub. They should capture the way you work in a concise, declarative way.
A workflow acts as if there is a replica of you doing the automated tasks continuously, feeding you information in real time and allowing you to do the best work you possibly can.
In bug bounty hunting, being the first one to find a bug is crucial. The faster you find a bug, the higher the chance that you are the first to report it and receive the bounty. So use workflows to your advantage, and let them work 24/7 for you.
Let's discuss concepts that are important to understand when working with workflows.
A workflow is a single unit of automation. It describes the flow you take manually when hunting for bugs. The idea behind the workflow is to combine multiple steps that depend on each other into a live execution model that runs while you sleep.
Most automation tools work in terms of pipelines, where you define a series of steps that are executed in order. Execution is broken into stages, and each stage can depend on the output of the previous stage.
Workflows are different. They are continuous processes that are always running, always adapting to the latest data available. You define how you want to work, and the workflow continuously does that for you.
Let's illustrate the difference with an example:
A pipeline would start with Stage #1, and only when it finished would it proceed to Stage #2 and Stage #4. Only when both of those finished would it proceed to Stage #3, and so on.
A workflow, on the other hand, would start with Stage #1 based on the trigger specified in the scan, and as soon as it finished, it would enable Stage #2 and Stage #4 to run. Perhaps Stage #4 runs on a schedule, so it won't start right away.
However, it still depends on the output of Stage #1, so it won't start until Stage #1 is finished.
Once Stage #4 finishes, Stage #3 can start, since both Stage #2 and Stage #4 are finished.
If Stage #4 later executes again because of its schedule, Stage #3 executes again, using the latest data from Stage #2 and Stage #4.
As you can see, workflows are continuous processes that adapt to the latest data available. The outputs are the driving factors, not the tools invoked to produce them.
The clearest example to me is running a liveness probe with httpx to see which
subdomains are available. We don't need to re-run subdomain enumeration every
time we want to see which subdomains are live. We can run subdomain enumeration
once every few days, but keep running the liveness probe every 6-12 hours to
see which subdomains are live right now.
That way, we can be the first one to know about a new live subdomain, and start hunting on it.
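To make this concrete, here is a rough sketch of what such a workflow could look like. The field names (scans, on, cron, needs, steps, run, artifacts) are illustrative assumptions, not the actual BountyHub syntax; see the syntax reference page for the real schema.

```yaml
# Hypothetical sketch, not actual BountyHub syntax; all field names
# are assumptions. See the syntax reference page for the real schema.
scans:
  enumerate:
    on:
      cron: "0 0 */3 * *"    # every three days: the subdomain list changes slowly
    steps:
      - run: subfinder -d example.com -o subdomains.txt
    artifacts:
      - name: subdomains
        path: subdomains.txt

  liveness:
    on:
      cron: "0 */6 * * *"    # every six hours
      needs: [enumerate]     # still depends on the latest enumeration output
    steps:
      # assumes the subdomains artifact from the latest successful
      # enumerate job is available in the working directory
      - run: httpx -l subdomains.txt -o live.txt
    artifacts:
      - name: live-subdomains
        path: live.txt
```

The point of the structure, whatever the real syntax is: the liveness scan consumes the latest output of the enumeration scan, on its own schedule, without re-running enumeration.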
A workflow is defined in a YAML file, following the workflow syntax specification. YAML is a great fit for defining workflows for multiple reasons: it is concise, declarative, and easy to change.
And changes to the workflow are expected. As we learn more, we add new tools, remove others, modify the logic, etc. Some of these changes should stay dynamic, while others impact other parts of the workflow.
Therefore, each time you create or update a workflow, a new revision is created.
Each revision is fully independent of the others. You can have multiple revisions of the file, but only the latest revision is active.
When a new revision is created, a new schedule is created for it. That means scans from the previous revision will no longer be executed, and the scheduled scans from the previous revision are removed.
The history, however, is preserved, so you can always go back and see what was done in the previous revisions.
A scan is a single unit of work in the workflow. It describes one stage of what you want to do. For example, a scan can describe subdomain enumeration, port scanning, vulnerability scanning, etc. Each of them is a single unit of work with clearly defined inputs and outputs.
While you can certainly combine multiple tools in a single scan, it is usually better to have a single tool per scan. This allows better reusability of the scans, as well as better visibility into what is going on.
When describing a scan, take the time to think deeply about what exactly you would like to achieve with it, and when exactly you would like it to be executed.
Each scan instantiates a job, based on the trigger you define. Learn more about different trigger mechanisms on the syntax reference page.
A job is an instance of a scan. Each scan can have zero or more jobs, but each job belongs to exactly one scan.
Each job is composed of multiple steps, executed in sequence. A step failure
results in job failure, unless allow_failure: true is specified. Read more
about allow_failure on the syntax reference page.
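As an illustration, a step that is allowed to fail might look like the following sketch. Only allow_failure: true comes from this page; the surrounding field names are assumptions.

```yaml
# Sketch: allow_failure is documented; the other field names are assumptions.
steps:
  - run: nuclei -l live.txt -o findings.txt
  - run: notify -data findings.txt
    allow_failure: true   # a failed notification should not fail the whole job
```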
The job data is evaluated during scheduling. The runner keeps asking for jobs; once a job is available, it gets assigned to the runner. At that time, all conditions are checked, template expressions are evaluated, and the job is sent to the runner.
Once the runner starts running the job, it periodically updates BountyHub
about its execution, streams the stdout and stderr data, and, if successful,
uploads the artifacts created by the job.
Each artifact belongs to a job. If you remove the job, all logs, artifacts, and metadata associated with the job will be removed.
An artifact is a file or a directory created by the job. Each job can upload multiple artifacts, each with its own name. Artifacts are stored on the back-end service and can be downloaded later.
Each artifact is a zip archive, with the compression level set to 9. This ensures that the artifact size is minimized, so it doesn't waste space on the server.
However, it also means that downloading an artifact requires decompression. Use
unzip or a similar tool to extract the contents of the artifact.
Since artifacts are the most important results of each job, you can configure
notifications on artifact changes. Depending on the notification settings, you
can choose to get notified on diff, always, or never.
If you choose diff, you will be notified only when the artifact content
changes compared to the previous successful job of the same scan.
If you choose always, you will be notified every time the artifact is uploaded
successfully.
The never value specifies that you don't want to be notified about the artifact at all.
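As a sketch, an artifact notification setting might be declared like this. The three values (diff, always, never) are documented above; the field names around them are assumptions.

```yaml
# Sketch: the values diff, always, and never are documented;
# the field names around them are assumptions.
artifacts:
  - name: live-subdomains
    path: live.txt
    notify: diff   # or: always / never
```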
To learn more about artifacts, visit the syntax reference page.
A runner is an agent that runs the workflow. Once registered, it keeps the connection with the back-end service, making it eligible for scheduling.
Since the runner is an agent that you run on your own machine, it is important to know exactly what is running. Therefore, the runner itself is an open-source project that you can find on GitHub.
The correct way to think about the runner is as a worker that asks the back-end service for jobs, executes their steps, streams the output, and uploads the resulting artifacts.
You can run multiple runners, each connecting to the same back-end service. You can also configure parallelism on the runner, allowing it to run multiple jobs simultaneously.
Currently, there is no mechanism to assign a job to a specific runner or a pool of runners. I haven't found the need for that just yet.
Therefore, running multiple runners or increasing parallelism on a runner could lead to multiple scans associated with the same target running simultaneously.
It is up to you to ensure either that your traffic conforms to the acceptable use policy of the target, or that the target can handle the load. If necessary, use proxies to ensure that your IP address is not blocked due to excessive load.
Always respect the rules of engagement and acceptable use policy of the target.
Read more about the runner on the runner documentation page.
Workflows can be triggered by a schedule, by the completion of the scans they depend on, or manually.
Each job can be re-triggered. Since the nature of workflows is to run on the latest available data, each manual trigger schedules a job as if it were scheduled automatically. Each scheduled job resolves variables at the time it is assigned, not when the workflow is created or updated.
Learn more about triggers in the scans.ID.on section of the syntax reference page.
Artifacts are the primary mechanism for storing and sharing outputs between scans.
if expressions control when steps and scans run.
To avoid modifying the workflow for target- or environment-specific changes, you can use project variables and secrets.
Environment variables can be defined on the scan, where the value of the environment variable uses template evaluation, referencing project variables and secrets.
Since environment variables could change the execution of a scan, each environment variable must be defined at the scan level.
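As a sketch, a scan-level environment variable might be defined like this. The env key, the ${{ ... }} expression style, and the vars/secrets names are assumptions; only the concept of template evaluation over project variables and secrets comes from this page.

```yaml
# Sketch: field names and expression syntax are assumptions.
scans:
  probe:
    env:
      TARGET: ${{ vars.target_domain }}        # project variable
      API_KEY: ${{ secrets.shodan_api_key }}   # project secret
    steps:
      - run: httpx -u "$TARGET"
```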
Learn more about specific workflow components on the syntax reference and runner documentation pages.