# BountyHub Workflows

Workflows are the way you describe automation in BountyHub. They describe the way you work in a concise and declarative way.

Workflows act as if there is a replica of you doing the automated tasks continuously, feeding you information in real-time, allowing you to do the best work you possibly can.

In bug bounty hunting, being the first one to find a bug is crucial. The faster you can find a bug, the higher the chances that you will be the first one to report it and get the bounty. So use workflows to your advantage, and let them work 24/7 for you.

# Workflow Concepts

Let's discuss concepts that are important to understand when working with workflows.

# Workflow

A workflow is a single unit of automation. It describes the flow you take manually when hunting for bugs. The idea behind the workflow is to combine multiple steps that depend on each other into a live execution model that runs while you sleep.

Most automation tools out there work in terms of pipelines, where you define a series of steps that are executed in order. Execution is broken into stages, and each stage can depend on the output of the previous stage.

Workflows are different. They are continuous processes that are always running, always adapting to the latest data available. You define how you want to work, and the workflow keeps doing that for you.

Let's illustrate the difference with an example:

(Diagram: Stage #1 feeds Stage #2 and Stage #4; Stage #3 depends on both Stage #2 and Stage #4.)

A pipeline would start with Stage #1, and only when it is finished would it proceed to Stage #2 and Stage #4. Only when both of them are finished would it proceed to Stage #3, and so on.

A workflow, on the other hand, depending on the trigger specified in the scan, would start with Stage #1, and as soon as it is finished, it would enable Stage #2 and Stage #4 to run. Perhaps Stage #4 runs on a schedule, so it won't start right away.

However, it still depends on the output of Stage #1, so it won't start until Stage #1 is finished.

Once Stage #4 finishes, Stage #3 can start, as both Stage #2 and Stage #4 are now finished.

Let's say Stage #4 executes again because of its schedule; then Stage #3 would execute again, using the latest data from Stage #2 and Stage #4.

As you can see, workflows are continuous processes that adapt to the latest data available. The outputs are the driving factors, not the tools invoked to find the output.

The clearest example to me is running a liveness probe using httpx to see which subdomains are reachable. We don't need to re-run subdomain enumeration every time we want to see which subdomains are live. We can run subdomain enumeration once every few days, but keep running the liveness probe every 6-12 hours, to see which subdomains are live right now.

Then we can be the first to know about a new live subdomain, and start hunting on it.
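
To make this concrete, here is a minimal sketch of such a workflow. The key names (scans, on, cron, needs, steps, artifacts) and overall shape are illustrative assumptions, not the exact BountyHub schema; the syntax specification mentioned below defines the real keys.

```yaml
# Hypothetical sketch: key names and structure are illustrative,
# not the exact BountyHub workflow schema.
name: recon

scans:
  subdomains:
    on:
      cron: "0 3 */3 * *"              # enumerate subdomains every few days
    steps:
      - run: subfinder -d example.com -o subdomains.txt
    artifacts:
      - name: subdomains
        path: subdomains.txt

  liveness:
    on:
      cron: "0 */6 * * *"              # probe every 6 hours
    needs: [subdomains]                # always uses the latest subdomains artifact
    steps:
      - run: httpx -l subdomains.txt -o live.txt
    artifacts:
      - name: live-subdomains
        path: live.txt
```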

A workflow is defined in a YAML file, following the workflow syntax specification.

There are multiple reasons why YAML is a great fit for defining workflows:

  • Human-readable: YAML is designed to be easy to read and write for humans.
  • Compressed syntax: All tools I've seen that use a GUI to specify workflows end up taking too much space on the screen. With a GUI, it is difficult to get a compact view of the entire workflow, or even of a single scan, especially if the scan has multiple non-trivial steps.
  • Hierarchical structure: YAML's indentation-based structure allows for clear representation of nested data, which is useful for defining complex workflows.
  • Widely adopted: YAML is widely used in various applications, including configuration files, making it a familiar choice for many developers and DevOps professionals.
  • Supports complex data types: YAML can represent complex data structures, such as lists and dictionaries, which are often needed in workflow definitions.

Changes to the workflow are expected. As we learn more, we add new tools, remove others, modify the logic, etc. Some of these changes should stay dynamic, while others impact other parts of the workflow.

Therefore, each workflow update creates a revision.

# Workflow Revision

Each time you create or update a workflow, a new revision is created.

Each revision is fully independent of other revisions. While you can have multiple revisions of the workflow file, only the latest revision is active.

When a new revision is created, a new schedule is created for it. That means that scans of the previous revision will not be executed anymore, and that scheduled scans from the previous revision are removed.

The history, however, is preserved, so you can always go back and see what was done in the previous revisions.

# Scan

A scan is a single unit of work in the workflow. It describes one stage of what you want to do. For example, a scan can describe subdomain enumeration, port scanning, vulnerability scanning, etc. Each of these is a single unit of work, with clearly defined inputs and outputs.

While you can certainly combine multiple tools in a single scan, it is usually better to have a single tool per scan. This allows better reusability of the scans, as well as better visibility into what is going on.

When describing a scan, take the time to think deeply about what exactly you would like to achieve with it, and when exactly you would like it to be executed.

Each scan instantiates a job, based on the trigger you define. Learn more about different trigger mechanisms on the syntax reference page.

# Job

A job is an instance of a scan. Each scan can have zero or more jobs, but each job belongs to exactly one scan.

Each job is composed of multiple steps, executed in sequence. A step failure results in job failure, unless allow_failure: true is specified. Read more about allow_failure on the syntax reference page.
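
As a rough sketch, a step sequence with an optional step might look like the following (the step keys are assumptions; the syntax reference defines the exact schema):

```yaml
# Hypothetical step list: the middle step is allowed to fail without failing the job.
steps:
  - run: httpx -l subdomains.txt -o live.txt
  - run: gowitness file -f live.txt    # nice-to-have screenshots
    allow_failure: true                # job continues even if this step fails
  - run: nuclei -l live.txt -o findings.txt
```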

The job data is evaluated during scheduling. The runner keeps asking for jobs, and once a job is available, it gets assigned to the runner. At that time, all conditions are checked, template expressions are evaluated, and the result is sent to the runner.

Once the runner starts running the job, it periodically updates BountyHub about the execution, streams the stdout and stderr data, and, if successful, uploads the artifacts created by the job.

Each artifact belongs to a job. If you remove the job, all logs, artifacts, and metadata associated with the job will be removed.

# Artifact

An artifact is a file or a directory created by a job. Each job can upload multiple artifacts, each with its own name. Artifacts are stored on the back-end service and can be downloaded later.

Each artifact is a zip archive, with the compression level set to 9. This keeps the artifact size to a minimum, so it doesn't waste space on the server.

However, it also means that downloading the artifact requires decompression. Use unzip or similar tools to extract the contents of the artifact.

Since artifacts are the most important results of each job, you can configure notifications on artifact changes. Depending on the notification settings, you can choose to get notified on diff, always, or never.

If you choose diff, you will be notified only when the artifact content changes compared to the previous successful job of the same scan.

If you choose always, you will be notified every time the artifact is uploaded successfully.

The never value specifies that you don't want to be notified about the artifact at all.
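
Put together, a hypothetical artifact configuration covering the three notification modes might look like this (the key names are assumptions; check the syntax reference for the exact schema):

```yaml
# Hypothetical artifact definitions illustrating the three notification modes.
artifacts:
  - name: live-subdomains
    path: live.txt
    notify: diff        # only when the content changes vs. the previous successful job
  - name: nuclei-findings
    path: findings.txt
    notify: always      # on every successful upload
  - name: raw-logs
    path: logs/
    notify: never       # no notifications for this artifact
```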

To learn more about artifacts, visit the Syntax Reference page.

# Runner

A runner is an agent that runs your workflow jobs. Once registered, it keeps a connection with the back-end service, making it eligible for scheduling.

Since the runner is an agent that you run on your machine, it is important to know exactly what is running. Therefore, the runner itself is an open-source project that you can find on GitHub.

The correct way to think about the runner is as a worker that:

  • Connects to the back-end service.
  • Asks for jobs to run.
  • Runs the jobs.
  • Reports the results back to the back-end service.
  • Uploads artifacts created by the jobs.
  • Repeats the process.

You can run multiple runners, each connecting to the same back-end service. You can also configure parallelism on the runner, allowing it to run multiple jobs simultaneously.

Currently, there is no mechanism to assign a job to a specific runner or a pool of runners. I haven't found the need for that just yet.

Therefore, running multiple runners or increasing parallelism on the runner could lead to multiple scans, associated with the same target, running simultaneously.

It is up to you to ensure that the traffic is either according to the acceptable use policy of the target, or that the target can handle the load. If necessary, use proxies to ensure that your IP address is not blocked due to excessive load.

Always respect the rules of engagement and acceptable use policy of the target.

Read more about the runner on the runner documentation page.

# Summary: Key Features

# Automated and Manual Triggers

Workflows can be triggered by:

  • Cron schedules - Run scans periodically
  • Expression evaluations - Trigger based on conditions from other scans
  • Manual dispatch - Run scans on-demand with custom inputs

Each job can be re-triggered. Since the nature of workflows is to run on the latest available data, each manual trigger schedules a job as if it were scheduled automatically. Each scheduled job resolves variables at the time it is assigned, not when the workflow is created or updated.

Learn more about triggers on the scans.ID.on page.
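
For illustration, the three trigger styles might be expressed roughly like this (the keys under on and the expression function are assumptions, not the documented syntax):

```yaml
# Hypothetical trigger configurations for three scans.
scans:
  subdomains:
    on:
      cron: "0 3 * * *"                 # periodic: every day at 03:00

  liveness:
    on:
      expression: success('subdomains') # conditional: runs when the subdomains scan succeeds

  nuclei:
    on:
      dispatch:                         # manual: run on demand with custom inputs
        inputs:
          templates: cves/
```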

# Artifact Management

Artifacts are the primary mechanism for storing and sharing outputs between scans:

  • Upload files and directories after successful scans.
  • Download artifacts from previous scans.
  • Automatic compression and storage.
  • Optional expiration dates for storage management.
  • Optional notification management on artifact changes.
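
A hedged sketch of the optional artifact settings (expires and notify are assumed key names, not the documented schema):

```yaml
# Hypothetical artifact definition; compression and storage are handled by the service.
artifacts:
  - name: subdomains
    path: subdomains.txt
    expires: 30d        # optional expiration for storage management
    notify: diff        # optional notification on content changes
```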

# Conditional Execution

Control when steps and scans run:

  • Conditional steps using if expressions.
  • Dependency management between scans.
  • Skip unnecessary work based on scan results.
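
As an assumption-laden sketch, a conditional step might look like this (the if expression syntax shown is illustrative only):

```yaml
# Hypothetical conditional step: skip the port scan when no live hosts were found.
steps:
  - run: httpx -l subdomains.txt -o live.txt
  - run: naabu -list live.txt -o ports.txt
    if: artifact('live-subdomains').size > 0   # illustrative expression, not documented syntax
```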

# Dynamic Execution

To avoid modifying the workflow for target- or environment-specific changes, you can use:

  • Project Variables - Define variables at the project level to be used across workflows.
  • Project Secrets - Securely store sensitive information like API keys and passwords.

Environment variables can be defined on a scan, where the value of each environment variable uses template evaluation, referencing project variables and secrets.

Since environment variables could change the execution of a scan, each environment variable must be defined at the scan level.
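
A hypothetical sketch of scan-level environment variables that reference project variables and secrets through template evaluation (the template syntax is an assumption):

```yaml
# Hypothetical scan-level environment variables; template syntax is illustrative.
scans:
  nuclei:
    env:
      TARGET_DOMAIN: ${{ vars.TARGET_DOMAIN }}       # project variable
      SHODAN_API_KEY: ${{ secrets.SHODAN_API_KEY }}  # project secret
    steps:
      - run: nuclei -u https://$TARGET_DOMAIN -o findings.txt
```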

# Next Steps

Learn more about specific workflow components: