wfcommons.generator

wfcommons.generator.generator

class wfcommons.generator.generator.WorkflowGenerator(workflow_recipe: wfcommons.generator.workflow.abstract_recipe.WorkflowRecipe, logger: Optional[logging.Logger] = None)

Bases: object

A generator of synthetic workflow traces based on workflow recipes obtained from the analysis of real workflow execution traces.

Parameters:
  • workflow_recipe (WorkflowRecipe) – The workflow recipe to be used for this generator.
  • logger (Logger) – The logger used for information, warning, and error messages (optional).
build_workflow(workflow_name: Optional[str] = None) → wfcommons.common.workflow.Workflow

Generate a synthetic workflow trace based on the workflow recipe used to instantiate the generator.

Parameters: workflow_name (str) – The workflow name.
Returns: A synthetic workflow trace object.
Return type: Workflow
build_workflows(num_workflows: int) → List[wfcommons.common.workflow.Workflow]

Generate a number of synthetic workflow traces based on the workflow recipe used to instantiate the generator.

Parameters: num_workflows (int) – The number of workflows to be generated.
Returns: A list of synthetic workflow trace objects.
Return type: List[Workflow]
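
The relationship between the two methods can be illustrated with a small, self-contained sketch. SketchGenerator and CountingRecipe are hypothetical stand-ins written for this example, not wfcommons classes. The sketch assumes that the generator delegates to the recipe's build_workflow and that build_workflows repeats that call once per requested workflow; the real implementation may differ.

```python
from typing import Any, List, Optional, Protocol


class Recipe(Protocol):
    """Duck-typed view of a workflow recipe: anything with build_workflow()."""

    def build_workflow(self, workflow_name: Optional[str] = None) -> Any: ...


class SketchGenerator:
    """Hypothetical stand-in for WorkflowGenerator (not the wfcommons code)."""

    def __init__(self, workflow_recipe: Recipe) -> None:
        self.workflow_recipe = workflow_recipe

    def build_workflow(self, workflow_name: Optional[str] = None) -> Any:
        # Assumption: trace generation is delegated to the recipe.
        return self.workflow_recipe.build_workflow(workflow_name)

    def build_workflows(self, num_workflows: int) -> List[Any]:
        # Assumption: one build_workflow() call per requested workflow.
        return [self.build_workflow() for _ in range(num_workflows)]


class CountingRecipe:
    """Toy recipe that returns a numbered label instead of a real trace."""

    def __init__(self) -> None:
        self.calls = 0

    def build_workflow(self, workflow_name: Optional[str] = None) -> str:
        self.calls += 1
        return workflow_name or f"workflow-{self.calls}"


traces = SketchGenerator(CountingRecipe()).build_workflows(3)
print(traces)  # ['workflow-1', 'workflow-2', 'workflow-3']
```

With wfcommons installed, the same two calls would be made on a WorkflowGenerator instantiated with one of the recipe objects documented below.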

wfcommons.generator.workflow.blast_recipe

class wfcommons.generator.workflow.blast_recipe.BLASTRecipe(num_subsample: Optional[int] = 2, data_footprint: Optional[int] = 0, num_tasks: Optional[int] = 5, runtime_factor: Optional[float] = 1.0, input_file_size_factor: Optional[float] = 1.0, output_file_size_factor: Optional[float] = 1.0)

Bases: wfcommons.generator.workflow.abstract_recipe.WorkflowRecipe

A BLAST workflow recipe class for creating synthetic workflow traces.

Parameters:
  • num_subsample (int) – The number of subsamples into which the reference file is split.
  • data_footprint (int) – The upper bound for the workflow total data footprint (in bytes).
  • num_tasks (int) – The upper bound for the total number of tasks in the workflow.
  • runtime_factor (float) – The factor by which task runtimes are increased or decreased.
  • input_file_size_factor (float) – The factor by which task input file sizes are increased or decreased.
  • output_file_size_factor (float) – The factor by which task output file sizes are increased or decreased.
build_workflow(workflow_name: Optional[str] = None) → wfcommons.common.workflow.Workflow

Generate a synthetic workflow trace of a BLAST workflow.

Parameters: workflow_name (str) – The workflow name.
Returns: A synthetic workflow trace object.
Return type: Workflow
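
As a worked example of the three scaling factors (runtime_factor, input_file_size_factor, output_file_size_factor), the sketch below assumes that a factor acts as a plain multiplier, so 1.0 leaves a value unchanged, values below 1.0 shrink it, and values above 1.0 grow it. The scale helper, the sample runtime, and the rejection of non-positive factors are assumptions made for illustration; wfcommons may apply the factors differently internally.

```python
def scale(value: float, factor: float = 1.0) -> float:
    """Scale a runtime or file size by a factor (assumed multiplicative)."""
    if factor <= 0:
        # Assumption: a zero or negative factor is treated as an input error.
        raise ValueError("factor must be positive")
    return value * factor


base_runtime = 120.0  # seconds; a made-up sample value

print(scale(base_runtime))       # 120.0 (default factor, unchanged)
print(scale(base_runtime, 0.5))  # 60.0  (halved)
print(scale(base_runtime, 2.0))  # 240.0 (doubled)
```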
classmethod from_num_subsample(num_subsample: int, runtime_factor: Optional[float] = 1.0, input_file_size_factor: Optional[float] = 1.0, output_file_size_factor: Optional[float] = 1.0) → wfcommons.generator.workflow.blast_recipe.BLASTRecipe

Instantiate a BLAST workflow recipe that will generate synthetic workflows using the defined number of subsamples.

Parameters:
  • num_subsample (int) – The number of subsamples into which the reference file is split.
  • runtime_factor (float) – The factor by which task runtimes are increased or decreased.
  • input_file_size_factor (float) – The factor by which task input file sizes are increased or decreased.
  • output_file_size_factor (float) – The factor by which task output file sizes are increased or decreased.
Returns: A BLAST workflow recipe object that will generate synthetic workflows using the defined number of subsamples.
Return type: BLASTRecipe

classmethod from_num_tasks(num_tasks: int, runtime_factor: Optional[float] = 1.0, input_file_size_factor: Optional[float] = 1.0, output_file_size_factor: Optional[float] = 1.0) → wfcommons.generator.workflow.blast_recipe.BLASTRecipe

Instantiate a BLAST workflow recipe that will generate synthetic workflows up to the total number of tasks provided.

Parameters:
  • num_tasks (int) – The upper bound for the total number of tasks in the workflow (at least 5).
  • runtime_factor (float) – The factor by which task runtimes are increased or decreased.
  • input_file_size_factor (float) – The factor by which task input file sizes are increased or decreased.
  • output_file_size_factor (float) – The factor by which task output file sizes are increased or decreased.
Returns: A BLAST workflow recipe object that will generate synthetic workflows up to the total number of tasks provided.
Return type: BLASTRecipe
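
The two class methods above follow the alternative-constructor pattern: each fixes one sizing parameter and leaves the others at their defaults. The sketch below shows that pattern with BlastRecipeSketch, a hypothetical stand-in that only copies the documented BLASTRecipe defaults (num_subsample = 2, num_tasks = 5). Raising ValueError below the documented minimum of 5 tasks is an assumption; the library may report the error differently.

```python
from dataclasses import dataclass


@dataclass
class BlastRecipeSketch:
    """Hypothetical stand-in mirroring the documented BLASTRecipe defaults."""

    num_subsample: int = 2
    num_tasks: int = 5
    runtime_factor: float = 1.0

    @classmethod
    def from_num_subsample(
        cls, num_subsample: int, runtime_factor: float = 1.0
    ) -> "BlastRecipeSketch":
        # Size the workflow by subsample count; num_tasks keeps its default.
        return cls(num_subsample=num_subsample, runtime_factor=runtime_factor)

    @classmethod
    def from_num_tasks(
        cls, num_tasks: int, runtime_factor: float = 1.0
    ) -> "BlastRecipeSketch":
        # The documentation states a lower bound of 5 tasks for BLAST.
        if num_tasks < 5:
            raise ValueError("num_tasks must be at least 5")
        return cls(num_tasks=num_tasks, runtime_factor=runtime_factor)


by_subsample = BlastRecipeSketch.from_num_subsample(10)
by_tasks = BlastRecipeSketch.from_num_tasks(50, runtime_factor=2.0)
print(by_subsample)
print(by_tasks)
```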

wfcommons.generator.workflow.bwa_recipe

class wfcommons.generator.workflow.bwa_recipe.BWARecipe(num_subsample: Optional[int] = 2, data_footprint: Optional[int] = 0, num_tasks: Optional[int] = 5, runtime_factor: Optional[float] = 1.0, input_file_size_factor: Optional[float] = 1.0, output_file_size_factor: Optional[float] = 1.0)

Bases: wfcommons.generator.workflow.abstract_recipe.WorkflowRecipe

A BWA workflow recipe class for creating synthetic workflow traces.

Parameters:
  • num_subsample (int) – The number of subsamples into which the reference file is split.
  • data_footprint (int) – The upper bound for the workflow total data footprint (in bytes).
  • num_tasks (int) – The upper bound for the total number of tasks in the workflow.
  • runtime_factor (float) – The factor by which task runtimes are increased or decreased.
  • input_file_size_factor (float) – The factor by which task input file sizes are increased or decreased.
  • output_file_size_factor (float) – The factor by which task output file sizes are increased or decreased.
build_workflow(workflow_name: Optional[str] = None) → wfcommons.common.workflow.Workflow

Generate a synthetic workflow trace of a BWA workflow.

Parameters: workflow_name (str) – The workflow name.
Returns: A synthetic workflow trace object.
Return type: Workflow
classmethod from_num_subsample(num_subsample: int, runtime_factor: Optional[float] = 1.0, input_file_size_factor: Optional[float] = 1.0, output_file_size_factor: Optional[float] = 1.0) → wfcommons.generator.workflow.bwa_recipe.BWARecipe

Instantiate a BWA workflow recipe that will generate synthetic workflows using the defined number of subsamples.

Parameters:
  • num_subsample (int) – The number of subsamples into which the reference file is split.
  • runtime_factor (float) – The factor by which task runtimes are increased or decreased.
  • input_file_size_factor (float) – The factor by which task input file sizes are increased or decreased.
  • output_file_size_factor (float) – The factor by which task output file sizes are increased or decreased.
Returns: A BWA workflow recipe object that will generate synthetic workflows using the defined number of subsamples.
Return type: BWARecipe

classmethod from_num_tasks(num_tasks: int, runtime_factor: Optional[float] = 1.0, input_file_size_factor: Optional[float] = 1.0, output_file_size_factor: Optional[float] = 1.0) → wfcommons.generator.workflow.bwa_recipe.BWARecipe

Instantiate a BWA workflow recipe that will generate synthetic workflows up to the total number of tasks provided.

Parameters:
  • num_tasks (int) – The upper bound for the total number of tasks in the workflow (at least 6).
  • runtime_factor (float) – The factor by which task runtimes are increased or decreased.
  • input_file_size_factor (float) – The factor by which task input file sizes are increased or decreased.
  • output_file_size_factor (float) – The factor by which task output file sizes are increased or decreased.
Returns: A BWA workflow recipe object that will generate synthetic workflows up to the total number of tasks provided.
Return type: BWARecipe

wfcommons.generator.workflow.cycles_recipe

class wfcommons.generator.workflow.cycles_recipe.CyclesRecipe(num_points: Optional[int] = 1, num_crops: Optional[int] = 1, num_params: Optional[int] = 4, data_footprint: Optional[int] = 0, num_tasks: Optional[int] = 7, runtime_factor: Optional[float] = 1.0, input_file_size_factor: Optional[float] = 1.0, output_file_size_factor: Optional[float] = 1.0)

Bases: wfcommons.generator.workflow.abstract_recipe.WorkflowRecipe

A Cycles workflow recipe class for creating synthetic workflow traces.

Parameters:
  • num_points (int) – The number of points of the spatial grid cell.
  • num_crops (int) – The number of crops being evaluated.
  • num_params (int) – The number of parameter values from the simulation matrix.
  • data_footprint (int) – The upper bound for the workflow total data footprint (in bytes).
  • num_tasks (int) – The upper bound for the total number of tasks in the workflow.
  • runtime_factor (float) – The factor by which task runtimes are increased or decreased.
  • input_file_size_factor (float) – The factor by which task input file sizes are increased or decreased.
  • output_file_size_factor (float) – The factor by which task output file sizes are increased or decreased.
build_workflow(workflow_name: Optional[str] = None) → wfcommons.common.workflow.Workflow

Generate a synthetic workflow trace of a Cycles workflow.

Parameters: workflow_name (str) – The workflow name.
Returns: A synthetic workflow trace object.
Return type: Workflow
classmethod from_num_tasks(num_tasks: int, runtime_factor: Optional[float] = 1.0, input_file_size_factor: Optional[float] = 1.0, output_file_size_factor: Optional[float] = 1.0) → wfcommons.generator.workflow.cycles_recipe.CyclesRecipe

Instantiate a Cycles workflow recipe that will generate synthetic workflows up to the total number of tasks provided.

Parameters:
  • num_tasks (int) – The upper bound for the total number of tasks in the workflow (at least 7).
  • runtime_factor (float) – The factor by which task runtimes are increased or decreased.
  • input_file_size_factor (float) – The factor by which task input file sizes are increased or decreased.
  • output_file_size_factor (float) – The factor by which task output file sizes are increased or decreased.
Returns: A Cycles workflow recipe object that will generate synthetic workflows up to the total number of tasks provided.
Return type: CyclesRecipe

classmethod from_points_and_crops(num_points: int, num_crops: int, num_params: int, runtime_factor: Optional[float] = 1.0, input_file_size_factor: Optional[float] = 1.0, output_file_size_factor: Optional[float] = 1.0) → wfcommons.generator.workflow.cycles_recipe.CyclesRecipe

Instantiate a Cycles workflow recipe that will generate synthetic workflows using the defined number of points, crops, and params.

Parameters:
  • num_points (int) – The number of points of the spatial grid cell.
  • num_crops (int) – The number of crops being evaluated.
  • num_params (int) – The number of parameter values from the simulation matrix.
  • runtime_factor (float) – The factor by which task runtimes are increased or decreased.
  • input_file_size_factor (float) – The factor by which task input file sizes are increased or decreased.
  • output_file_size_factor (float) – The factor by which task output file sizes are increased or decreased.
Returns: A Cycles workflow recipe object that will generate synthetic workflows using the defined number of points, crops, and params.
Return type: CyclesRecipe

wfcommons.generator.workflow.epigenomics_recipe

class wfcommons.generator.workflow.epigenomics_recipe.EpigenomicsRecipe(num_sequence_files: Optional[int] = 1, num_lines: Optional[int] = 10, bin_size: Optional[int] = 10, data_footprint: Optional[int] = 0, num_tasks: Optional[int] = 9, runtime_factor: Optional[float] = 1.0, input_file_size_factor: Optional[float] = 1.0, output_file_size_factor: Optional[float] = 1.0)

Bases: wfcommons.generator.workflow.abstract_recipe.WorkflowRecipe

An Epigenomics workflow recipe class for creating synthetic workflow traces.

Parameters:
  • num_sequence_files (int) – Number of FASTQ files processed by the workflow.
  • num_lines (int) – Number of lines in each FASTQ file.
  • bin_size (int) – The amount of DNA and protein sequence information to be processed by each computational task.
  • data_footprint (int) – The upper bound for the workflow total data footprint (in bytes).
  • num_tasks (int) – The upper bound for the total number of tasks in the workflow.
  • runtime_factor (float) – The factor by which task runtimes are increased or decreased.
  • input_file_size_factor (float) – The factor by which task input file sizes are increased or decreased.
  • output_file_size_factor (float) – The factor by which task output file sizes are increased or decreased.
build_workflow(workflow_name: str = None) → wfcommons.common.workflow.Workflow

Generate a synthetic workflow trace of an Epigenomics workflow.

Parameters: workflow_name (str) – The workflow name.
Returns: A synthetic workflow trace object.
Return type: Workflow
classmethod from_num_tasks(num_tasks: int, runtime_factor: Optional[float] = 1.0, input_file_size_factor: Optional[float] = 1.0, output_file_size_factor: Optional[float] = 1.0) → wfcommons.generator.workflow.epigenomics_recipe.EpigenomicsRecipe

Instantiate an Epigenomics workflow recipe that will generate synthetic workflows up to the total number of tasks provided.

Parameters:
  • num_tasks (int) – The upper bound for the total number of tasks in the workflow (at least 9).
  • runtime_factor (float) – The factor by which task runtimes are increased or decreased.
  • input_file_size_factor (float) – The factor by which task input file sizes are increased or decreased.
  • output_file_size_factor (float) – The factor by which task output file sizes are increased or decreased.
Returns: An Epigenomics workflow recipe object that will generate synthetic workflows up to the total number of tasks provided.
Return type: EpigenomicsRecipe

classmethod from_sequences(num_sequence_files: int, num_lines: int, bin_size: int, runtime_factor: Optional[float] = 1.0, input_file_size_factor: Optional[float] = 1.0, output_file_size_factor: Optional[float] = 1.0) → wfcommons.generator.workflow.epigenomics_recipe.EpigenomicsRecipe

Instantiate an Epigenomics workflow recipe that will generate synthetic workflows using the defined number of sequence files, lines, and bin size.

Parameters:
  • num_sequence_files (int) – Number of FASTQ files processed by the workflow.
  • num_lines (int) – Number of lines in each FASTQ file.
  • bin_size (int) – The amount of DNA and protein sequence information to be processed by each computational task.
  • runtime_factor (float) – The factor by which task runtimes are increased or decreased.
  • input_file_size_factor (float) – The factor by which task input file sizes are increased or decreased.
  • output_file_size_factor (float) – The factor by which task output file sizes are increased or decreased.
Returns: An Epigenomics workflow recipe object that will generate synthetic workflows using the defined number of sequence files, lines, and bin size.
Return type: EpigenomicsRecipe

wfcommons.generator.workflow.genome_recipe

class wfcommons.generator.workflow.genome_recipe.GenomeRecipe(num_chromosomes: Optional[int] = 1, num_sequences: Optional[int] = 1, num_populations: Optional[int] = 1, data_footprint: Optional[int] = 0, num_tasks: Optional[int] = 5, runtime_factor: Optional[float] = 1.0, input_file_size_factor: Optional[float] = 1.0, output_file_size_factor: Optional[float] = 1.0)

Bases: wfcommons.generator.workflow.abstract_recipe.WorkflowRecipe

A 1000Genome workflow recipe class for creating synthetic workflow traces.

Parameters:
  • num_chromosomes (int) – The number of chromosomes evaluated in the workflow execution.
  • num_sequences (int) – The number of sequences per chromosome file.
  • num_populations (int) – The number of populations being evaluated.
  • data_footprint (int) – The upper bound for the workflow total data footprint (in bytes).
  • num_tasks (int) – The upper bound for the total number of tasks in the workflow.
  • runtime_factor (float) – The factor by which task runtimes are increased or decreased.
  • input_file_size_factor (float) – The factor by which task input file sizes are increased or decreased.
  • output_file_size_factor (float) – The factor by which task output file sizes are increased or decreased.
build_workflow(workflow_name: str = None) → wfcommons.common.workflow.Workflow

Generate a synthetic workflow trace of a 1000Genome workflow.

Parameters: workflow_name (str) – The workflow name.
Returns: A synthetic workflow trace object.
Return type: Workflow
classmethod from_num_chromosomes(num_chromosomes: int, num_sequences: int, num_populations: int, runtime_factor: Optional[float] = 1.0, input_file_size_factor: Optional[float] = 1.0, output_file_size_factor: Optional[float] = 1.0) → wfcommons.generator.workflow.genome_recipe.GenomeRecipe

Instantiate a 1000Genome workflow recipe that will generate synthetic workflows using the defined number of chromosomes, sequences, and populations.

Parameters:
  • num_chromosomes (int) – The number of chromosomes evaluated in the workflow execution.
  • num_sequences (int) – The number of sequences per chromosome file.
  • num_populations (int) – The number of populations being evaluated.
  • runtime_factor (float) – The factor by which task runtimes are increased or decreased.
  • input_file_size_factor (float) – The factor by which task input file sizes are increased or decreased.
  • output_file_size_factor (float) – The factor by which task output file sizes are increased or decreased.
Returns: A 1000Genome workflow recipe object that will generate synthetic workflows using the defined number of chromosomes, sequences, and populations.
Return type: GenomeRecipe

classmethod from_num_tasks(num_tasks: int, runtime_factor: Optional[float] = 1.0, input_file_size_factor: Optional[float] = 1.0, output_file_size_factor: Optional[float] = 1.0) → wfcommons.generator.workflow.genome_recipe.GenomeRecipe

Instantiate a 1000Genome workflow recipe that will generate synthetic workflows up to the total number of tasks provided.

Parameters:
  • num_tasks (int) – The upper bound for the total number of tasks in the workflow (at least 5).
  • runtime_factor (float) – The factor by which task runtimes are increased or decreased.
  • input_file_size_factor (float) – The factor by which task input file sizes are increased or decreased.
  • output_file_size_factor (float) – The factor by which task output file sizes are increased or decreased.
Returns: A 1000Genome workflow recipe object that will generate synthetic workflows up to the total number of tasks provided.
Return type: GenomeRecipe

wfcommons.generator.workflow.montage_recipe

class wfcommons.generator.workflow.montage_recipe.MontageDataset

Bases: wfcommons.utils.NoValue

An enumeration of Montage datasets.

DSS = 'dss'
TWOMASS = '2mass'
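
Because a Python identifier cannot start with a digit, the 2MASS dataset is exposed as TWOMASS while its value remains '2mass'. The sketch below shows how a user-supplied string could be mapped to a member. DatasetSketch and parse_dataset are hypothetical names for this example, and the sketch assumes that the NoValue base class behaves like a standard enum.Enum for value lookup.

```python
from enum import Enum


class DatasetSketch(Enum):
    """Hypothetical mirror of the documented MontageDataset members."""

    DSS = "dss"
    TWOMASS = "2mass"


def parse_dataset(name: str) -> DatasetSketch:
    """Map a user-supplied string such as '2MASS' to an enum member."""
    # Enum lookup by value raises ValueError for unknown dataset names.
    return DatasetSketch(name.strip().lower())


print(parse_dataset("2MASS").name)  # TWOMASS
print(DatasetSketch.DSS.value)      # dss
```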
class wfcommons.generator.workflow.montage_recipe.MontageRecipe(dataset: Optional[wfcommons.generator.workflow.montage_recipe.MontageDataset] = <MontageDataset.DSS>, num_bands: Optional[int] = 1, degree: Optional[float] = 0.5, data_footprint: Optional[int] = 0, num_tasks: Optional[int] = 133, runtime_factor: Optional[float] = 1.0, input_file_size_factor: Optional[float] = 1.0, output_file_size_factor: Optional[float] = 1.0)

Bases: wfcommons.generator.workflow.abstract_recipe.WorkflowRecipe, wfcommons.generator.workflow.montage_recipe._MontagetaskRatios

A Montage workflow recipe class for creating synthetic workflow traces. In this workflow recipe, traces follow different recipes depending on the MontageDataset used.

Parameters:
  • dataset (MontageDataset) – The dataset to use for the mosaic (e.g., 2mass, dss).
  • num_bands (int) – The number of bands (e.g., red, blue, and green) used by the workflow.
  • degree (float) – The size (in degrees) to be used for the width/height of the final mosaic.
  • data_footprint (int) – The upper bound for the workflow total data footprint (in bytes).
  • num_tasks (int) – The upper bound for the total number of tasks in the workflow.
  • runtime_factor (float) – The factor by which task runtimes are increased or decreased.
  • input_file_size_factor (float) – The factor by which task input file sizes are increased or decreased.
  • output_file_size_factor (float) – The factor by which task output file sizes are increased or decreased.
build_workflow(workflow_name: str = None) → wfcommons.common.workflow.Workflow

Generate a synthetic workflow trace of a Montage workflow.

Parameters: workflow_name (str) – The workflow name.
Returns: A synthetic workflow trace object.
Return type: Workflow
classmethod from_degree(dataset: wfcommons.generator.workflow.montage_recipe.MontageDataset, num_bands: int, degree: float, runtime_factor: Optional[float] = 1.0, input_file_size_factor: Optional[float] = 1.0, output_file_size_factor: Optional[float] = 1.0) → wfcommons.generator.workflow.montage_recipe.MontageRecipe

Instantiate a Montage workflow recipe that will generate synthetic workflows using the defined dataset, number of bands, and degree.

Parameters:
  • dataset (MontageDataset) – The dataset to use for the mosaic (e.g., 2mass, dss).
  • num_bands (int) – The number of bands (e.g., red, blue, and green) used by the workflow (at least 1).
  • degree (float) – The size (in degrees) to be used for the width/height of the final mosaic (at least 0.5).
  • runtime_factor (float) – The factor by which task runtimes are increased or decreased.
  • input_file_size_factor (float) – The factor by which task input file sizes are increased or decreased.
  • output_file_size_factor (float) – The factor by which task output file sizes are increased or decreased.
Returns: A Montage workflow recipe object that will generate synthetic workflows using the defined dataset, number of bands, and degree.
Return type: MontageRecipe

classmethod from_num_tasks(num_tasks: int, runtime_factor: Optional[float] = 1.0, input_file_size_factor: Optional[float] = 1.0, output_file_size_factor: Optional[float] = 1.0) → wfcommons.generator.workflow.montage_recipe.MontageRecipe

Instantiate a Montage workflow recipe that will generate synthetic workflows up to the total number of tasks provided.

Parameters:
  • num_tasks (int) – The upper bound for the total number of tasks in the workflow (at least 133).
  • runtime_factor (float) – The factor by which task runtimes are increased or decreased.
  • input_file_size_factor (float) – The factor by which task input file sizes are increased or decreased.
  • output_file_size_factor (float) – The factor by which task output file sizes are increased or decreased.
Returns: A Montage workflow recipe object that will generate synthetic workflows up to the total number of tasks provided.
Return type: MontageRecipe

wfcommons.generator.workflow.seismology_recipe

class wfcommons.generator.workflow.seismology_recipe.SeismologyRecipe(num_pairs: Optional[int] = 2, data_footprint: Optional[int] = 0, num_tasks: Optional[int] = 3, runtime_factor: Optional[float] = 1.0, input_file_size_factor: Optional[float] = 1.0, output_file_size_factor: Optional[float] = 1.0)

Bases: wfcommons.generator.workflow.abstract_recipe.WorkflowRecipe

A Seismology workflow recipe class for creating synthetic workflow traces.

Parameters:
  • num_pairs (int) – The number of pairs of signals used to estimate earthquake STFs.
  • data_footprint (int) – The upper bound for the workflow total data footprint (in bytes).
  • num_tasks (int) – The upper bound for the total number of tasks in the workflow.
  • runtime_factor (float) – The factor by which task runtimes are increased or decreased.
  • input_file_size_factor (float) – The factor by which task input file sizes are increased or decreased.
  • output_file_size_factor (float) – The factor by which task output file sizes are increased or decreased.
build_workflow(workflow_name: Optional[str] = None) → wfcommons.common.workflow.Workflow

Generate a synthetic workflow trace of a Seismology workflow.

Parameters: workflow_name (str) – The workflow name.
Returns: A synthetic workflow trace object.
Return type: Workflow
classmethod from_num_pairs(num_pairs: int, runtime_factor: Optional[float] = 1.0, input_file_size_factor: Optional[float] = 1.0, output_file_size_factor: Optional[float] = 1.0) → wfcommons.generator.workflow.seismology_recipe.SeismologyRecipe

Instantiate a Seismology workflow recipe that will generate synthetic workflows using the defined number of pairs.

Parameters:
  • num_pairs (int) – The number of pairs of signals used to estimate earthquake STFs (at least 2).
  • runtime_factor (float) – The factor by which task runtimes are increased or decreased.
  • input_file_size_factor (float) – The factor by which task input file sizes are increased or decreased.
  • output_file_size_factor (float) – The factor by which task output file sizes are increased or decreased.
Returns: A Seismology workflow recipe object that will generate synthetic workflows using the defined number of pairs.
Return type: SeismologyRecipe

classmethod from_num_tasks(num_tasks: int, runtime_factor: Optional[float] = 1.0, input_file_size_factor: Optional[float] = 1.0, output_file_size_factor: Optional[float] = 1.0) → wfcommons.generator.workflow.seismology_recipe.SeismologyRecipe

Instantiate a Seismology workflow recipe that will generate synthetic workflows up to the total number of tasks provided.

Parameters:
  • num_tasks (int) – The upper bound for the total number of tasks in the workflow (at least 3).
  • runtime_factor (float) – The factor by which task runtimes are increased or decreased.
  • input_file_size_factor (float) – The factor by which task input file sizes are increased or decreased.
  • output_file_size_factor (float) – The factor by which task output file sizes are increased or decreased.
Returns: A Seismology workflow recipe object that will generate synthetic workflows up to the total number of tasks provided.
Return type: SeismologyRecipe

wfcommons.generator.workflow.soykb_recipe

class wfcommons.generator.workflow.soykb_recipe.SoyKBRecipe(num_fastq_files: Optional[int] = 2, num_chromosomes: Optional[int] = 1, data_footprint: Optional[int] = 0, num_tasks: Optional[int] = 14, runtime_factor: Optional[float] = 1.0, input_file_size_factor: Optional[float] = 1.0, output_file_size_factor: Optional[float] = 1.0)

Bases: wfcommons.generator.workflow.abstract_recipe.WorkflowRecipe

A SoyKB workflow recipe class for creating synthetic workflow traces.

Parameters:
  • num_fastq_files (int) – The number of FASTQ files to be analyzed.
  • num_chromosomes (int) – The number of chromosomes.
  • data_footprint (int) – The upper bound for the workflow total data footprint (in bytes).
  • num_tasks (int) – The upper bound for the total number of tasks in the workflow.
  • runtime_factor (float) – The factor by which task runtimes are increased or decreased.
  • input_file_size_factor (float) – The factor by which task input file sizes are increased or decreased.
  • output_file_size_factor (float) – The factor by which task output file sizes are increased or decreased.
build_workflow(workflow_name: Optional[str] = None) → wfcommons.common.workflow.Workflow

Generate a synthetic workflow trace of a SoyKB workflow.

Parameters: workflow_name (str) – The workflow name.
Returns: A synthetic workflow trace object.
Return type: Workflow
classmethod from_num_tasks(num_tasks: int, runtime_factor: Optional[float] = 1.0, input_file_size_factor: Optional[float] = 1.0, output_file_size_factor: Optional[float] = 1.0) → wfcommons.generator.workflow.soykb_recipe.SoyKBRecipe

Instantiate a SoyKB workflow recipe that will generate synthetic workflows up to the total number of tasks provided.

Parameters:
  • num_tasks (int) – The upper bound for the total number of tasks in the workflow (at least 14).
  • runtime_factor (float) – The factor by which task runtimes are increased or decreased.
  • input_file_size_factor (float) – The factor by which task input file sizes are increased or decreased.
  • output_file_size_factor (float) – The factor by which task output file sizes are increased or decreased.
Returns: A SoyKB workflow recipe object that will generate synthetic workflows up to the total number of tasks provided.
Return type: SoyKBRecipe

classmethod from_sequences(num_fastq_files: int, num_chromosomes: int, runtime_factor: Optional[float] = 1.0, input_file_size_factor: Optional[float] = 1.0, output_file_size_factor: Optional[float] = 1.0) → wfcommons.generator.workflow.soykb_recipe.SoyKBRecipe

Instantiate a SoyKB workflow recipe that will generate synthetic workflows using the defined number of FASTQ files and chromosomes.

Parameters:
  • num_fastq_files (int) – The number of FASTQ files to be analyzed (at least 2).
  • num_chromosomes (int) – The number of chromosomes (range [1, 22]).
  • runtime_factor (float) – The factor by which task runtimes are increased or decreased.
  • input_file_size_factor (float) – The factor by which task input file sizes are increased or decreased.
  • output_file_size_factor (float) – The factor by which task output file sizes are increased or decreased.
Returns: A SoyKB workflow recipe object that will generate synthetic workflows using the defined number of FASTQ files and chromosomes.
Return type: SoyKBRecipe
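
The documented constraints for from_sequences (at least 2 FASTQ files, chromosomes in the range 1 to 22) can be checked before a recipe is created. The helper below is a hypothetical pre-check written for this example, not part of wfcommons; how the library itself reacts to out-of-range values is not stated here.

```python
def check_soykb_inputs(num_fastq_files: int, num_chromosomes: int) -> None:
    """Raise ValueError if the arguments fall outside the documented ranges."""
    if num_fastq_files < 2:
        raise ValueError(
            f"num_fastq_files must be at least 2, got {num_fastq_files}"
        )
    if not 1 <= num_chromosomes <= 22:
        raise ValueError(
            f"num_chromosomes must be in [1, 22], got {num_chromosomes}"
        )


# Valid arguments pass silently; invalid ones raise before any recipe is built.
check_soykb_inputs(num_fastq_files=4, num_chromosomes=10)
```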

wfcommons.generator.workflow.srasearch_recipe

class wfcommons.generator.workflow.srasearch_recipe.SRASearchRecipe(num_accession: Optional[int] = 2, data_footprint: Optional[int] = 0, num_tasks: Optional[int] = 3, runtime_factor: Optional[float] = 1.0, input_file_size_factor: Optional[float] = 1.0, output_file_size_factor: Optional[float] = 1.0)

Bases: wfcommons.generator.workflow.abstract_recipe.WorkflowRecipe

An SRA Search workflow recipe class for creating synthetic workflow traces.

Parameters:
  • num_accession (int) – The number of NCBI accession numbers.
  • data_footprint (int) – The upper bound for the workflow total data footprint (in bytes).
  • num_tasks (int) – The upper bound for the total number of tasks in the workflow.
  • runtime_factor (float) – The factor by which task runtimes are increased or decreased.
  • input_file_size_factor (float) – The factor by which task input file sizes are increased or decreased.
  • output_file_size_factor (float) – The factor by which task output file sizes are increased or decreased.
build_workflow(workflow_name: Optional[str] = None) → wfcommons.common.workflow.Workflow

Generate a synthetic workflow trace of an SRA Search workflow.

Parameters: workflow_name (str) – The workflow name.
Returns: A synthetic workflow trace object.
Return type: Workflow
classmethod from_num_accession(num_accession: int, runtime_factor: Optional[float] = 1.0, input_file_size_factor: Optional[float] = 1.0, output_file_size_factor: Optional[float] = 1.0) → wfcommons.generator.workflow.srasearch_recipe.SRASearchRecipe

Instantiate an SRA Search workflow recipe that will generate synthetic workflows using the defined number of accession numbers.

Parameters:
  • num_accession (int) – The number of NCBI accession numbers.
  • runtime_factor (float) – The factor by which task runtimes are increased or decreased.
  • input_file_size_factor (float) – The factor by which task input file sizes are increased or decreased.
  • output_file_size_factor (float) – The factor by which task output file sizes are increased or decreased.
Returns: An SRA Search workflow recipe object that will generate synthetic workflows using the defined number of accession numbers.
Return type: SRASearchRecipe

classmethod from_num_tasks(num_tasks: int, runtime_factor: Optional[float] = 1.0, input_file_size_factor: Optional[float] = 1.0, output_file_size_factor: Optional[float] = 1.0) → wfcommons.generator.workflow.srasearch_recipe.SRASearchRecipe

Instantiate an SRA Search workflow recipe that will generate synthetic workflows up to the total number of tasks provided.

Parameters:
  • num_tasks (int) – The upper bound for the total number of tasks in the workflow (at least 6).
  • runtime_factor (float) – The factor by which task runtimes are increased or decreased.
  • input_file_size_factor (float) – The factor by which task input file sizes are increased or decreased.
  • output_file_size_factor (float) – The factor by which task output file sizes are increased or decreased.
Returns: An SRA Search workflow recipe object that will generate synthetic workflows up to the total number of tasks provided.
Return type: SRASearchRecipe
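
Each from_num_tasks method documents its own minimum task count. The sketch below collects those minima, copied from the parameter descriptions in this reference, into a lookup table and raises a requested count to the minimum when it is too small. MIN_NUM_TASKS and clamp_num_tasks are hypothetical names for this example. Note that the constructor defaults shown for BWARecipe (5) and SRASearchRecipe (3) are lower than their documented minimum of 6, so those two entries in particular should be verified against the library.

```python
# Minimum num_tasks per recipe, as stated in the from_num_tasks descriptions.
MIN_NUM_TASKS = {
    "BLASTRecipe": 5,
    "BWARecipe": 6,
    "CyclesRecipe": 7,
    "EpigenomicsRecipe": 9,
    "GenomeRecipe": 5,
    "MontageRecipe": 133,
    "SeismologyRecipe": 3,
    "SoyKBRecipe": 14,
    "SRASearchRecipe": 6,
}


def clamp_num_tasks(recipe_name: str, num_tasks: int) -> int:
    """Return num_tasks, raised to the documented minimum for the recipe."""
    return max(num_tasks, MIN_NUM_TASKS[recipe_name])


print(clamp_num_tasks("MontageRecipe", 100))    # 133
print(clamp_num_tasks("SeismologyRecipe", 10))  # 10
```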