CCQ Command Line Arguments
ccqsub Command and Arguments
CCQ is a wrapper for common schedulers such as Torque and Slurm, giving them autoscaling capability within GCP. When a user submits a job with ccqsub, the job script is parsed, the number of instances needed is determined, and the instances are launched for the job at run time. Options specified in the job script take precedence over options specified on the command line. These options can be set inside the job script itself using the #CC directive.
The options available through the #CC directive are:

- -it (instance type)
- -nt (network type)
- -ni (number of instances requested)
- -op (optimization type)
- -p (criteria priority)
- -cpu (number of CPUs requested)
- -mem (amount of memory in MB requested)
- -s (scheduler to use)
- -st (scheduler type)
- -jn (job name)
- -vt (volume type)
- -up (use preemptible)
Each command must appear on its own line with the #CC directive.
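For example, a minimal job script using #CC directives might look like the following sketch. The resource values and the script's workload are illustrative only:

    #!/bin/bash
    # Illustrative CCQ job script; resource values are placeholders.
    #CC -ni 2
    #CC -cpu 4
    #CC -mem 8000
    #CC -op performance
    echo "Running on a CCQ-provisioned instance"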
CCQ also understands several #PBS and #SBATCH directives; these control the instance type when it is not explicitly requested. CCQ ignores any line containing directives it does not understand, but those lines are still passed through to the underlying scheduler.
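As an illustration (the exact set of directives CCQ parses is not enumerated here), standard resource-request directives such as the following are the kind of lines a scheduler wrapper would read to size instances, while unrecognized lines pass through to the scheduler unchanged:

    # Torque-style resource request
    #PBS -l nodes=2:ppn=4

    # Slurm-style equivalent
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=4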
By default, if you submit from a CloudyCluster instance, the output files from the job appear in the CloudyCluster user’s home directory on the instance the job was submitted from. However, there may be a delay of a minute or two between when the job finishes and when the files appear on that machine, due to the extra processing required.
If you submit from a host outside of CloudyCluster, the output files are stored in the CloudyCluster user’s home directory on the Login Instance associated with the Cluster the job was submitted to. This behavior can be changed by specifying the -o and -e PBS directives in your job script or by using the -o and -e ccqsub command line arguments; these tell the scheduler to write the files to the specified locations instead of the defaults. If an output file is missing, check the /opt/CloudyCluster/ccqsub/undeliveredJobOutput/{CloudyCluster_user_name} directory on the Scheduler the job was submitted to, as this is the default directory for undeliverable job output files.
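For instance, a job script could redirect its output with standard PBS directives, or the same paths could be given on the ccqsub command line. The paths and username below are placeholders:

    # In the job script (placeholder paths):
    #PBS -o /shared/jsmith/myjob.out
    #PBS -e /shared/jsmith/myjob.err

    # Or on the command line:
    ccqsub -js myjob.sh -o /shared/jsmith/myjob.out -e /shared/jsmith/myjob.err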
Billing
All instances started are given the label ccuser, with the submitting username as the value. The label ccbilling can be added with the -bi option. These labels are intended to track spending on a per-user or per-project basis. If ccbilling is added, the label ccbillinguser is also added with the value project-user, where project is the name of the project and user is the name of the user.
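As a sketch of how these labels might be used for tracking, assuming the gcloud CLI is available (the label values below are placeholders), you could list the instances attributed to a project or to a user within it:

    # List instances labeled with a given ccbilling value (placeholder "projectx")
    gcloud compute instances list --filter="labels.ccbilling=projectx"

    # Narrow to a single user within that project (placeholder "projectx-jsmith")
    gcloud compute instances list --filter="labels.ccbillinguser=projectx-jsmith"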
CCQ Submit
This is the list of options that can be used with ccqsub. Many of these options can also be set in a job script using the #CC directive, as described above.
    usage: ccqsub [options]
Optional Arguments:

- -h, --help : Show this help message and exit.
- -V : Show the program’s version number and exit.
- -i APPKEY : The path to the file containing the app key used to validate the user on the requested resources.
- -ru REMOTEUSERNAME : The remote username to run the job as. This parameter applies only when app keys are used. If the app key belongs to a Proxy user, the remote username is the username the job should run as. If the app key does not belong to a Proxy user, the job runs as the user to whom the app key belongs. This argument cannot be used without also specifying the -i argument.
- -js <job_script_location> : The path to the job script file that you want to submit to the Scheduler/Target.
- -jn <job_name> : The name under which the job is saved so you can resubmit it later without having to resubmit the job script itself.
- -ni <number_instances> : The number of instances that you want the job to run on. The default is 1 instance.
- -cpu <cpu_count> : The number of CPUs that you want the job to run on. The default is 1 CPU.
- -mem <mem_size_in_MB> : The amount of memory (in MB) per instance. The default is 1000 MB (1 GB) per instance.
- -s <name_of_scheduler/target_to_use> : The name of the Scheduler/Target that you want to use. By default, the default Scheduler/Target for the requested Scheduler/Target type is used. This default can be set in the ccq.config file with the variable defaultScheduler=.
- -st <Torque | SLURM> : The type of Scheduler/Target that you want to use. The accepted values are Torque and SLURM. If the Scheduler/Target type is not specified with a job script, ccqsub attempts to determine from the job script which type of Scheduler/Target the job is to run on. If no job script is submitted, the value defaults to the default Scheduler/Target for the Cluster.
- -op <cost | performance> : Whether to use the instance type that is most cost effective or one that gives better performance regardless of cost. The default is "cost".
- -p <mcn | mnc | cmn | cnm | ncm | nmc> : The priority order considered when calculating the appropriate instance type for the job, where m = memory, n = network, and c = cpu. For example, "-p ncm" means that network requirements are considered first, then the number of CPUs, then the amount of memory. The default is "mcn": memory, CPUs, then network.
- -cl <days_for_login_cert_to_be_valid_for> : The number of days that the generated ccq login certificate is valid. This certificate is used so that you do not have to enter your username/password combination each time you submit a job. The default is 1 day, and the value must be an integer greater than or equal to 0. Setting the certificate valid length to 0 disables the generation of login certificates. If the certLength variable is set in the ccq.config file, the value in the ccq.config file overrides the value entered on the command line.
- -pr : Print the estimated price for the given job script instead of running the job. No resources are launched. The estimate includes only the per-hour instance costs.
- -o <stdout_file_location> : The path to the file where the Standard Output from your job is written. The default location is the submitting user’s home directory on the machine where the job was submitted, with a file name combining the job name and the job id.
- -e <stderr_file_location> : The path to the file where the Standard Error from your job is written. The default location is the same as for -o.
- -ti : Terminate the instances created by the CCQ job as soon as the job completes, instead of waiting to see if they can be used for other jobs. This argument applies only if the job creates a new compute group; if the job re-uses existing instances, they are not terminated upon job completion.
- -ps : Skip the provisioning stage in which ccq checks that the job’s user is on the Compute Nodes before continuing. This may be desired if the users are already baked into the Image. If this option is given and the users are not on the Image, the job could fail.
- -si <true | false>
- -tl <days>:<hours>:<minutes> : The amount of time the job is allowed to run before CCQ automatically terminates all of the instances, measured from the initial processing of the job. If the job completes successfully within the time limit, the instances are deleted via the CCQ auto-delete process. By default there is no time limit and the job runs for as long as it needs to. You may also specify "unlimited" if you do not want the instances to terminate until you delete them.
- -cp : Create only placeholder/parent instances and do not actually submit a job to the HPC Scheduler. This allows compute instances to be created dynamically and remain running for as long as the specified time limit. This argument requires the -tl argument as well. The default is False.
- -mi <maximum_idle_time> : The maximum amount of time, in minutes, that the instances created by the job remain running if no jobs are running on them. The default is 5.
- -bi <ccbilling_label> : The ccbilling label to associate with the created instances (GCP only).
- -gcpit <instance_type> : The GCP Compute Engine instance type the job is to run on. If no instance type is specified, the requested amount of RAM and CPUs is used to determine an appropriate instance type. CCQ also supports custom instance types, which can be specified in the format custom-CPUS-MEMORY. A default instance type can be set with the "defaultInstanceType" directive in the ccq config file.
- -gcpgi <google_image_id> : The Google Image that CCQ should use to launch the Compute Instances for the job. This MUST be a Google Image that contains the CloudyCluster software or IT WILL NOT WORK. If no Google Image is specified, the CloudyCluster Google Image used by the Scheduler Instance is used.
- -gcpvt <pd-standard | pd-ssd> : The disk (volume) type to use for the Compute Instances.
- -gcpup : Launch the Compute Instances as Preemptible instances.
- -gcpmt : Maintain the number of instances if a preemptible instance assigned to the job gets preempted.
- -gcput1 : Use Google’s Tier 1 networking bandwidth. For information on Tier 1 bandwidth, see the Google Cloud documentation.
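Putting several of these options together, an example submission might look like the following. The script name, scheduler type, and resource values are all illustrative:

    # Submit an illustrative 4-instance SLURM job with a 2-hour time limit,
    # 8 CPUs and 16000 MB of memory per instance, on preemptible instances.
    ccqsub -js myjob.sh -st SLURM -ni 4 -cpu 8 -mem 16000 -tl 0:2:0 -gcpup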
GPU with ccqsub
CloudyCluster allows you to leverage the power of GPU processing by requesting the GPU compute node configuration when submitting your job to the scheduler.
- -gcpgpu : Specifies that the Compute Instances should launch with an attached GPU.
- -gcpcpuarch <Minimum_CPU_Platform> : Specifies the minimum CPU platform that the instance should utilize. Supported CPU platforms are written with a "-" in place of the space character, e.g. "Intel-Skylake" or "Intel-Ivy-Bridge".
- -gcpgpusp <number_of_GPUs_per_node>:<type_of_GPU> : Specifies the number of GPUs per instance and the type of GPU to attach to the Compute Instances at launch. For example, "2:nvidia-tesla-p4" launches Compute Instances with 2 NVIDIA Tesla P4 GPUs. NOTE: Supported GPU types are listed at https://cloud.google.com/compute/docs/gpus. The instance will fail to launch if an invalid GPU type (not supported in the region or with the instance type) is specified.
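For example, a submission requesting two NVIDIA Tesla P4 GPUs per instance on a Skylake-or-newer CPU platform could look like the following sketch (the script name is a placeholder):

    # Illustrative GPU job: 2 Tesla P4 GPUs per instance, minimum Intel Skylake platform.
    ccqsub -js gpu_job.sh -gcpgpu -gcpgpusp 2:nvidia-tesla-p4 -gcpcpuarch Intel-Skylake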