diff --git a/JobExitCodes.html b/JobExitCodes.html new file mode 100644 index 0000000..31d7b99 --- /dev/null +++ b/JobExitCodes.html @@ -0,0 +1,509 @@ + + + + + + + + + JobExitCodes < CMS < TWiki + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+
+
+
+
+

Error codes currently sent from CMS jobs to the dashboard

+

+ALERT! indicates site error +

    +
  • Error exit code of the cmsRun application itself - range 0-10000 (Exit codes in 1-512 are standard ones in Unix and indicate a CMSSW abort that the cmsRun did not catch as exception)
      +
    • -1 - Error return without specification +
    • 1 - Hangup (POSIX) +
    • 2 - Interrupt (ANSI) +
    • 3 - unknown +
    • 4 - Illegal instruction (ANSI) +
    • 5 - Trace trap (POSIX) +
    • 6 - Abort (ANSI) or IOT trap (4.2BSD) +
    • 7 - BUS error (4.2BSD) +
    • 8 - Floating point exception (ANSI) +
    • 9 - killed, unblockable (POSIX) kill -9 +
    • 10 - User defined +
    • 11 - segmentation violation (ANSI) +
    • 12 - User defined +
    • 15 - Termination (ANSI) +
    • 24 - Soft CPU limit exceeded (4.2 BSD) +
    • 25 - File size limit exceeded (4.2 BSD) +
    • 29 - nondefined +
    • 30 - Power failure restart (System V.) +
    • 33 - nondefined +
    • 64 - I/O error: cannot open data file (SEAL) +
    • 65 - End of job from user application (CMSSW) +
    • 66 - Application exception +
    • 67 - unknown +
    • 68 - unknown +
    • 73 - Failed writing to read-only file system +
    • 84 - Some required file not found; check logs for name of missing file. +
    • 88 - unknown +
    • 90 - Application exception +
    • 100 - nondefined +
    • 126 - unknown +
    • 127 - Error while loading shared library +
    • 129 - Hangup (POSIX) +
    • 132 - Illegal instruction (ANSI) +
    • 133 - Trace trap (POSIX) +
    • 134 - Abort (ANSI) or IOT trap (4.2 BSD) +
    • 135 - Bus error (4.2 BSD) +
    • 136 - unknown +
    • 137 - killed, unblockable (POSIX) kill -9 +
    • 138 - User defined +
    • 139 - Segmentation violation (ANSI) +
    • 140 - User defined +
    • 143 - Termination (ANSI) +
    • 152 - CPU limit exceeded (4.2 BSD) +
    • 153 - File size limit exceeded (4.2 BSD) +
    • 155 - Profiling alarm clock (4.2 BSD) +
    • 251 - nondefined +
    • 255 - nondefined +
    • 256 - Application exception +
    • 512 - nondefined +
    • 2304 - nondefined +
    • 0001 - Plug-in or message service initialization Exception +
    +
+

    +
  • cmsRun (CMSSW) exit codes. These codes may depend on specific CMSSW version, the list is maintained in CVS and you should look at tags there to find out what is appropriate for a given CMSSW release. The situation as of 5_0_X is below
      +
    • // The first three are specific categories of CMS Exceptions. +
    • 7000 - Exception from command line processing +
    • 7001 - Configuration File Not Found +
    • 7002 - Configuration File Read Error +
    • 8001 - Other CMS Exception +
    • 8002 - std::exception (other than bad_alloc) +
    • 8003 - Unknown Exception +
    • 8004 - std::bad_alloc (memory exhaustion) +
    • 8005 - Bad Exception Type (e.g throwing a string) +
    • // The rest are specific categories of CMS Exceptions. +
    • 8006 - ProductNotFound +
    • 8007 - DictionaryNotFound +
    • 8008 - InsertFailure +
    • 8009 - Configuration +
    • 8010 - LogicError +
    • 8011 - UnimplementedFeature +
    • 8012 - InvalidReference +
    • 8013 - NullPointerError +
    • 8014 - NoProductSpecified +
    • 8015 - EventTimeout +
    • 8016 - EventCorruption +
    • 8017 - ScheduleExecutionFailure +
    • 8018 - EventProcessorFailure +
    • 8019 - FileInPathError +
    • ALERT! 8020 - FileOpenError (Likely a site error) +
    • ALERT! 8021 - FileReadError (May be a site error) +
    • 8022 - FatalRootError +
    • 8023 - MismatchedInputFiles +
    • 8024 - ProductDoesNotSupportViews +
    • 8025 - ProductDoesNotSupportPtr +
    • 8026 - NotFound (something other than a product or dictionary not found) +
    • 8027 - FormatIncompatibility +
    • ALERT! 8028 - FileOpenError with fallback +
    +
  • ALERT! Failures related to the environment setup - range 10000-19999
      +
    • ALERT! 10001 - LD_LIBRARY_PATH is not defined +
    • ALERT! 10002 - Failed to setup LCG_LD_LIBRAR_PATH +
    • ALERT! 10016 - OSG $WORKING_DIR could not be created +
    • ALERT! 10017 - OSG $WORKING_DIR could not be deleted +
    • ALERT! 10018 - OSG $WORKING_DIR could not be deleted +
    • ALERT! 10020 - Shell script cmsset_default.sh to setup cms environment is not found +
    • ALERT! 10021 - Failed to scram application project using the afs release area +
    • ALERT! 10022 - Failed to scram application project using CMS sw disribution on the LCG2 +
    • ALERT! 10030 - middleware not identified +
    • ALERT! 10031 - Directory VO_CMS_SW_DIR not found +
    • ALERT! 10032 - Failed to source CMS Environment setup script such as cmssset_default.sh, grid system or site equivalent script +
    • ALERT! 10033 - Platform is incompatible with the scram version +
    • ALERT! 10034 - Required application version is not found at the site +
    • ALERT! 10035 - Scram Project Command Failed +
    • ALERT! 10036 - Scram Runtime Command Failed +
    • ALERT! 10037 - Failed to find cms_site_config file in software area +
    • ALERT! 10038 - Failed to find cms_site_catalogue.sh file in software area +
    • ALERT! 10039 - cms_site_catalogue.sh failed to provide the catalogue +
    • ALERT! 10040 - failed to generate cmsRun cfg file at runtime +
    • ALERT! 10041 - fail to find valid client for output stage out +
    • ALERT! 10042 - Unable to stage-in wrapper tarball. +
    • ALERT! 10043 - Unable to bootstrap WMCore libraries (most likely site python is broken). +
    +
+

    +
  • Executable file related failures - range 50000-59999
      +
    • 50110 - Executable file is not found +
    • 50111 - Executable file has no exe permissions +
    • 50112 - User executable shell file is not found +
    • 50113 - Executable did not get enough arguments +
    • 50114 - OSG $WORKING_DIR could not be deleted +
    • 50115 - cmsRun did not produce a valid job report at runtime (often means cmsRun segfaulted) +
    • 50116 - Could not determine exit code of cmsRun executable at runtime +
    • 50117 - Could not update exit code in job report (a variation of 50115) +
    • 50513 - Failure to run SCRAM setup scripts +
    • 50660 - Application terminated by wrapper because using too much RAM (RSS) +
    • 50661 - Application terminated by wrapper because using too much Virtual Memory (VSIZE) +
    • 50662 - Application terminated by wrapper because using too much disk +
    • 50663 - Application terminated by wrapper because using too much CPU time +
    • 50664 - Application terminated by wrapper because using too much Wall Clock time +
    • 50669 - Application terminated by wrapper for not defined reason +
    • 50700 - Job Wrapper did not produce any usable output file +
    • 50800 - Application segfaulted (likely user code problem) +
    • 50998 - Problem calculating file details (i.e. size, checksum etc) +
    • 50999 - OSG $WORKING_DIR could not be deleted +
    +
+

    +
  • Staging-OUT related troubles- range 60000-69999
      +
    • 60300 - Either OutputSE or OutputSE_DIR not defined +
    • 60301 - Neither zip nor tar exists +
    • 60302 - Output file(s) not found +
    • 60303 - File already exists on the SE +
    • 60304 - Failed to create the summary file (production) +
    • 60305 - Failed to create a zipped archive (production) +
    • 60306 - Failed to copy and register output file +
    • 60307 - Failed to copy an output file to the SE (sometimes caused by timeout issue). +
    • 60308 - An output file was saved to fall back local SE after failing to copy to remote SE +
    • 60309 - Failed to create an output directory in the catalogue +
    • 60310 - Failed to register an output file in the catalogue +
    • ALERT! 60311 - Stage Out Failure in ProdAgent job +
    • 60312 - Failed to get file TURL via lcg-lr command +
    • 60313 - Failed to delete the output from the previous run via lcg-del command +
    • 60314 - Failed to invoke ProdAgent StageOut Script +
    • ALERT! 60315 - ProdAgent StageOut initialisation error (Due to TFC, SITECONF etc) +
    • 60316 - Failed to create a directory on the SE +
    • 60317 - Forced timeout for stuck stage out +
    • 60318 - Internal error in Crab cmscp.py stageout script +
    • 60319 - Failure to do asynchronous stageout +
    • 60401 - Failure to assemble LFN in direct-to-merge by size (WMAgent) +
    • 60402 - Failure to assemble LFN in direct-to-merge by event (WMAgent) +
    • 60403 - Timeout during attempted file transfer - status unknown (WMAgent) +
    • 60404 - Timeout during staging of log archives - status unknown (WMAgent) +
    • 60405 - General failure to stage out log archives (WMAgent) +
    • 60406 - Failure in staging in log files during log collection (WMAgent) +
    • 60407 - Timeout in staging in log files during log collection (WMAgent) +
    • 60408 - Failure to stage out of log files during log collection (WMAgent) +
    • 60409 - Timeout in stage out of log files during log collection (WMAgent) +
    • 60410 - Failure in deleting log files in log collection (WMAgent) +
    • 60411 - Timeout in deleting log files in log collection (WMAgent) +
    • 60451 - Output file lacked adler32 checksum (WMAgent) +
    • 60452 - No run/lumi information in file (WMAgent) +
    • 60999 - SG $WORKING_DIR could not be deleted +
    • 61101 - No sites are available to submit the job because the location of its input(s) do not pass the site whitelist/blacklist restrictions (WMAgent) +
    • 61102 - The job can only run at a site that is currently in Aborted state (WMAgent) +
    • 61103 - The JobSubmitter component could not load the job pickle (WMAgent) +
    • 61104 - The job can run only at a site that is currently in Draining state (WMAgent) +
    • 61300 - The job was killed by the WMAgent, reason is unknown (WMAgent) +
    • 61301 - The job was killed by the WMAgent because the site it was running at was set to Aborted (WMAgent) +
    • 61302 - The job was killed by the WMAgent because the site it was running at was set to Draining (WMAgent) +
    • 61303 - The job was killed by the WMAgent because the site it was running at was set to Down (WMAgent) +
    +
+

    +
  • Problems saving output via output sandbox- range 70000-70009
      +
    • 70000 - Output_sandbox too big for WMS: output can not be retrieved +
    • 70500 - Warning: problem with ModifyJobReport +
    +
+

    +
  • Other problems
      +
    • 99109 - Uncaught exception in WMAgent step executor +
    +
+
+
+ +
Topic revision: r68 - 28 Apr 2014 - JamesLetts
+
+
+
+
 
+
+ +
+
    +
  • +
    • Cern Search Icon Cern Search
    • TWiki Search Icon TWiki Search
    • Google Search Icon Google Search

    CMS All webs +
+ +

+

+ + +

    +
  • +
+
This site is powered by the TWiki collaboration platform Powered by PerlCopyright 1999 - 2013 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback
+
+
+
+