General information on the EDGeS Bridge

  

The EGEE ⇒ Desktop Grid bridge

This page collects general information on how the EGEE ⇒ Desktop Grid bridge works at a conceptual level. It may be useful if you are installing a component that connects to or uses EDGeS technology, or if you are a user of a VO that has such bridges.

Components of the EGEE ⇒ Desktop Grid bridge

The EGEE ⇒ DG bridge has several interworking components:

  • The gLite side (represented by the blue diamond) is a modified lcg-CE. Instead of the usual LRMS packages (PBS, LSF or Condor installed on a local cluster) it uses an EDGeS jobmanager, which sends bridged jobs to a remote 3G Bridge via the 3G Bridge wsclient command. It also comes with a YAIM module, so it can be configured like other gLite components. The remote 3G Bridge is specified in the CE configuration for each queue.
  • The Desktop Grid side (represented by the yellow diamond) is a 3G Bridge instance with the appropriate desktop grid output plugin and the WSSubmitter input, deployed on a BOINC or XtremWeb-HEP desktop grid.
  • The third component involved in EGEE ⇒ DG bridging is the EDGeS Application Repository (AR), which holds information about the applications that can be bridged.

Function of the bridge components

The EDGeS bridge CE acts like a Computing Element in gLite and can accept jobs from the WMS. When a job is submitted to one of its queues, the bridge CE first checks:

  1. Whether the VO indicated by the user's proxy is allowed to use this queue according to the CE configuration
  2. Whether the application is present in the AR (the MD5 hash of the executable specified in the JDL is used to identify the application)
  3. Whether, according to the AR, this application may be bridged via this CE to the desktop grid served by the queue
  4. Whether all additional application files specified in the AR are present among the input files of the job and match those listed in the AR (again, MD5 hashes are used for verification)

If any of these checks fails, the job is rejected with an appropriate error that is logged in the LB and appears in the output of glite-wms-job-status and glite-wms-job-logging-info. If all checks succeed, a job for the verified application is submitted to the 3G Bridge of the connected desktop grid. The executable and the verified application files from the JDL are not sent to the desktop grid; instead, the desktop grid version of the application, already preinstalled on the DG, processes the input files specified in the JDL. Thus, only the input files that are not application files are downloaded and passed to the desktop grid.
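The four checks can be sketched in Python as follows. This is only an illustration of the order of the checks described above, not the code of the EDGeS jobmanager: the ARApplication and Job classes, the check_job function and the shape of the queue_vos and ar dictionaries are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


def md5_hex(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


@dataclass
class ARApplication:
    """Hypothetical AR record for one bridgeable application."""
    name: str
    file_md5s: Dict[str, str]   # additional application file -> expected MD5
    allowed_queues: Set[str]    # queues via which the application may be bridged


@dataclass
class Job:
    """Hypothetical view of a submitted job (contents held in memory for brevity)."""
    vo: str
    queue: str
    executable: bytes
    input_files: Dict[str, bytes] = field(default_factory=dict)


def check_job(job: Job,
              queue_vos: Dict[str, Set[str]],
              ar: Dict[str, ARApplication]) -> Tuple[bool, str]:
    """Run the four checks in order; return (accepted, reason)."""
    # 1. Is the VO allowed to use this queue?
    if job.vo not in queue_vos.get(job.queue, set()):
        return False, "VO not allowed on this queue"
    # 2. Is the application known to the AR (looked up by executable MD5)?
    app = ar.get(md5_hex(job.executable))
    if app is None:
        return False, "application not found in AR"
    # 3. May this application be bridged via this queue?
    if job.queue not in app.allowed_queues:
        return False, "application may not be bridged via this queue"
    # 4. Are all additional application files present and unmodified?
    for name, expected in app.file_md5s.items():
        data = job.input_files.get(name)
        if data is None or md5_hex(data) != expected:
            return False, "application file missing or modified: " + name
    return True, "ok"
```

A rejected job carries the reason of the first failing check, which corresponds to the error that would be logged in the LB.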

Where possible (for sandbox files and for files on Storage Elements that are accessible via gsiftp), the MD5 hash of a file is retrieved first. The hash is all that is needed to verify application files, so these files are never downloaded. If the file is an input file that is to be passed to the desktop grid, it is then downloaded and put in a content-based cache indexed by MD5 hash. Subsequent jobs can reuse files from this cache, so if multiple jobs are submitted at the same time, their shared input files are downloaded to the bridge CE only once. The cache is cleaned by a cron job which, by default, removes files that have been unused for an hour, to keep disk usage low. The information retrieved from the ARCache for the checks is also second-level cached on the CE with a default TTL of one hour (entries older than an hour are refreshed). This second cache is cleaned by a similar cron job, which removes stale entries after one day, as the information in the AR changes less frequently and does not occupy much disk space.
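The idea of a content-based cache indexed by MD5 hash can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not the bridge CE's implementation: the ContentCache class, its method names and the use of the file modification time as "last used" marker are hypothetical, and the download callable stands in for the real sandbox or gsiftp transfer.

```python
import hashlib
import os
import time


class ContentCache:
    """Hypothetical file cache keyed by the MD5 hash of the file content."""

    def __init__(self, directory, max_idle_seconds=3600):
        self.directory = directory
        self.max_idle_seconds = max_idle_seconds  # one hour, as in the default above
        os.makedirs(directory, exist_ok=True)

    def _path(self, md5):
        return os.path.join(self.directory, md5)

    def get(self, md5, download):
        """Return the cached path for md5, calling download() only on a miss."""
        path = self._path(md5)
        if not os.path.exists(path):
            data = download()
            if hashlib.md5(data).hexdigest() != md5:
                raise ValueError("downloaded content does not match MD5")
            with open(path, "wb") as fh:
                fh.write(data)
        # Mark the file as used now, so that clean() keeps it for another period.
        os.utime(path, None)
        return path

    def clean(self, now=None):
        """Remove files unused for longer than max_idle_seconds (the cron job's task)."""
        now = time.time() if now is None else now
        removed = []
        for name in os.listdir(self.directory):
            path = self._path(name)
            if now - os.path.getmtime(path) > self.max_idle_seconds:
                os.remove(path)
                removed.append(name)
        return removed
```

Because the key is the hash of the content rather than the file name or URL, two jobs that refer to the same input file under different names still share one cached copy.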

The 3G Bridge with WSSubmitter on the target desktop grid accepts jobs from the bridge CEs. The appropriate desktop grid output plugin converts them, and corresponding jobs are created in the desktop grid. The jobs are then tracked, and the output is made available for download after a job has finished. The 3g-bridge daemon manages the jobs and interfaces with the desktop grid, while the wssubmitter daemon provides a web service interface to submit, track and control jobs.

Input and output files are transferred between the bridge CE and the 3G Bridge server via HTTP. The bridge CE puts input files in a directory that is exported via a lighttpd server and sends the URLs to the WSSubmitter, which retrieves the files before submitting the job to the 3G Bridge. Similarly, the 3G Bridge puts the output files in a directory accessible via HTTP; after the job has finished, the bridge CE retrieves the files from there and uploads them to the output locations specified in the JDL.

The bridge CE is responsible for deleting jobs from the 3G Bridge after it has downloaded the output files, or when a job is canceled while it is still running.

 

The Desktop Grid ⇒ EGEE bridge

This section collects information on how the Desktop Grid ⇒ EGEE bridge works at a conceptual level. It may be useful if you are managing or using a desktop grid connected to EGEE by such a bridge, or administering an EGEE resource that accepts bridged jobs from desktop grids.

There are two kinds of DG ⇒ EGEE bridges:

  1. BOINC ⇒ EGEE bridge
  2. XtremWebHEP-E ⇒ EGEE bridge

The two are based on different approaches and thus work differently. In the EDGeS SA1 service infrastructure both bridges use the VO named desktopgrid.vo.edges-grid.eu, which can thus support jobs from either BOINC or XWHEP.

The BOINC ⇒ EGEE bridge

The BOINC ⇒ EGEE bridge has two sides:

  • The BOINC side (represented by the yellow diamond) is a modified BOINC Core Client. Instead of running the downloaded WUs locally, it writes the contents of the WU to a jobwrapper config file and launches a jobwrapper process in place of the executable specified in the WU. The jobwrapper creates an archive of the slot directory and generates a shell script which extracts the archive, runs the executable specified in the WU and finally creates an output archive. It then submits this script to a 3G Bridge queue on the same machine, with the archived WU as its input. The jobwrapper keeps running and tracks the execution by polling the 3G Bridge periodically. After the job has finished it passes the results back to the BOINC client; if errors are encountered, it exits with an error, which lets the BOINC client know about the failure.
  • The EGEE side (represented by the blue diamond) is a 3G Bridge instance with the EGEE output plugin, which submits jobs to an EGEE VO. The VO, the WMS and the DN of the user used for job submission are specified in the plugin configuration. The DN can be fixed in the configuration because in BOINC typically only the project administrator can install applications, so the DN of the project administrator can be used. The proxies are retrieved from a MyProxy server by the 3G Bridge plugin. The plugin configuration also specifies a gsiftp-accessible Storage Element, which is used to transfer WU archives and output archives. This avoids putting these files in the input and output sandboxes, which would generate too much load on the WMS.
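The kind of shell script the jobwrapper generates (extract the WU archive, run the executable, create the output archive) can be sketched in Python. This is an illustration only: the build_wrapper_script function, its parameters and the archive names are hypothetical, and preserving the exit status of the executable is an assumption that the text above does not state.

```python
import shlex


def build_wrapper_script(input_archive, executable, arguments,
                         output_archive, output_files):
    """Return the text of a shell script that unpacks, runs and repacks a WU.

    All names are illustrative; the real jobwrapper may lay out files differently.
    """
    command = " ".join(
        [shlex.quote("./" + executable)] + [shlex.quote(a) for a in arguments]
    )
    outputs = " ".join(shlex.quote(f) for f in output_files)
    lines = [
        "#!/bin/sh",
        # 1. Extract the archived slot directory of the WU.
        "tar xzf " + shlex.quote(input_archive),
        "chmod +x " + shlex.quote("./" + executable),
        # 2. Run the executable specified in the WU.
        command,
        # Assumption: remember the exit status so a failure can be reported.
        "status=$?",
        # 3. Pack the output files into the output archive.
        "tar czf " + shlex.quote(output_archive) + " " + outputs,
        "exit $status",
    ]
    return "\n".join(lines) + "\n"
```

For example, build_wrapper_script("wu.tar.gz", "app", ["-n", "10"], "out.tar.gz", ["result.dat"]) yields a script whose three main steps are the tar extraction, the "./app -n 10" call and the creation of out.tar.gz.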

The same 3G Bridge can have multiple instances of the EGEE plugin that are configured differently and handle different queues, to which different jobwrapper clients submit jobs. Thus, a single BOINC ⇒ EGEE bridge can connect several desktop grids and EGEE VOs.

The XtremWebHEP-E ⇒ EGEE bridge

The XtremWebHEP-E ⇒ EGEE bridge works similarly to the pilot job solutions used by some EGEE users. The bridge is deployed on an XtremWebHEP-E Server, where it checks whether there are jobs that can be bridged to EGEE and submits XWHEP Workers to EGEE on demand. The submitted Workers then connect back to the Server and start processing jobs like any other Worker. Workers submitted by the bridge exit as soon as they have finished processing jobs, so that they do not occupy resources needlessly. Users of XtremWebHEP-E who want their jobs to be bridged must submit the jobs with their proxy to enable bridging.
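The on-demand submission of Workers can be pictured with a small Python sketch. The policy shown here (one Worker per bridgeable job, up to a fixed limit) is purely an assumption made for illustration; the source does not describe how the bridge decides how many Workers to submit, and the PendingJob class and the workers_to_submit function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PendingJob:
    """Hypothetical view of a job waiting on the XtremWebHEP-E Server."""
    job_id: str
    proxy: Optional[str]  # None when the job was submitted without a proxy


def workers_to_submit(pending: List[PendingJob],
                      running_workers: int,
                      max_workers: int) -> int:
    """Return how many new Workers to submit to EGEE (assumed policy)."""
    # Only jobs submitted with a proxy can be bridged.
    bridgeable = [job for job in pending if job.proxy]
    wanted = len(bridgeable) - running_workers
    free = max_workers - running_workers
    return max(0, min(wanted, free))
```

With three bridgeable jobs, one Worker already running and a limit of ten, this assumed policy would submit two more Workers; with no bridgeable jobs it submits none.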