Replacement of Cables in the JURECA Booster Module in Autumn

In autumn 2017, the JURECA Booster Module was installed with 1,640 compute nodes based on Intel Xeon Phi processors and an Intel Omni-Path Architecture (OPA) high-speed interconnect. The longer-distance OPA links are realized with active optical cables. Since the system went into operation, this cable type has exhibited a high failure rate that, contrary to expectations, has not decreased over the last few months. The cause is now understood to be a quality problem in the cable supply chain. In the meantime, Intel has stopped shipping cables from this supplier.

Unfortunately, only a swift replacement of all optical cables can bring the failure rate down to an acceptable level in the short term. These cables cannot be replaced during production without a significant risk of impacting running workloads and file system stability on the Booster and other systems in the JSC facility. A multi-week offline maintenance is therefore required. JSC and Intel had initially planned to perform the replacement in early autumn, from 7 September until mid-October. However, the new cables that were shipped to Jülich at the end of August did not pass on-site screening. In view of the very significant impact of the planned maintenance on JSC and its users, and considering the risk that the functionality of the new optical cables may be affected, it was decided to postpone the cable replacement.

JSC expects the maintenance to take place in October or November. The precise dates will be announced to all affected users as early as possible. JSC remains committed to keeping the impact of this long maintenance on its user base as low as possible.

Contact: Dr. Dorian Krause, d.krause@fz-juelich.de

from JSC News No. 260, 17 September 2018

Last Modified: 26.02.2022