Institute for Advanced Simulation (IAS)

Storage Cluster for Spectrum Scale Fileserver

Image: JUST front view (Copyright: FZ Jülich)

Image: JUST3 DS5300 cabling at rear (Copyright: FZ Jülich)

Image: JUST4 DS3512 with SAS disks (Copyright: FZ Jülich)

Image: JUST storage enclosure (Copyright: FZ Jülich)

Image: 10 TB nearline SAS disk (Copyright: FZ Jülich)

The configuration of the Jülich Storage Cluster (JUST) is continuously evolving and expanding to integrate newly available storage technology and to meet the ever-growing capacity and I/O bandwidth demands of the data-intensive simulation and learning applications on the supercomputers. Currently the 5th generation of JUST consists of 22 Lenovo DSS systems (Lenovo Distributed Storage Solution) and three older IBM GSS systems (GPFS Storage Server). The software layer of the storage cluster is based on IBM Spectrum Scale (GPFS). JUST and JUST-DATA together provide a gross capacity of more than 130 PB.

For details see

JUST Numbers

                 JUST-DSS              JUST-DATA                  JUST-TSM         JUSTCOM          JUST Total
Capacity         75 PB gross,          51.2 PB gross,             4.2 PB gross,    2.1 PB gross,    81.3 PB gross,
                 ca. 50 PB net         ca. 40 PB net              2.9 PB net       1.5 PB net       54.4 PB net
Racks            16                    10                         2                1                29
Server           44 + 5 Mngt + 2 CES   10 + 24 CES (virtualized)  4 + 1 Mngt       2 + 1 Mngt       93
Disk Enclosures  90                    60                         12               6                168
Disks (*)        7516 + 44 SSD         5040                       696 + 4 SSD      348 + 2 SSD      13650
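The "JUST Total" column can be cross-checked against the per-column figures. A minimal Python sketch (all values transcribed from the table above; the grouping of management and CES nodes into the server totals is our reading of the table):

```python
# Per-column figures from the JUST Numbers table
servers = {
    "JUST-DSS":  44 + 5 + 2,   # NSD servers + management + CES nodes
    "JUST-DATA": 10 + 24,      # NSD servers + virtualized CES nodes
    "JUST-TSM":  4 + 1,
    "JUSTCOM":   2 + 1,
}
disks = {
    "JUST-DSS":  7516 + 44,    # NL-SAS disks + SSDs
    "JUST-DATA": 5040,
    "JUST-TSM":  696 + 4,
    "JUSTCOM":   348 + 2,
}

print(sum(servers.values()))   # 93, the "JUST Total" server count
print(sum(disks.values()))     # 13650, the "JUST Total" disk count
```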

JUST Architecture

Image: JUST physical view (Copyright: FZ Jülich)

Image: JUST - central HPC storage infrastructure (Copyright: FZ Jülich)


JUST Hardware Characteristics

 

JUST: Distributed Storage Solution
  • 1 x DSS-G 26 (10 TB)

    • each 2 x Lenovo x3650 M5 Systems (x-Series)

      • each 2 x Intel Xeon Processors E5-2690, 14 cores, 2.66 GHz, 384 GB Memory
      • each 3 x Quad-Port SAS 12Gb HBA
      • each 2 x Mellanox Dual-Port SFP+ 100 Gigabit Ethernet Adapter
      • software: RedHat 7.4, DSSG-2.0 (GPFS 5.0.X)
    • each 6 x DSS-Storage (JBODs)

      • each 2 x drawers with 42 slots
      • each 84 x 10 TB NL-SAS Disks (GPFS Native RAID)
      • 1 DSS-Storage with 2 x 400 GB SSD (GPFS-GNR Configuration and Logging)
    • each 502 NL-SAS Disks and 2 SSDs
    • each 5 PB gross, 3.6 PB net (8+3P)
    • JUST User Data and Metadata ($ARCH)
  • 18 x DSS-G 24 (10 TB)

    • each 2 x Lenovo x3650 M5 Systems (x-Series)

      • each 2 x Intel Xeon Processors E5-2690, 14 cores, 2.66 GHz, 384 GB Memory
      • each 3 x Quad-Port SAS 12Gb HBA
      • each 2 x Mellanox Dual-Port SFP+ 100 Gigabit Ethernet Adapter
      • software: RedHat 7.4, DSSG-2.0 (GPFS 5.0.X)
    • each 4 x DSS-Storage (JBODs)

      • each 2 x drawers with 42 slots
      • each 84 x 10 TB NL-SAS Disks (GPFS Native RAID)
      • 1 DSS-Storage with 2 x 400 GB SSD (GPFS-GNR Configuration and Logging)
    • each 334 NL-SAS Disks and 2 SSDs
    • each 3.3 PB gross, 2.4 PB net (8+3P)
    • JUST User Data and Metadata ($SCRATCH and $FASTDATA)
  • 3 x DSS-G 24 (10 TB)

    • each 2 x Lenovo x3650 M5 Systems (x-Series)

      • each 2 x Intel Xeon Processors E5-2690, 14 cores, 2.66 GHz, 384 GB Memory
      • each 3 x Quad-Port SAS 12Gb HBA
      • each 2 x Mellanox Dual-Port SFP+ 100 Gigabit Ethernet Adapter
      • software: RedHat 7.4, DSSG-2.0 (GPFS 5.0.X)
    • each 4 x GSS-Storage (JBODs)

      • each 2 x drawers with 42 slots
      • each 84 x 10 TB NL-SAS Disks (GPFS Native RAID)
      • 1 GSS-Storage with 2 x 200 GB SSD (GPFS-GNR Configuration and Logging)
    • each 334 NL-SAS Disks and 2 SSDs
    • each 3.3 PB gross, 2.4 PB net (8+3P)
    • JUST User Data and Metadata ($HOME)
  • 2 x Lenovo GSS-26 (6 TB)

    • each 2 x IBM x3650 M4 HD Systems (x-Series)

      • each 2 x Intel Xeon Processors E5-2670, 10 cores, 2.5 GHz, 256 GB Memory
      • each 3 x LSI 9201-16e Quad-Port SAS 6Gb HBA (12x)
      • each 3 x Mellanox Dual-Port SFP+ 10 Gigabit Ethernet Adapter (6x)
      • software: RedHat 6.5, GSS 2.0 (GPFS 4.1.x)
    • each 6 x GSS-Storage (JBODs)

      • each 5 x drawers with 12 slots
      • each 58 x 6 TB NL-SAS Disks (RAID6 (8+2))
      • 1 GSS-Storage with 2 x 400 GB SSD
    • each 348 NL-SAS Disks and 2 SSDs (GSS-Configuration Backup)
    • each 2088 TB gross, 1668 TB net (RAID6)
    • TSM and User Community Data (Dep. FSD)
  • 1 x IBM ESS GL6 (6 TB)

    • each 2 x IBM Power Server (8247-22L)

      • each 2 x 10 core 3.42 GHz POWER8 Processor Card, 256GB Memory
      • each 3 x LSI 9201-16e Quad-Port SAS 6Gb HBA (12x)
      • each 3 x Mellanox Dual-Port SFP+ 10 Gigabit Ethernet Adapter (6x)
      • software: RedHat 6.5, GSS 2.0 (GPFS 4.1.x)
    • each 348 NL-SAS Disks and 2 SSDs (GSS-Configuration Backup)
    • each 2088 TB gross, 1668 TB net (RAID6)
    • TSM
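The gross-to-net ratios quoted above follow directly from the erasure codes in use: GPFS Native RAID's 8+3P Reed-Solomon code keeps 8 of every 11 strips as data, and RAID6 (8+2) keeps 8 of 10. A small illustrative Python check (the helper function is ours, not part of any GPFS tooling; small deviations from the listed figures reflect rounding in the source):

```python
def net_capacity(gross, data_strips, parity_strips):
    # Usable fraction of a stripe = data strips / total strips
    return gross * data_strips / (data_strips + parity_strips)

# DSS-G 26: 5 PB gross with 8+3P  -> ~3.6 PB net, as listed
print(round(net_capacity(5.0, 8, 3), 1))
# DSS-G 24: 3.3 PB gross with 8+3P -> 2.4 PB net, as listed
print(round(net_capacity(3.3, 8, 3), 1))
# GSS-26: 2088 TB gross with RAID6 (8+2) -> ~1670 TB, close to the listed 1668 TB
print(round(net_capacity(2088, 8, 2)))
```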
JUST: Server
  • 2 x Management Server (ThinkSystem SR650)

    • each 2 x Intel Skylake Processors Gold 6142, 16 cores, 2.6 GHz, 384 GB Memory
    • each 2 x Mellanox ConnectX-4 Dual-Port 100 Gigabit Ethernet Adapter
    • software: RedHat 7.4, xCAT 2.13
  • 1 x Monitoring Server

    • each 2 x Intel Skylake Processors Gold 6142, 16 cores, 2.6 GHz, 384 GB Memory
    • each 2 x Mellanox ConnectX-4 Dual-Port 100 Gigabit Ethernet Adapter
    • software: RedHat 7.4, Check_MK 1.4.0
  • 5 x GPFS Management Server (ThinkSystem SR650)

    • each 2 x Intel Skylake Processors Gold 6142, 16 cores, 2.6 GHz, 384 GB Memory
    • each 2 x Mellanox ConnectX-4 Dual-Port 100 Gigabit Ethernet Adapter
    • Software: RedHat 7.4, GPFS 5.0.1
  • 2 x GPFS CES Server (Cluster Export Service)

    • each 2 x Intel Skylake Processors Gold 6142, 16 cores, 2.6 GHz, 384 GB Memory
    • each 2 x Mellanox ConnectX-4 Dual-Port 100 Gigabit Ethernet Adapter
    • software: RedHat 7.4, GPFS 5.0.1, NFS (Ganesha)
JUST-DATA: Extended Capacity Storage TIER for community data sharing
  • 4 x GPFS building block

    • each 2 x Lenovo ThinkSystem SR650 NSD Server

      • each 2 x Intel Xeon Gold 6142, 16 cores, 2.6 GHz, 384 GB Memory
      • each 4 x Quad-Port SAS 12Gb HBA
      • each 2 x Mellanox Dual-Port SFP+ 100 Gigabit Ethernet Adapter
      • software: RedHat 7.4, GPFS 5.0.1
    • each 4 x DS6200 storage system

      • each 3 x D3284 Enclosures
      • each 252 x 10 TB NL-SAS Disks
    • each 10 PB gross
    • User Data and Metadata ($DATA)
  • 1 x GPFS building block

    • each 2 x Lenovo ThinkSystem SR650 NSD Server

      • each 2 x Intel Xeon Gold 6142, 16 cores, 2.6 GHz, 384 GB Memory
      • each 4 x Quad-Port SAS 12Gb HBA
      • each 2 x Mellanox Dual-Port SFP+ 100 Gigabit Ethernet Adapter
      • software: RedHat 7.4, GPFS 5.0.1
    • each 4 x DS6200 storage system

      • each 3 x D3284 Enclosures
      • each 252 x 12 TB NL-SAS Disks
    • each 12 PB gross
    • User Data and Metadata ($DATA)
  • 8 x IBM Power S822 for GPFS Cluster Export Service (CES)

    • each 2 x Power8 Processor, 12 cores, 3.026 GHz, 512 GB Memory
    • each 3 x Dual-Port 100 Gigabit Ethernet Adapter
    • each 3 x LPARs running virtual nodes
    • software: RedHat 7.4, GPFS 5.0.2, NFS (Ganesha)
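The JUST-DATA disk count in the numbers table can be reproduced from the building-block specs above (variable names are ours, for illustration only):

```python
disks_per_ds6200 = 3 * 84                # 3 x D3284 enclosures with 84 slots each = 252
disks_per_block  = 4 * disks_per_ds6200  # 4 x DS6200 per GPFS building block

total_blocks = 4 + 1                     # four 10 TB blocks plus one 12 TB block
total_disks  = total_blocks * disks_per_block
print(total_disks)                       # 5040, matching the JUST-DATA column
```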
JUST-TSM: Server and Storage
  • 8 x TSM-Server (IBM Power System S822)

    • each 2 x Power8 Processor, 10 cores, 3.42 GHz, 256 GB Memory
    • each 4 x 16Gbps Dual-Port FC Adapter
    • each 2 x Mellanox Dual-Port 100 Gigabit Ethernet Adapter
    • software: AIX 7.2, Spectrum Protect (TSM) Server + Client, Spectrum Scale (GPFS)
  • 1 x NIM Server (IBM Power System S821)

    • 1 x Power8 Processor, 4 core, 3.0 GHz, 32 GB Memory
    • 1 x Quad-Port 10/1 Gigabit Ethernet Adapter
    • software: AIX 7.1, NIM, TSM Client
  • 1 x Hardware Management Console (IBM Power System 7)

    • Power Systems Management
    • software: Linux, HMC 8.7.0

JUST History and Roadmap

Image: JUST-Roadmap (Capacity & Bandwidth)

In 2007 JUST started with classical storage building blocks consisting of IBM Power5 servers running AIX and storage controllers such as the IBM DS4800, DS4700, and DCS9550 with FC and SATA disks, providing 1 PB of gross capacity with a total bandwidth of 6-7 GB/s.

The next milestones came in 2009: in March the servers were replaced by Power6 systems, followed in December by the migration to a new generation of storage controllers and disks with the IBM DS5300. The capacity grew to 5 PB gross and the bandwidth reached about 33 GB/s.

In 2012 additional IBM x-Series servers running Linux and IBM DS3512 and DCS3700 storage controllers with SAS and NL-SAS disks were installed, and all data besides the fast scratch file system were migrated to the new technology. The freed Power6 servers and storage were added to the scratch file system, pushing the bandwidth to 66 GB/s and increasing the overall capacity to 10 PB.

In January 2013 the installation and testing of about 9 PB gross of GSS-24 systems running the pre-GA GSS 1.0 version (with the new GPFS Native RAID feature) started. In mid-September 2013 a new, generally available fast scratch file system was introduced. At the same time a new special file system dedicated to selected large projects with big data demands was made available. The overall JUST storage capacity was 13 PB, and a bandwidth of 160 GB/s could be achieved.

In June 2014 an additional 2.8 PB (gross) of GSS storage was installed and used to migrate the classical $HOME file systems to GNR-based file systems. The JUST storage capacity grew to about 16 PB (gross).
In December 2014 it was decided to transfer the remaining classical storage components to GSS-24 systems by reusing the storage infrastructure combined with new x-Series servers. This was done step by step and finished in March 2015. Finally, the freed storage was added to the fast scratch and big data file systems, increasing the bandwidth to about 200 GB/s. At that time JUST consisted of 31 GPFS Storage Server (GSS) systems with a capacity of 16 PB gross.

In June 2015 a global I/O reconfiguration took place to support the new HPC system JURECA. In all storage servers the 2 x 30 Gb Ethernet channels were split into 3 x 20 Gb Ethernet channels distributed over three I/O switches; this also required recabling. In mid-2015 an additional 4 PB (gross) was installed in the form of two capacity-optimized GSS-26 storage servers. They were partially used to migrate the HPC archive file systems. The storage freed thereby was added to the fast scratch and big data file systems, which increased their capacity by 25% and the I/O bandwidth to 220 GB/s. The overall capacity was 20 PB gross.

In April 2018 the 5th generation of JUST entered production. The old GSS hardware was replaced by new Lenovo Distributed Storage Solution (DSS) systems. The software setup is the same as in JUST4: the parallel file system is based on Spectrum Scale (GPFS) in combination with IBM's GPFS Native RAID (GNR) technology. The new installation provides 75 PB of gross capacity.

Two months later the storage cluster JUST-DATA entered production, providing a large disk-based capacity (40 PB gross) with a moderate bandwidth of 20 GB/s. To match the growing data requirements, 12-28 PB will be added yearly. In January 2019 an additional 12 PB was installed.

