Boosting Data Center Efficiency with Solidigm SSDs and Liquid-Cooled Servers

Pairing liquid cooling and efficient SSD management offers a path forward for data centers looking to scale performance and storage density.

As data centers strive for greater energy efficiency, particularly with the demands of AI workloads, many are turning to liquid cooling to optimize performance and manage energy consumption. Liquid cooling can efficiently manage the heat generated by high-performance servers, enabling them to operate at peak capacity without the energy-intensive costs associated with traditional air cooling. Solidigm’s high-density SSDs are ideally suited for these environments, offering exceptional terabyte-to-watt efficiency.

While AI forces many data center operators to consider liquid cooling, its impact reaches much further. In a prior report, we examined the effect of liquid cooling on a 2U Dell PowerEdge R760. CoolIT’s direct liquid cooling (DLC) significantly reduced server energy consumption by lowering fan speeds, a power savings of 200 watts. That testing was centered entirely on CPU performance; this time, we wanted to take a more storage-centric look to understand the impact of SSDs on server power consumption.

What are NVMe Active Power States?

NVMe power states are predefined states that an NVMe device can transition into to manage power consumption and performance. The NVMe specification allows for up to 32 power states, each characterized by maximum power consumption, entry latency (ENLAT), exit latency (EXLAT), and relative performance values. These power states are divided into operational and non-operational states. Operational power states, or P-States, allow the device to handle I/O operations. Non-operational states, or F-States, are used when the device is idle and not handling I/O operations.

Managing these power states is crucial for optimizing the power efficiency of NVMe devices, especially in environments where power consumption is a critical concern, such as edge devices and specialized applications like the SSDs on the International Space Station. For instance, the NVMe specification includes features like Autonomous Power State Transition (APST), which allows the device to move itself into lower power states after it has been idle for a host-configured period. This helps balance performance with power consumption, ensuring reliable operation in remote or constrained environments. Runtime D3 (RTD3) support allows the device to enter a zero-power idle state, further conserving energy when the device is not in use.

NVMe power states are particularly beneficial when power efficiency and thermal management are paramount. In edge devices, for example, the ability to quickly transition to lower power states when idle can significantly reduce energy consumption, which is critical for devices operating in remote or harsh environments with limited power availability. This is achieved through features like PCIe Active State Power Management (ASPM) and low power states such as L1.1 and L1.2, which reduce power consumption to minimal levels. Managing power and thermal output on the ISS is crucial due to the limited and controlled environment. NVMe power states can help in throttling the power consumption of SSDs to manage thermal design power (TDP) and optimize the overall energy budget, ensuring that the SSDs operate efficiently without overheating.

In these specialized environments, NVMe power states provide a flexible and efficient way to manage the power consumption of NVMe devices. By leveraging these states, devices can balance performance and power efficiency, making them suitable for various applications, from edge computing to space missions. The ability to dynamically adjust power states based on real-time conditions ensures that NVMe devices can meet the varying demands of different environments while optimizing for energy efficiency and thermal management.

In addition to NVMe power states, Composite Temperature and Touch Temperature play a crucial role in managing the thermal performance of new enterprise NVMe SSDs. Touch Temperature represents the external case temperature of the SSD. Solidigm has been a leader in embracing new, higher Touch Temperature standards. The factory-set Touch Temperature for the Solidigm D5-P5336, for instance, is 80°C. This higher limit allows SSDs to be cooled with lower airflow or to operate in higher ambient temperatures. This flexibility enables data centers to optimize cooling strategies and improve overall thermal management, potentially reducing cooling costs and enhancing the reliability and longevity of the SSDs.

Managing NVMe Active Power States

In a Linux testing environment running Ubuntu 22.04, we can use the NVMe toolset to poll the drive to view and change the D5-P5336’s power states. As you can see below, the drive supports states 0, 1, and 2, with state 0 being the least restrictive and state 2 being the most restrictive.

For the Solidigm 61.44TB D5-P5336, PS0 is 25W, PS1 is 15W, and PS2 is 10W. The drive idles at about 5.5W, so with each step to a more restrictive power state, the SSD has less power headroom to dedicate to NAND read and write operations. Write operations take the biggest hit, since writing to NAND uses more power than reading from it.
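To make the shrinking headroom concrete, the short sketch below subtracts the quoted idle draw from each power-state cap. It treats the roughly 5.5W idle figure as a fixed floor, which is a simplifying assumption rather than something the drive guarantees; the names are illustrative only.

```python
# Power-state caps (watts) for the 61.44TB D5-P5336, as quoted in the text.
POWER_STATE_CAP_W = {"PS0": 25.0, "PS1": 15.0, "PS2": 10.0}

# Approximate idle draw quoted in the text; assumed here to be a fixed floor.
IDLE_W = 5.5


def headroom_above_idle(cap_w, idle_w=IDLE_W):
    """Watts left for NAND reads and writes once idle draw is subtracted."""
    return cap_w - idle_w


headroom = {ps: headroom_above_idle(cap) for ps, cap in POWER_STATE_CAP_W.items()}

for ps, watts in headroom.items():
    print(f"{ps}: {watts:.1f} W above idle")
```

Under that assumption, PS2 leaves less than a quarter of the headroom that PS0 does, which is consistent with writes suffering most at the lower states.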

The command to check the current power state on our Solidigm D5-P5336 SSD is shown below. The current value of 00000000 indicates the drive is in PS0, the highest-power 25W mode.

A similar command is issued to change the power state, with the final number representing the power mode you want the SSD to be in. For example, the following command sets the power mode to PS0 on the Solidigm D5-P5336 SSD. If you use power modes 1 or 2, change the --value= figure to correspond to the correct power mode.
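The check-and-set steps described above could be scripted roughly as follows. This is a sketch, not the exact commands used in our testing: it assumes nvme-cli is installed, uses the Power Management feature (feature ID 2) through `nvme get-feature` and `nvme set-feature`, and treats `/dev/nvme0` as a placeholder device path. The helper names are hypothetical.

```python
import subprocess

# NVMe Power Management feature identifier (02h).
POWER_MANAGEMENT_FID = 2


def get_power_state_cmd(device):
    """Build the argv that reads the current power state of `device`."""
    return ["nvme", "get-feature", device, "-f", str(POWER_MANAGEMENT_FID)]


def set_power_state_cmd(device, power_state):
    """Build the argv that moves `device` to power state 0, 1, or 2."""
    if power_state not in (0, 1, 2):
        raise ValueError("the drive in this article exposes PS0, PS1, and PS2 only")
    return ["nvme", "set-feature", device, "-f", str(POWER_MANAGEMENT_FID),
            f"--value={power_state}"]


def run(cmd):
    """Hypothetical helper: run a command (usually needs root), return stdout."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


if __name__ == "__main__":
    # Only print the commands here; call run(...) on a host with the drive installed.
    print(" ".join(get_power_state_cmd("/dev/nvme0")))
    print(" ".join(set_power_state_cmd("/dev/nvme0", 0)))
```

Building the argument lists separately from executing them keeps the sketch safe to run on a machine without NVMe hardware and makes it easy to loop over all 24 drives.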

Impact of Power States on Performance

To measure the power and performance impact of power states on the Solidigm D5-P5336 61.44TB SSD, we outfitted a Dell PowerEdge R760 with 24 SSDs. Running Ubuntu and the FIO workload generator, we could easily run a consistent workload across all SSDs and update the power mode on the fly.

We monitored power at the system level using the power monitoring built into the server’s iDRAC9 management system.

We focused on sequential read and write bandwidth workloads, using a 128K block size on each drive, and measured aggregate performance across all 24 SSDs. Note that this particular Dell PowerEdge R760 configuration with 24 NVMe bays uses a PCIe switch rather than direct-attached NVMe bays, so aggregate bandwidth saturates the available PCIe switch lanes before it reaches the limits of the drives. This caps the total read performance we measured relative to the Solidigm D5-P5336 spec sheet, but the aggregate write speeds were all under that limit.

| Test | Total Watts | Read/Write Speed (GB/s) | Watts Over Base | Watts/Drive (with system overhead) |
|---|---|---|---|---|
| Idle, No Drives | 462 | | | |
| Idle, Drives Installed | 594 | | 132 | 5.5 |
| 24x Sequential Read PS0 | 858 | 109 | 396 | 16.5 |
| 24x Sequential Read PS1 | 858 | 105 | 396 | 16.5 |
| 24x Sequential Read PS2 | 759 | 79.8 | 297 | 12.375 |
| 24x Sequential Write PS0 | 1089 | 82.5 | 627 | 26.125 |
| 24x Sequential Write PS1 | 825 | 34.4 | 363 | 15.125 |
| 24x Sequential Write PS2 | 726 | 17.3 | 264 | 11 |
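The two derived columns in the table follow directly from the total system wattage: subtract the 462W idle baseline with no drives installed, then divide by the 24 drives. The sketch below reproduces that arithmetic from the measured totals; the variable and function names are illustrative.

```python
BASELINE_W = 462   # idle system, no drives installed
DRIVES = 24

# Total system watts per test, taken from the table.
TOTAL_WATTS = {
    "Idle, drives installed": 594,
    "Sequential read PS0": 858,
    "Sequential read PS1": 858,
    "Sequential read PS2": 759,
    "Sequential write PS0": 1089,
    "Sequential write PS1": 825,
    "Sequential write PS2": 726,
}


def derive(total_w, baseline_w=BASELINE_W, drives=DRIVES):
    """Return (watts over base, watts per drive including system overhead)."""
    over_base = total_w - baseline_w
    return over_base, over_base / drives


for name, total in TOTAL_WATTS.items():
    over_base, per_drive = derive(total)
    print(f"{name:<24} {over_base:>4} W over base  {per_drive:.3f} W/drive")
```

Because the baseline is a whole-system idle figure, the per-drive number also absorbs any extra CPU, switch, and fan power drawn under load, which is why the PS0 write figure lands slightly above the drive's 25W cap.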

Looking back at our article on the benefits of converting an air-cooled platform to direct liquid cooling, we saw a slight CPU performance increase, but we also saved 200W of power. Power is a precious commodity in the new wave of AI-centered servers that frequently dedicate all available resources to GPUs and high-end CPUs. In a data center at or near its power budget limit on air cooling, switching to DLC frees up power budget, allowing the server to be filled with more SSDs within the same power footprint as an air-cooled server.

A 200W power savings can go a long way toward storage density. If your workloads are read-intensive, that savings lets you double the storage footprint from 12 to 24 SSDs in a liquid-cooled server compared with an air-cooled one. With the Solidigm D5-P5336, that raises this 24-bay server’s storage capacity from 737TB to 1,474TB thanks to the liquid loop. If the workload is write-heavy, you would be able to equip the server with about eight more SSDs. However, these figures assume the default power state (PS0), so if you are willing to shave some write performance off the top end, you could fill all 24 bays even with a write-heavy workload.
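The drive counts above can be sanity-checked by dividing the 200W of reclaimed power by the measured per-drive wattage for each workload. The sketch below is a back-of-the-envelope estimate only: it assumes the per-drive figures (which include system overhead) scale linearly with drive count, and it rounds down to whole drives, so the write PS0 case comes out just under the "about eight" quoted in the text.

```python
import math

SAVED_W = 200      # power freed by moving to direct liquid cooling
DRIVE_TB = 61.44   # capacity of one D5-P5336

# Watts per drive (with system overhead) from the measured results.
WATTS_PER_DRIVE = {
    "read PS0": 16.5,
    "write PS0": 26.125,
    "write PS1": 15.125,
    "write PS2": 11.0,
}


def extra_drives(saved_w, watts_per_drive):
    """Whole drives that fit inside the reclaimed power budget."""
    return math.floor(saved_w / watts_per_drive)


def raw_capacity_tb(drive_count, drive_tb=DRIVE_TB):
    """Raw capacity in TB for a given number of drives."""
    return drive_count * drive_tb


for workload, watts in WATTS_PER_DRIVE.items():
    n = extra_drives(SAVED_W, watts)
    print(f"{workload}: {n} extra drives, {raw_capacity_tb(n):.2f} TB added")
```

The same calculation with the PS1 and PS2 write figures shows how lowering the power state stretches the reclaimed budget across more drives, at the cost of the write throughput measured earlier.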

Conclusion

Through our testing of the Solidigm D5-P5336 SSDs, we’ve seen how managing NVMe power states can significantly impact power efficiency without dramatically affecting performance. Data center operators looking to maximize energy efficiency can leverage these power states to achieve greater storage density or reduce operational costs, particularly in AI-centric environments where power is at a premium. Solidigm’s high-density SSDs are well-positioned for this, offering excellent terabyte-to-watt efficiency, especially with modern liquid cooling technologies.

Our findings reveal that even slight adjustments to power states can yield significant power savings, which can be crucial in environments limited by power availability. Optimizing servers’ overall power consumption enhances storage density and supports more sustainable data center operations.

Power management becomes increasingly critical as modern servers are pushed to their limits, especially in AI-driven workloads. Pairing liquid cooling and efficient SSD management options offers a path forward for data centers looking to scale performance and storage density without overshooting power budgets.

You can see the full demo of these technologies live at OCP 2024. We’ll showcase how liquid cooling and Solidigm’s SSDs can be the cornerstones of energy efficiency in the modern data center.

Solidigm Storage Solutions

This report is sponsored by Solidigm. All views and opinions expressed in this report are based on our unbiased view of the product(s) under consideration.

Brian Beeler

Brian is located in Cincinnati, Ohio and is the chief analyst and President of StorageReview.com.
