Monitoring for Modern Data Center Management

WHITE PAPER
Intel® PTAS-iEN
Data Center Efficiency and Optimization
Data Center Management
Monitoring and Managing the Modern Data Center
Intel® Data Center Management (Intel® DCM) software and Intel® Power Thermal Aware Solution (Intel® PTAS),
combined with Chunghwa Telecom’s Intelligent Energy Network (iEN) data center solution, can reduce the power
consumption of data center cooling equipment by up to 30% and improve overall data center operating efficiency.
With the rising demand for cloud services, businesses today require increasingly more compute capability from their data centers. This compute increase will most likely come from higher rack and room power densities.
But an increase in a data center's business-critical IT equipment (servers, hubs, routers, wiring patch panels, and other network appliances), not to mention the infrastructure needed to keep these devices alive and protected, encroaches on another IT goal: to reduce long-term energy usage.
IT facilities use electric power for lights, security, backup power, and climate control to maintain temperature and humidity levels that minimize downtime due to heat issues. By benchmarking power consumption, you are comparing the power needed to run business-critical equipment with the power needed to maintain that equipment.
Challenges
• Reduce power consumption/costs. How to raise the IT and power density of
data center racks to meet business growth and better service delivery is a top
concern for many cloud service providers and a barrier to some companies’
growth. For efficient and intelligent operation and for improved service, data
centers must maximize available resources while reducing operating costs.
• Cool equipment intelligently and efficiently. Tailoring data center cooling
needs on a case-by-case, server/rack/row basis has not been possible until now.
Solution
• Intel® DCM and Intel® PTAS. The PTAS solution provides platform telemetry data, metrics, and analytics to enable real-time power and thermal monitoring, reporting, and cooling and compute control in a data center (early hot-spot identification, server- and rack-level event monitoring, and advanced CRAC control strategies).
• Chunghwa Telecom iEN Service. CHT’s Intelligent Energy Network (iEN) data
center solution decreases power usage and increases operation efficiency by
combining intelligent management of your data center’s infrastructure and
power system (including air conditioning, lighting, security, and environmental
monitoring) in one platform.
Benefits
• Up to 30% energy savings. Decrease energy consumption by reducing overcooling, while achieving a PUE of 1.52 to meet the LEED standard for green data center operation.
• Increased reliability of data center operation. Intel's platform telemetry data, metrics, and analytics solution enables real-time monitoring and energy-efficiency management. Data center administrators get real-time information on cooling issues and heat-distribution alarms.
“Working with Intel on this data center
management proof-of-concept has
shown us that we can deliver real-time
level 3 PUE measurements on a fully
operational data center. And not only can
we monitor our data center’s energy
efficiency through the cloud, we can also
intervene when necessary, redistributing
resources and facility infrastructure
according to need.
“We are optimistic about the feasibility
of integrating real-time individual server
data with a building energy management
system to dynamically manage a facility’s
cooling infrastructure. We look forward
to validating this next step in Chunghwa
Telecom’s long-term plan to improve
data center efficiency management.”
— Ruey-Chang Hsu,
Vice President of Network Department,
Chunghwa Telecom
Benchmarking power usage
Before you can intelligently reduce
your data center’s energy
consumption, you need to know
what its current consumption is.
Monitoring energy consumption is
the first step to managing your data
center’s energy efficiency.
Benchmarking helps you
understand the existing level of
efficiency in your data center.
Power Usage Effectiveness (PUE) and its reciprocal, Data Center Infrastructure Efficiency (DCiE), developed by the Green Grid consortium, are internationally accepted benchmarking standards that measure a data center's power usage for actual computing functions (as opposed to power consumed by lighting, cooling, and other overhead).

PUE = Total facility power / IT equipment power

DCiE = IT equipment power / Total facility power
A data center that operates at
1.5 PUE or lower is considered
efficient (see Table 1).
Table 1 PUE/DCiE efficiency standards (source: Green Grid)

    PUE    DCiE    Efficiency
    1.2    83%     Very efficient
    1.5    67%     Efficient
    2.0    50%     Average
    2.5    40%     Inefficient
    3.0    33%     Very inefficient
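As a concrete illustration, the short Python sketch below (with hypothetical meter readings) computes both metrics and maps the PUE value onto the Table 1 bands.

    # Compute PUE and DCiE from metered power readings (watts) and
    # classify the PUE against the Green Grid bands in Table 1.
    # The meter readings below are hypothetical examples.

    def pue(total_facility_w: float, it_equipment_w: float) -> float:
        return total_facility_w / it_equipment_w

    def dcie(total_facility_w: float, it_equipment_w: float) -> float:
        return it_equipment_w / total_facility_w  # reciprocal of PUE

    def efficiency_band(p: float) -> str:
        if p <= 1.2: return "Very efficient"
        if p <= 1.5: return "Efficient"
        if p <= 2.0: return "Average"
        if p <= 2.5: return "Inefficient"
        return "Very inefficient"

    total_w, it_w = 150_000.0, 100_000.0  # hypothetical readings
    p = pue(total_w, it_w)                # 1.50
    print(f"PUE = {p:.2f} ({efficiency_band(p)}), DCiE = {dcie(total_w, it_w):.0%}")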
Identifying where power is lost is
key to making your data center run
more efficiently. Three ways to
measure PUE:
• Level 1. Measure at least twice a
month from the data center's UPS
(uninterruptible power supply).
• Level 2. Measure daily at the PDU
(power distribution unit).
• Level 3. Measure continuously
throughout the day, including data
from the PDU and UPS.
Level 3 measurements are the most
accurate and most useful, and they
are easy to acquire with Intel® PTAS.
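A Level 3 measurement is just this computation applied to continuous samples. The sketch below is a minimal illustration, assuming hypothetical read_ups_watts() and read_pdu_watts() functions standing in for whatever metering interface the facility exposes.

    # Minimal sketch of a Level 3 measurement loop: sample the UPS
    # (total facility draw) and PDU (IT load) continuously and keep a
    # running daily PUE. The reader functions are hypothetical stand-ins.
    import random

    def read_ups_watts() -> float:  # hypothetical: total facility power
        return 150_000 + random.uniform(-2_000, 2_000)

    def read_pdu_watts() -> float:  # hypothetical: IT equipment power
        return 100_000 + random.uniform(-1_500, 1_500)

    total_sum = it_sum = 0.0
    for _ in range(24 * 60):        # one sample per minute for a day
        total_sum += read_ups_watts()
        it_sum += read_pdu_watts()  # in production, sleep 60 s per loop
    print(f"Level 3 daily PUE = {total_sum / it_sum:.2f}")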
Intel® PTAS gives you precise
indicators to match supply with
demand, comparing the power
currently used for the IT equipment
with the power used by the
infrastructure that keeps that
equipment cooled, powered, backed
up, and protected. By addressing
inefficiencies at the rack level, you
can optimize by row, and eventually
address the whole data center’s
efficiency, reducing power
consumption and related energy
costs—in both operating and capital
expenses—and thereby extend the
useful life of your data center.
Cooling needs
Because heat is a leading cause of
downtime in data centers, rooms
filled with racks of computers and
other heat-producing IT equipment
require a lot of energy to cool. Some experts claim a data center's infrastructure may be responsible for as much as 50% of its energy bill, with a good portion of that coming from cooling equipment.
The energy required by this cooling
equipment may come at the
expense of actual compute power.
So reducing the power fed to your
cooling solution may allow greater
utilization of your power resources
for actual business.
The Rack Cooling Index (RCI)*, which
monitors rack overtemperatures
and undertemperatures, is another
useful industry benchmark. In short,
an RCI score of 100% indicates that
temperatures did not exceed the
acceptable highs (or lows). Anything
above 90% is acceptable, and above
96% is considered good. A data
center that maintains 100% for
both highs and lows is in the
“goldilocks zone”—the optimal
operating temperature that is not
too hot or too cold. Generally
speaking, the goldilocks zone for
data centers is between 65 and
80°F (18 and 27°C). Anything cooler
is probably wasted energy, and
anything warmer may result in
equipment failures and more
downtime (Figure 1).
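As a rough illustration (not the published RCI formula, which also weights readings by how far they stray past the recommended band), the sketch below simply scores the share of rack-inlet samples that stay inside the 18 to 27°C goldilocks zone.

    # Simplified stand-in for the Rack Cooling Index: score the share of
    # rack-inlet temperature samples inside the recommended 18-27 degC
    # band. (The published RCI also weights each out-of-band sample by
    # its distance from the band; this sketch only counts them.)
    T_LOW_C, T_HIGH_C = 18.0, 27.0

    def rack_cooling_score(inlet_temps_c: list[float]) -> tuple[float, float]:
        n = len(inlet_temps_c)
        hi = 100.0 * sum(t <= T_HIGH_C for t in inlet_temps_c) / n  # overtemp side
        lo = 100.0 * sum(t >= T_LOW_C for t in inlet_temps_c) / n   # undertemp side
        return hi, lo

    samples = [24.0, 26.5, 27.5, 25.0, 17.5, 26.0]  # hypothetical readings
    hi, lo = rack_cooling_score(samples)
    print(f"high side {hi:.0f}%, low side {lo:.0f}% (above 96% is good, above 90% acceptable)")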
But IT professionals seldom have access to real-time controls to optimize power and temperature, so many overcool their data centers to meet peak or "worst case" conditions. Overcooling wastes energy, yet few administrators have the tools they need to cool their data centers wisely and economically.
Figure 1 Data center operating temperature ranges: overcooled (higher costs) below 65°F (18°C); the RCI = 100% "goldilocks" zone, best operation, from 65°F (18°C) to 80°F (27°C); too hot (risk of failure) above 80°F (27°C).
Intel® PTAS and how it works
Intel® PTAS is Intel’s DCIM solution
with integrated platform telemetry
and analytics to identify and
address energy efficiency issues. It
provides server-level power monitoring through Intel® DCM, calculates efficiency metrics, and develops 3D thermal maps with PUE Level 3 measurements. It also works out air-conditioning control strategies and simulations, logging events for any abnormal behavior in a server, rack, or room.
One server acts as the Intel® DCM Server. It gathers information from the other servers, then sends this data through an API to the iEN-Box, and the two interact with each other in real time (Figure 2). For example, if the Intel® DCM Server were to notify the iEN-Box that a server was running hot, the iEN-Box, through its controlling devices, could increase the cooling for that rack location.
Then iEN builds thermal maps and
efficiency metrics from Intel® DCM
data. As a result, data center
administrators can identify unused
or idle servers for consolidation,
avoid potential failures before they
occur, and run the data center more
efficiently.
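The paper does not publish the API itself; the sketch below is a hypothetical illustration of the exchange it describes, with an invented endpoint URL and payload fields, showing the Intel® DCM Server pushing one server's readings to the iEN-Box.

    # Hypothetical sketch of the DCM-Server-to-iEN-Box exchange described
    # above. The endpoint URL and payload fields are invented for
    # illustration; the proof of concept's actual API is not published.
    import json
    from urllib import request

    IEN_BOX_URL = "http://ien-box.local/api/telemetry"  # hypothetical endpoint

    def push_telemetry(rack: str, server: str, outlet_temp_c: float, cups: float) -> int:
        payload = {"rack": rack, "server": server,
                   "outlet_temp_c": outlet_temp_c, "cups": cups}
        req = request.Request(IEN_BOX_URL,
                              data=json.dumps(payload).encode(),
                              headers={"Content-Type": "application/json"})
        with request.urlopen(req) as resp:  # iEN-Box acknowledges receipt
            return resp.status

    # A hot reading lets the iEN-Box direct its controlling devices to
    # increase cooling at that rack location.
    push_telemetry(rack="J03", server="QCT-02", outlet_temp_c=41.0, cups=52.0)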
Figure 2 Proof of concept configuration (iEN-DCM): a management layer (Cloud iEN, Cloud Argus with QoS, CHT OSS, POMIS, and the iEN web UI with graphic control, reached over the Internet) sits above a control layer, where master and slave iEN-Boxes running Intel® PTAS connect over Ethernet to Intel® DCM on an Intel server through a web service (API). At the device layer, servers report inlet and outlet temperatures, power, performance indicators, air flow, etc., while controlling devices manage A/C, lighting, power, rack monitoring, video monitoring, access control, backup, fire suppression, smoke detection, and water leak detection.
Results
Intel helped conduct a proof of concept at Chunghwa Telecom's 2,427-square-foot data center to evaluate Intel® DCM and Intel® PTAS working with CHT's iEN. The test involved an internal data center with separate hot and cool aisles, using 19 QCT and Intel servers1, with at least four units in each of four dedicated racks within a single row, to monitor and compare results across different rack locations. A twentieth server, acting as the Intel® DCM server, gathered data on itself and the other servers. The administrator could log onto the Intel® DCM server through the cloud and get real-time information on each server's power usage, air flow, temperature, CPU utilization, etc. (Figure 3). In this example, with a CUPS threshold of 50, the system alerts you when any unit exceeds 50 CUPS2.
1. Nine Quanta* QCT servers: one Intel® Xeon® E5-2600 v3 65 W CPU, one DIMM, and one 2.5 in. HDD. Ten Intel servers: two Intel® Xeon® DP E5-2680 130 W CPUs, one 200 GB SSD, and one 2.5 in. HDD.
The two-dimensional "floorplan" view of the data center shows server thermal distribution in real time. Clicking on a monitored rack or on an alarm icon displays a three-dimensional representation of the servers in the rack (Figure 4), with detailed readings and color-coded thermal indicators for each server.
2. CUPS = compute usage per second; a measurement of the amount of "useful" work a server is performing.
With integrated CUPS telemetry and thermal metrics, Intel® PTAS balances compute load to correct thermal events. So if the system exceeds a threshold, or an uneven temperature distribution occurs, Intel® PTAS provides color-coded visual warnings, notifies you of the event (by alert, e-mail, or SMS), and recommends corrective actions.
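A minimal sketch of this alerting rule, using the two thresholds visible in Figure 3 (hot spot when max outlet temperature exceeds 40°C, high load when CUPS exceeds 50) and hypothetical telemetry records:

    # Flag servers per the Figure 3 legend: hot spot when the max outlet
    # temperature exceeds 40 degC, high load when CUPS exceeds 50.
    # The telemetry records are hypothetical samples.
    HOT_SPOT_C, CUPS_THRESHOLD = 40.0, 50.0

    servers = [
        {"name": "QCT-02", "max_out_temp_c": 41.0, "cups": 52.0},
        {"name": "Intel-03", "max_out_temp_c": 33.0, "cups": 10.0},
    ]
    for s in servers:
        if s["max_out_temp_c"] > HOT_SPOT_C:
            print(f"ALERT hot spot: {s['name']}")   # deliver by alert, e-mail, or SMS
        if s["cups"] > CUPS_THRESHOLD:
            print(f"ALERT high load: {s['name']}")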
Figure 3 Real-time two-dimensional representation of power usage, air flow, temperature, and CPU utilization. Flags mark hot spots (max outlet temperature > 40°C) and high load (max CUPS > 50); color legends map CUPS bands (0~20 through 81~100) and temperature bands (below 15°C through above 32°C).
Spotlight on Chunghwa Telecom
Chunghwa Telecom (CHT) is
Taiwan’s leading telecom service
provider. The company provides
fixed-line, mobile, and Internet and
data services to residential and
business customers in Taiwan.
Chunghwa Telecom is headquartered
in Taipei, Taiwan. For more information,
visit www.cht.com.tw/en.
For more information on Quanta Cloud Technology (QCT) products, featuring high-manageability, energy-efficient rack-mount servers powered by Intel® PTAS technology, visit www.QuantaQCT.com.
Figure 4 Three-dimensional representation of a specific rack (Rack_J03): each server in the rack is shown with color-coded CUPS and temperature indicators, and clicking a server displays detailed readings such as air flow (CFM), CUPS, average power, and CPU power.
Summary
Our tests showed that integrating CUPS data with thermal readings to balance server loads, using supply-side optimization (which allowed us to raise ambient temperatures), and using server sensors to control cooling equipment reduced cooling needs and resulted in savings.
An administrator can choose the hottest, coldest, or average temperature to trigger cooling remedies. We found that using the hottest temperature as the control point (and switching from return-side to supply-side monitoring) improved AC efficiency by 10 to 15%.
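As a rough sketch of this control strategy, assuming a hypothetical set_ac_setpoint() hook into the cooling system:

    # Choose the control temperature (hottest, coldest, or average of the
    # supply-side readings) and drive the cooling remedy from it.
    # set_ac_setpoint() is a hypothetical hook into the A/C controls.
    def choose_control_temp(supply_temps_c: list[float], mode: str = "hottest") -> float:
        if mode == "hottest": return max(supply_temps_c)
        if mode == "coldest": return min(supply_temps_c)
        return sum(supply_temps_c) / len(supply_temps_c)  # average

    def set_ac_setpoint(control_temp_c: float, target_c: float = 27.0) -> None:
        # Cool harder only when the hottest point nears the top of the
        # recommended band, rather than overcooling the whole room.
        print(f"control {control_temp_c:.1f} degC vs target {target_c:.1f} degC")

    set_ac_setpoint(choose_control_temp([24.5, 26.0, 27.5, 23.0]))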
Integrating Intel® PTAS, Intel® DCM Energy Director with Intel® Node Manager, and Chunghwa Telecom's iEN data center solution improved operating efficiency and reduced power consumption by up to 30%.
Find a business solution that is right for your company. Contact your Intel representative or visit the Reference Room at intel.com/references.
Learn how the Intel® DCM SDK can help you address real-time power and thermal monitoring issues in your data center at software.intel.com/datacentermanager.
Intel does not control or audit the design or implementation of third-party
benchmark data or websites referenced in this document. Intel
encourages all of its customers to visit the referenced websites or others
where similar performance benchmark data is reported and confirm
whether the referenced benchmark data is accurate and reflects
performance of systems available for purchase.
Copyright © 2014 Intel Corporation. All rights reserved.
Intel and the Intel logo are trademarks or registered trademarks of Intel
Corporation in the United States and other countries.
The Rack Cooling Index (RCI) is a registered trademark of ANCIS Incorporated.
*Other names and brands may be claimed as the property of others.
330247-003EN