Wednesday, January 24, 2018

Data center migration, infrastructure operation and maintenance

Managing data center migration, operation, and maintenance is tedious and complex work, but the people engaged in it share a body of common knowledge and experience. Every company's top priority should be to recognize the importance of a sustainable operations plan. To achieve sustainable operation, companies should evaluate their current operations plans now and start developing an operational methodology that avoids common mistakes.
First big mistake: excluding the data center site infrastructure operations team from the facility design process
A total cost of ownership (TCO) approach, which balances the initial investment, the operating costs, and the needs of the company, is the first step in creating the most effective, economical, and efficient data center. It includes determining the data center's design standards and performance characteristics according to the company's specific situation.
In our experience, if the operations team is excluded from the data center site infrastructure design stage, the infrastructure often has to be reworked after the data center is delivered. For example, we have had to rectify new data centers in cases such as the following:
Too few branch circuits were designed to support the various maintenance operations.
The generator set was poorly designed and installed, which made even simple maintenance work difficult.
Because of a defect in the building design, the air handling units could not deliver the airflow the data center required.
These errors could have been avoided if the operations plan had been taken into account during the design process. When you let the operators participate in the design phase, you "think it through at design time". This is the essence of the TCO method.
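The TCO trade-off described above can be illustrated with simple arithmetic. The following is only a minimal sketch: the function name, the undiscounted cost model, and every figure are hypothetical and chosen for illustration, not taken from any real project.

```python
def total_cost_of_ownership(capex, annual_opex, years, rework_cost=0.0):
    """Undiscounted TCO: up-front investment, any post-delivery rework,
    and operating cost summed over the evaluation period."""
    return capex + rework_cost + annual_opex * years


# Hypothetical scenario A: operations team excluded from design.
# Lower initial investment, but rework after delivery and higher running cost.
tco_without_ops = total_cost_of_ownership(
    capex=10_000_000, annual_opex=1_200_000, years=10, rework_cost=800_000
)

# Hypothetical scenario B: operations team involved in design.
# Slightly higher initial investment, no rework, lower running cost.
tco_with_ops = total_cost_of_ownership(
    capex=10_300_000, annual_opex=1_000_000, years=10
)

print(tco_without_ops, tco_with_ops)
```

With these assumed numbers, the design that looks cheaper on delivery day ends up more expensive over ten years. A real evaluation would also discount future costs and estimate the cost of downtime.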
Second big mistake: relying too much on the design of the data center
Many enterprises believe that a highly redundant design lets them invest less in the operation and maintenance plan. This is a serious mistake. Studies of data center downtime consistently reach the same conclusion: human error is the main culprit. Correct operation, not design, is what keeps the facility running normally and keeps costs under control. It protects both the company's investment and its reputation. Many companies wrongly invest a lot of money in robust, redundant designs but neglect to budget appropriately for operations.
For example, many enterprises hand the operation of critical facilities to property management companies that specialize in maintaining office buildings, and these companies simply don't have the expertise to run or maintain critical facilities.
Typical office facilities are operated on the assumption that systems can be shut down for maintenance or repair. A brief outage in an office building merely inconveniences the staff inside, but a serious downtime incident in a data center may jeopardize the company's mission. The one goal that companies should keep in mind when building data center infrastructure and organizing its operations team is to maximize uptime. A traditional facility maintenance plan does not fully meet the following special requirements of a mission-critical environment:
Performance - continuous operation is a requirement of the core business.
Availability - the target is 100% uptime, so no system shutdown is permitted.
System complexity - redundant systems, automatic failover, and emergency recovery procedures.
Accountability - process documentation, change control, and audit records.
The key to meeting these needs is to lay the foundation for critical facility operations with a sound methodology.
To ensure that these key needs are met, fully qualified data center site infrastructure operators should be identified at the outset. Choosing the wrong staff, or bringing the operators in late in the design process, means missing the opportunity to build an excellent data center.
Third big mistake: improper staffing
Many companies estimate the staffing requirements for operating and maintaining the data center site infrastructure from general building management standards. In a data center environment, underestimating the staffing need creates a risk that no one is on site during an emergency. Staffing should be based on risk assessment and budget. The company should take into account factors such as emergency response, equipment maintenance, and supplier management, and set up a shift schedule that deploys personnel in the best way.
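To show why general building standards tend to underestimate the need, here is a rough sketch of the arithmetic behind round-the-clock coverage. The function name and the figure of 1,800 productive hours per person per year (after leave, training, and sick days) are assumptions for illustration only. Actual values depend on local labor rules and the company's risk assessment.

```python
import math

HOURS_PER_YEAR = 24 * 365  # hours one continuously staffed post must be covered


def staff_for_continuous_coverage(posts, productive_hours_per_person):
    """Minimum headcount needed to keep `posts` positions staffed 24x7,
    given the hours each person can actually work on shift per year."""
    return math.ceil(posts * HOURS_PER_YEAR / productive_hours_per_person)


# Assumed: 1,800 on-shift hours per person per year (hypothetical figure).
print(staff_for_continuous_coverage(posts=1, productive_hours_per_person=1800))
print(staff_for_continuous_coverage(posts=2, productive_hours_per_person=1800))
```

Under these assumptions, a single post that must never be left unattended already requires about five people, before supplier management or project work is counted.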
Likewise, it is essential to hire and retain the right talent. Recruiting people with specialized technical knowledge is very challenging. Companies need to screen prospective team members carefully, examining not only their conventional background but also whether they have adequate technical, management, and communication skills. All of these skills play a vital role in the operation of critical facilities. However, selecting qualified operators is only the first step.
