Agree or not

Site Reliability Engineering (SRE) > DevOps


A successful enterprise should be a pioneer in its industry. SRE has ended the battle between development and operations. It secures a product’s reliability, responsibility and innovation.


What does SRE actually address?

The development team is always eager to release innovative ideas to the public, hoping they generate plenty of feedback and wide market coverage. The operations team, by contrast, only wants to make sure that a public release causes no defects and that as few problems as possible arise. In practice the two teams stand on opposite sides, and power struggles within the company often follow. Each team will stop at nothing to reach its own objective, which creates a disharmonious atmosphere and directly harms the company’s interests.

Site Reliability Engineering can solve this problem between development and operations and reduce the guesswork and arguments among departments, because it has a clear division of work. The operations team is responsible for the reliability of every product, whether it is newly developed or however complex it is. SRE requires engineers to work in an agile way, resolve unexpected problems as quickly as possible, and minimize the occurrence of defects.


SRE is much more than traditional IT operations

Most newly launched applications cannot run 100% of the time without defects under normal conditions. SRE teams therefore set up a service-level agreement that defines how reliable the new application must be and calculates how many defects are allowed. Engineers are free to design how defect occurrence is estimated. If the product runs smoothly and stays within the agreed level, they can develop and launch whatever they want at any time. But if defects occur frequently, or reliability falls below the agreed service level, all launches are held until the number of failures is reduced to an acceptable level.
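The release rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name ErrorBudget, its fields, and the 99.9% target are hypothetical and do not come from any particular SRE tool, and reliability is measured here simply as the share of successful requests in one period.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Hypothetical model of the agreed service level for one period."""

    target: float          # assumed target, e.g. 0.999 = 99.9% of requests succeed
    total_requests: int    # requests served in the period
    failed_requests: int   # requests that failed in the period

    @property
    def allowed_failures(self) -> float:
        # Failures the agreed service level tolerates in this period.
        return (1.0 - self.target) * self.total_requests

    @property
    def remaining(self) -> float:
        # Tolerated failures that have not been used up yet.
        return self.allowed_failures - self.failed_requests

    def can_release(self) -> bool:
        # Launches continue only while some allowance is left.
        return self.remaining > 0


# Example figures are made up for illustration.
healthy = ErrorBudget(target=0.999, total_requests=1_000_000, failed_requests=400)
exhausted = ErrorBudget(target=0.999, total_requests=1_000_000, failed_requests=1_500)

print(healthy.can_release())    # True: about 1,000 failures allowed, 400 used
print(exhausted.can_release())  # False: launches are held until failures drop
```

A real team would feed these numbers from its monitoring system and might measure reliability differently (for example by uptime minutes rather than requests), but the decision stays the same: release while allowance remains, hold when it is used up.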


Coding has a place in SRE

In traditional IT operations, a reliability issue in development or operations is a severe blow to the entire development process and takes ages to resolve and remedy. This does not apply to SRE. SRE and development draw from the same staffing pool, which ends the struggle between the traditional development and operations teams and establishes a new monitoring system. At the same time, developers can write better-performing code that benefits the whole team’s operation rather than one specific team.