Echo Server Down

Updated

Why this happened

During the Echo server maintenance on February 20, we switched to a new server cluster. We had provisioned the servers based on past load, but the load balancer that distributes requests across them could not handle the load and failed to accept some requests.
For this recovery, we did not upgrade the load balancer. Instead, we rerouted requests to the old server cluster.

How to prevent this in the future

As a direct measure, we will improve the load balancer's performance so that it can handle the expected volume of requests.
Going forward, we will build sufficient load testing into the development flow before each release to prevent a recurrence.

Posted 22 Feb at 12:08am UTC.

Resolved

Service was restored at 9:25am (JST).

Posted 21 Feb at 12:27am UTC.

Created

The following failures have been occurring since around 8:45am (JST). We are currently investigating.
Cannot enter or leave meeting rooms
Cannot update emoji
Cannot change layout

Posted 21 Feb at 12:04am UTC.