Fault Injection

Fault Injection using SMI in Linkerd

Application failure injection is a form of chaos engineering where we artificially increase the error rate of certain services in a microservice application to see what impact that has on the system as a whole. Traditionally, you would need to add some kind of failure injection library into your service code in order to do application failure injection. Thankfully, the service mesh gives us a way to inject application failures without needing to modify or rebuild our services at all.

Using SMI Traffic Split API to inject errors

We can easily inject application failures by using the Traffic Split API of the Service Mesh Interface. This allows us to do failure injection in a way that is implementation agnostic and works across service meshes.

We will first deploy a new service that returns only error responses. For this we will use a simple NGINX service that has been configured to return only HTTP 500 responses.

We will then create a traffic split that tells the service mesh to send a percentage of a service's traffic to the error service instead. For example, if we send 20% of a service's traffic to the error service, we have injected an artificial 20% error rate into that service.

Deploy Linkerd Books Application

We will deploy the Linkerd Books application for this part of the demo.

Use Meshery to deploy the Books application:

  • In Meshery, navigate to the Linkerd adapter's management page from the left nav menu.
  • On the Linkerd adapter's management page, please enter default in the Namespace field.
  • Then, click the (+) icon on the Sample Application card and select Books Application from the list.

Next, inject the Linkerd proxy into the sample application's deployments.
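The injection step can be sketched as follows. This assumes the application was deployed to the default namespace (as entered in Meshery above) and that both kubectl and the linkerd CLI are installed and pointed at your cluster. The helper name inject_linkerd is hypothetical. The pipeline inside it is the common pattern of piping existing manifests through linkerd inject.

```shell
# Assumption: the Books application runs in the "default" namespace.
NAMESPACE="default"

# Hypothetical helper: fetch every Deployment in the namespace, add the
# Linkerd proxy injection annotation, and re-apply the result.
inject_linkerd() {
  kubectl get deploy -n "$NAMESPACE" -o yaml \
    | linkerd inject - \
    | kubectl apply -f -
}
```

Calling inject_linkerd changes each pod template, so the deployments roll out again and the new pods come up with the Linkerd sidecar proxy.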

One of the services in the sample application has already been configured with an artificial error rate. Remove the lines that set this error rate from the service's configuration so that we start from a healthy baseline.

If you now check linkerd stat, the success rate should be 100%.

Create the errored service

Now we will create our error service. It is an NGINX instance pre-configured to respond only with the HTTP 500 status code.
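As a rough sketch, such an error service can be described with a ConfigMap holding the NGINX configuration, a Deployment that mounts it, and a Service in front. The resource name error-injector, the nginx:alpine image, port 8080, and the default namespace are all assumptions made for illustration. They are modeled on the pattern in Linkerd's own fault injection guide and are not necessarily the exact manifest this lab uses.

```shell
# Write a hypothetical error service manifest to error-injector.yaml.
# Assumptions: name "error-injector", namespace "default", port 8080.
cat > error-injector.yaml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: error-injector
  namespace: default
data:
  nginx.conf: |-
    events {}
    http {
      server {
        listen 8080;
        # Every request gets an HTTP 500 response.
        location / {
          return 500;
        }
      }
    }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: error-injector
  namespace: default
  labels:
    app: error-injector
spec:
  replicas: 1
  selector:
    matchLabels:
      app: error-injector
  template:
    metadata:
      labels:
        app: error-injector
    spec:
      containers:
        - name: nginx
          image: nginx:alpine
          volumeMounts:
            - name: nginx-config
              mountPath: /etc/nginx/nginx.conf
              subPath: nginx.conf
      volumes:
        - name: nginx-config
          configMap:
            name: error-injector
---
apiVersion: v1
kind: Service
metadata:
  name: error-injector
  namespace: default
spec:
  ports:
    - name: service
      port: 8080
  selector:
    app: error-injector
EOF
```

The file can then be deployed with kubectl apply -f error-injector.yaml.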

After deploying the error service, we will create a traffic split resource that directs 20% of the books service's traffic to the error service.
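A minimal sketch of such a traffic split is shown here, assuming the error service is exposed as a Service named error-injector in the default namespace (both names are assumptions) and that the mesh accepts the split.smi-spec.io/v1alpha1 version of the TrafficSplit API. In that version weights are written as quantities, so 800m and 200m give an 80/20 split. Later SMI versions use plain integer weights, so adjust to whatever your mesh supports.

```shell
# Write a hypothetical TrafficSplit manifest to error-split.yaml.
# Assumptions: backend Service "error-injector", namespace "default",
# and the v1alpha1 TrafficSplit API with milli-unit weights.
cat > error-split.yaml <<'EOF'
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: error-split
  namespace: default
spec:
  # Traffic addressed to the books service is divided among the backends.
  service: books
  backends:
    - service: books
      weight: 800m   # 80% continues to the real books service
    - service: error-injector
      weight: 200m   # 20% is answered by the error service
EOF
```

Apply it with kubectl apply -f error-split.yaml. Changing the two weights is enough to raise or lower the injected error rate, and deleting the resource removes the fault entirely.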

You should now see a 20% error rate for calls from webapp to books.
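One way to observe this from the command line is sketched here. It assumes a Linkerd release that ships the viz extension, where the command is linkerd viz routes. Older releases expose the same subcommand as linkerd routes. The helper name show_books_routes is hypothetical, and the deployment and service names are taken from the webapp and books services mentioned above.

```shell
# Assumption: the Books application runs in the "default" namespace.
NAMESPACE="default"

# Hypothetical helper: show success rate and latency for requests that
# the webapp deployment sends to the books service.
# On releases without the viz extension, drop the "viz" word.
show_books_routes() {
  linkerd viz routes deploy/webapp --to service/books -n "$NAMESPACE"
}
```

With the traffic split in place, the success rate reported by show_books_routes should settle at roughly 80%.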

You can also see the error in the web browser.

If you refresh the page a few times, you will see an Internal Server Error.

Cleanup

  • Remove the Books application from the Meshery dashboard by clicking the trash icon on the Sample Application card on the Linkerd adapter's management page.
