LIGHT

Fail Fast vs Fail Slow

Why microservices need to fail-fast instead of fail-slow

Conserve resources

When an error occurs, whether it is an exception or a validation error, the service should stop processing immediately and return the error status to the consumer. This saves the server's processing power for requests that can still succeed.
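A minimal sketch of this idea in Java is shown here. The `TransferRequest` type, the `FailFastValidator` class, and the error codes are hypothetical names made up for this illustration; they are not part of any Light platform API. The point is only that each check returns as soon as it fails, so later checks never run.

```java
import java.util.Optional;

// Hypothetical request type, used only for this illustration.
final class TransferRequest {
    final String fromAccount;
    final String toAccount;
    final long amountCents;

    TransferRequest(String fromAccount, String toAccount, long amountCents) {
        this.fromAccount = fromAccount;
        this.toAccount = toAccount;
        this.amountCents = amountCents;
    }
}

final class FailFastValidator {
    // Returns the first error code found, or Optional.empty() when the request is valid.
    // Every failed check returns immediately, so no further work is done for a bad request.
    static Optional<String> validate(TransferRequest request) {
        if (request.fromAccount == null || request.fromAccount.isEmpty()) {
            return Optional.of("ERR_FROM_ACCOUNT_MISSING");
        }
        if (request.toAccount == null || request.toAccount.isEmpty()) {
            return Optional.of("ERR_TO_ACCOUNT_MISSING");
        }
        if (request.amountCents <= 0) {
            return Optional.of("ERR_AMOUNT_NOT_POSITIVE");
        }
        return Optional.empty();
    }
}
```

A request with several problems still yields a single error code, the first one encountered. A handler would map that code to an error status and return it without calling any downstream service.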

Security concerns

Security is another reason to keep error responses small. An attacker probing your service will send deliberately invalid requests and study the error responses. If a response contains too much information, it can dramatically simplify the attack. Legitimate consumer developers do not need that detail in the response: they have centralized logging to help them identify what really happened when a simple error code is returned.
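One way to keep the response small while still giving developers something to search for in the logs is sketched below. It is an illustration under assumptions: the `SafeErrorResponder` class, the `ERR_INTERNAL` code, and the idea of a correlation id in the response are not taken from the original text, and `java.util.logging` stands in for whatever centralized logging is actually in place.

```java
import java.util.UUID;
import java.util.logging.Level;
import java.util.logging.Logger;

final class SafeErrorResponder {
    private static final Logger LOG = Logger.getLogger(SafeErrorResponder.class.getName());

    // Logs the full detail on the server side and returns a body that contains
    // only a generic error code plus a correlation id (an assumption of this sketch).
    static String toClientBody(String errorCode, Throwable cause) {
        String correlationId = UUID.randomUUID().toString();
        // The stack trace and exception message go to the log, never to the consumer.
        LOG.log(Level.WARNING, "correlationId=" + correlationId + " code=" + errorCode, cause);
        return "{\"code\":\"" + errorCode + "\",\"correlationId\":\"" + correlationId + "\"}";
    }
}
```

The consumer sees only the code and the id. A developer with access to the logs can look up the id to find the exception that the response deliberately leaves out.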

Cascade failure

The more steps the service processes after an exception, the slower the response will be. Normally, processing on the exception branches is significantly slower than on the happy path. This keeps the consumer waiting, and the consumer of the consumer waiting in turn. In a Java EE technology stack, the consumer's thread pool can easily be exhausted by these waiting calls, and its server will then stop responding to new requests. The same failure quickly propagates to the consumer of the consumer and can eventually bring down the entire system. That is why it is highly recommended to implement the circuit breaker and bulkhead patterns.
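To make the circuit breaker idea concrete, here is a deliberately small sketch. It is not the implementation used by any particular framework: `SimpleCircuitBreaker`, its parameters, and the injected clock are all hypothetical. It counts consecutive failures, opens after a threshold, and returns a fallback without calling the slow dependency until a cool-down has elapsed.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class SimpleCircuitBreaker {
    private final int failureThreshold;   // consecutive failures before opening
    private final long openMillis;        // how long to stay open before a trial call
    private final LongSupplier clock;     // injected so the sketch is easy to test
    private int consecutiveFailures;
    private boolean open;
    private long openedAt;

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    // Runs the remote call unless the breaker is open; an open breaker fails fast.
    <T> T call(Supplier<T> remoteCall, T fallback) {
        if (!allowRequest()) {
            return fallback; // no thread is tied up waiting on a failing dependency
        }
        try {
            T result = remoteCall.get();
            recordSuccess();
            return result;
        } catch (RuntimeException e) {
            recordFailure();
            return fallback;
        }
    }

    private synchronized boolean allowRequest() {
        if (open && clock.getAsLong() - openedAt >= openMillis) {
            open = false; // cool-down elapsed: let a trial call through
        }
        return !open;
    }

    private synchronized void recordSuccess() {
        consecutiveFailures = 0;
    }

    private synchronized void recordFailure() {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            open = true;
            openedAt = clock.getAsLong();
        }
    }
}
```

The lock is held only while the state is read or updated, not during the remote call itself. A production breaker would also limit how many trial calls pass after the cool-down and would add a timeout around the call. A bulkhead complements this by capping the number of concurrent calls to each dependency.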

Why designers keep using fail-slow

Designers who choose fail-slow usually do so based on several wrong assumptions.

The service will be called by UI

Most services in a microservices architecture are consumed by other services, web servers, or service aggregators rather than directly by a UI. These consumers want the service to fail fast and the response to be as simple as possible.

Users need to have all the errors in one shot

Most arguments for the fail-slow design involve giving users all the validation errors at once so that they can fix all of them and resubmit. This comes from the old world of server-side rendering based on JSP or Servlet. Response times on these systems are poor, so designers have to save every round trip to the server in order to improve the user experience and reduce the load on the server, as Java EE is blocking and throughput is constrained by the number of threads. With this mentality, responses get bigger and bigger and response times get slower and slower.

When an organization adopts a microservices architecture, chances are it will build its UI as a native mobile app or a single page app (Angular/React) that talks to the services. In this type of design, the communication between client and server is based on a small JSON object or even a Protobuf binary, and the response time is usually within 10 milliseconds. The entire validation design changes from validating on submit to validating while typing. Take a look at Google.com: for every character you type in the search box, there is a request and response between your browser and the google.com server. This is an extreme case, as Google has the power and resources to do it. However, for most microservices-based solutions, validating when a user moves the cursor from one field to another on a form is a piece of cake. Here, each field is validated individually, and the error message is small, with only one error at a time.
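On the server side, field-at-a-time validation could look roughly like the following sketch. The `FieldValidator` class, the field names, the rules, and the error codes are hypothetical and exist only to illustrate the shape of such an endpoint: one field and one value in, at most one small error out.

```java
import java.util.Optional;

final class FieldValidator {
    // Validates a single field value and returns at most one error code.
    // The SPA would call this when the user leaves a field on the form.
    static Optional<String> validateField(String field, String value) {
        if (value == null || value.isEmpty()) {
            return Optional.of("ERR_VALUE_REQUIRED");
        }
        switch (field) {
            case "email":
                // Illustrative rule only: require an '@' that is not the first character.
                if (value.indexOf('@') <= 0) {
                    return Optional.of("ERR_EMAIL_FORMAT");
                }
                return Optional.empty();
            case "age":
                // Illustrative rule only: one to three digits.
                if (!value.matches("\\d{1,3}")) {
                    return Optional.of("ERR_AGE_NOT_A_NUMBER");
                }
                return Optional.empty();
            default:
                return Optional.of("ERR_UNKNOWN_FIELD");
        }
    }
}
```

Because each call checks one field and stops at the first problem, the request and response both stay tiny, which is what keeps the round trip cheap enough to run on every field change.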

Conclusion

Given the drawbacks of fail-slow, and given that its suitable use cases have been replaced by the SPA, there is no need to design your server to be fail-slow. Fail-fast is one of the principles of microservices architecture.

See Also

  • Eco System
  • CQRS
  • Event Sourcing
  • Service Mesh
  • Platform Ecosystem
“Fail Fast vs Fail Slow” was last updated: April 5, 2021: Issue246 (#256) (50b1c10)