Application failures
Temporal handles many types of failures automatically through Durable Execution. It recovers from Worker crashes, network interruptions, and infrastructure outages without any intervention. But some failures require your application to detect and respond to them. Understanding which failures Temporal handles and which ones your application must handle is fundamental to building reliable Temporal applications.
Platform failures vs application failures
Failures fall into two categories based on where they are detected and mitigated: platform failures and application failures.
Platform failures
Platform failures occur due to issues with the infrastructure: server outages, network interruptions, Worker crashes, or other environmental factors outside of your application's control. Temporal's Durable Execution handles these failures transparently. When a Worker crashes mid-execution, another Worker picks up the work and continues from where it left off. Your application code does not need to account for these failures.
Platform failures are resolved through forward recovery: the system retries the failed operation, and if the retry succeeds, the application continues from the point of failure without undoing any previous work.
Application failures
Application failures are generated by your code. They indicate an issue with your application logic, such as invalid input data, a business rule violation, or a failed call to an external service.
Application failures do not resolve through retries alone. Recovering from an application failure may require fixing a bug, passing different input data, or performing some external mitigation.
Application failures often involve backward recovery: the system undoes some of the work that has already been performed to return to a previous state. For example, if a payment step fails after inventory has already been reserved, the application may need to release that inventory.
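The inventory example can be sketched in plain Python. This is a minimal illustration of backward recovery, not Temporal SDK code: `PaymentDeclined`, `place_order`, and the `inventory` dictionary are hypothetical names chosen for the example.

```python
class PaymentDeclined(Exception):
    """Hypothetical application failure raised by the payment step."""


def place_order(inventory, item, charge):
    """Reserve stock, then charge; release the stock if the charge fails."""
    compensations = []  # undo actions, run in reverse order on failure

    inventory[item] -= 1  # step 1: reserve inventory

    def release():
        inventory[item] += 1  # undo for step 1

    compensations.append(release)

    try:
        charge()  # step 2: take payment
    except PaymentDeclined:
        # Backward recovery: undo the work that was already performed.
        for undo in reversed(compensations):
            undo()
        raise
    return "order placed"
```

Recording each undo action as soon as its step succeeds keeps the cleanup logic next to the work it reverses, which helps as more steps are added.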
For guidance on categorizing failures and deciding how to handle them, see Error handling strategy.
How Temporal represents failures
All failures in Temporal are represented as a Failure in the API. Each SDK exposes failures using the conventions of its language: what is called a Failure in one SDK might be called an Error or Exception in another.
Most SDKs have a base class that other failure types extend. This provides a common interface and shared behavior across different failure types:
- TypeScript: TemporalFailure
- Java: TemporalFailure
- Python: FailureError
- Go: Uses specific error types rather than a base class
Temporal categorizes failures into several types:
| Failure type | Description |
|---|---|
| Application Failure | Raised by your code to indicate application-specific errors. This is the only failure type you create directly. |
| Activity Failure | Wraps an error from an Activity Execution. The cause field contains the underlying error. |
| Child Workflow Failure | Wraps an error from a Child Workflow Execution. |
| Timeout Failure | Occurs when an Activity or Workflow exceeds its configured timeout. |
| Cancelled Failure | Results from cancellation of a Workflow, Activity, or Timer. |
| Terminated Failure | Occurs when a Workflow Execution is forcefully terminated. |
| Server Failure | Originates from the Temporal Service itself. |
Do not extend the base failure class or any of its children in your code. The provided classes are designed to work with Temporal's serialization mechanism, which converts failures to Protocol Buffer messages for communication across process and language boundaries. Custom subclasses can break this serialization and lead to unexpected behavior.
For a complete reference of all failure types and their SDK-specific classes, see Failures reference.
Application Failure
Application Failure is the failure type you use to communicate application-specific errors. It is the only failure type designed to be created and thrown directly by your code.
When you throw an Application Failure, you can set these fields:
- message: A human-readable description of the error.
- type: A string that categorizes the failure (for example, "InvalidInput" or "InsufficientFunds").
- non_retryable: A flag that prevents the operation from being retried, regardless of the Retry Policy.
- details: Additional data about the failure.
Any non-Temporal error thrown from an Activity is automatically converted to an Application Failure.
During this conversion, the error's type name, message, and call stack are preserved, and non_retryable is set to false.
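The fields and the automatic conversion can be modeled with a small stand-in. This sketch uses only the standard library: `ApplicationFailureRecord` and `convert_error` are hypothetical names, not SDK classes (in the Python SDK, the real class is `temporalio.exceptions.ApplicationError`), and the conversion shown is a simplified approximation of the behavior described above.

```python
import traceback
from dataclasses import dataclass


@dataclass
class ApplicationFailureRecord:
    """Hypothetical stand-in holding the fields listed above."""
    message: str
    type: str = ""
    non_retryable: bool = False
    details: tuple = ()
    stack_trace: str = ""


def convert_error(err):
    """Model the conversion of a non-Temporal error (simplified)."""
    return ApplicationFailureRecord(
        message=str(err),
        type=type(err).__name__,  # the error's type name is preserved
        non_retryable=False,      # converted errors remain retryable
        stack_trace="".join(
            traceback.format_exception(type(err), err, err.__traceback__)
        ),
    )


# A failure created deliberately, with every field set by the caller.
insufficient = ApplicationFailureRecord(
    message="Balance too low for withdrawal",
    type="InsufficientFunds",
    non_retryable=True,
    details=({"balance": 5, "requested": 20},),
)

# A failure produced by converting an ordinary language-level error.
try:
    int("not a number")
except ValueError as err:
    converted = convert_error(err)
```

Here `converted.type` is `"ValueError"` and `converted.non_retryable` is `False`, so in this model the Retry Policy alone decides whether the operation runs again.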
Failure Converters
When Temporal returns a failure, the default Failure Converter copies error messages and stack traces as plain text. This text is accessible in the Web UI and through the CLI.
If your errors might contain sensitive information, you can encrypt the message and stack trace by configuring a custom Failure Converter with a codec. See Failure Converter for details.
Workflow Task failures vs Workflow Execution failures
When an error occurs in Workflow code, it produces one of two outcomes depending on the error type: a Workflow Task failure or a Workflow Execution failure. The difference matters: Temporal retries a Workflow Task failure automatically, while a Workflow Execution failure is permanent.
Workflow Task failures
A Workflow Task failure occurs when the Workflow code throws an error that does not extend the Temporal base failure class. This includes language-level errors (null reference, division by zero, type errors) and non-determinism errors.
Workflow Task failures are treated as transient problems, typically bugs that can be fixed with a code deployment. Temporal retries them automatically, giving you the opportunity to fix the code and redeploy without losing the state of existing Workflow Executions.
When a Workflow Task failure is retried:
- The Worker removes the Workflow Execution from its cache.
- The Temporal Service schedules a new Workflow Task on the original Task Queue.
- A Worker picks up the Task and replays the Workflow Execution from Event History to restore the correct state before continuing.
Workflow Execution failures
A Workflow Execution failure occurs when the Workflow code throws a Temporal failure, such as an Application Failure. This puts the Workflow Execution into the "Failed" state permanently. No more attempts are made to progress the execution.
Use Workflow Execution failures for permanent business logic failures where retrying the same code with the same input will not produce a different result.
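The distinction comes down to whether the error extends the Temporal base failure class. The following sketch models that rule with stand-in classes. `TemporalFailureModel`, `ApplicationFailureModel`, and `outcome_of_workflow_error` are hypothetical names, and the model ignores special cases such as cancellation.

```python
class TemporalFailureModel(Exception):
    """Stand-in for the SDK's base failure class (an assumption)."""


class ApplicationFailureModel(TemporalFailureModel):
    """Stand-in for an Application Failure."""


def outcome_of_workflow_error(err):
    """Classify an error thrown from Workflow code (simplified model)."""
    if isinstance(err, TemporalFailureModel):
        # Temporal failure: the Workflow Execution fails permanently.
        return "workflow execution failed"
    # Any other error: the Workflow Task fails and is retried.
    return "workflow task failed, will be retried"


bug = ZeroDivisionError("division by zero")
business_error = ApplicationFailureModel("order total exceeds credit limit")
```

In this model, `outcome_of_workflow_error(bug)` reports a retried Workflow Task, while `outcome_of_workflow_error(business_error)` reports a failed Workflow Execution.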
How errors propagate
When an Activity fails, Temporal wraps the error in an Activity Failure before delivering it to the Workflow. The Activity Failure provides context about the failure, including the Activity Type, the number of retry attempts, and the original cause.
The original error is in the cause field.
For example, if an Activity throws an Application Failure with type: "InvalidInput", the Workflow receives an Activity Failure whose cause is that Application Failure.
If an Activity times out instead, the cause is a Timeout Failure.
This wrapping pattern applies to other execution types as well.
A failed Child Workflow delivers a Child Workflow Failure to the parent Workflow, with the original error in the cause field.
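The wrapping described above can be illustrated with plain Python exceptions. This is a sketch under stated assumptions, not SDK code: `ActivityFailureModel`, `ApplicationFailureModel`, `root_cause`, and the `ValidateOrder` Activity Type are hypothetical names, and a real Workflow would catch the SDK's own failure classes instead.

```python
class ApplicationFailureModel(Exception):
    """Stand-in for an Application Failure with a type string."""

    def __init__(self, message, type):
        super().__init__(message)
        self.type = type


class ActivityFailureModel(Exception):
    """Stand-in for the wrapper delivered to the Workflow."""

    def __init__(self, activity_type, attempt, cause):
        super().__init__(f"Activity {activity_type} failed")
        self.activity_type = activity_type
        self.attempt = attempt
        self.cause = cause  # the original error


def root_cause(failure):
    """Follow the cause chain down to the innermost error."""
    while getattr(failure, "cause", None) is not None:
        failure = failure.cause
    return failure


def run_workflow_step():
    """Simulate a Workflow inspecting the cause of a failed Activity."""
    try:
        original = ApplicationFailureModel(
            "quantity must be positive", "InvalidInput"
        )
        raise ActivityFailureModel("ValidateOrder", 1, original)
    except ActivityFailureModel as err:
        cause = root_cause(err)
        if isinstance(cause, ApplicationFailureModel) and cause.type == "InvalidInput":
            return "rejected: " + str(cause)
        raise  # anything else propagates unhandled
```

Branching on the cause's type string rather than on the wrapper lets the Workflow react to the business error itself, whichever Activity reported it.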
If a Temporal failure propagates unhandled through Workflow code, it fails the Workflow Execution. The exception is Cancelled Failure, which puts the Workflow Execution in the "Cancelled" state instead of "Failed".
Failures in Event History
Failures are recorded in Event History, which provides a detailed record for debugging.
Activity failures
An Activity Execution that completes results in three Events: ActivityTaskScheduled, ActivityTaskStarted, and ActivityTaskCompleted.
If an Activity fails and the Retry Policy does not cause it to retry, the Temporal Service adds an ActivityTaskFailed Event that contains the error details.
If an Activity times out, an ActivityTaskTimedOut Event is added instead.
While an Activity is running, ActivityTaskScheduled is the most recent Event visible for that Activity.
The ActivityTaskStarted Event is not written until the Activity Task closes, because the final retry attempt number (an attribute of ActivityTaskStarted) is not known until then.
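The Event sequences described above can be summarized in a small lookup. This is a simplified model rather than a query against a real Event History: `visible_events` and the state names are hypothetical, and it assumes the Activity had started before it failed or timed out.

```python
# Closing Event recorded for each final Activity outcome.
CLOSING_EVENT = {
    "completed": "ActivityTaskCompleted",
    "failed": "ActivityTaskFailed",
    "timed_out": "ActivityTaskTimedOut",
}


def visible_events(state):
    """Return the Events visible in history for one Activity Execution."""
    events = ["ActivityTaskScheduled"]
    if state == "running":
        # ActivityTaskStarted is not written until the Activity Task closes.
        return events
    return events + ["ActivityTaskStarted", CLOSING_EVENT[state]]
```

For a running Activity the model returns only `ActivityTaskScheduled`, which is why retry progress has to be read from the pending Activity information instead of from Event History.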
You can view pending Activity Executions in the Web UI's Pending Activities section, which shows the Activity Type, current retry attempt, remaining attempts, and heartbeat information.
Workflow Execution failures
An Activity failure does not by itself cause a Workflow Execution failure. The Workflow Execution fails only if the Activity error is not handled in the Workflow code or is intentionally re-raised.
When a Workflow Execution fails, the Temporal Service adds a WorkflowExecutionFailed Event.
If the failure was caused by an unhandled Activity error, the activityFailureInfo is attached to that Event.