A seemingly robust SwiftUI application often meets its true test when network conditions deteriorate. Suppose we’re building a macOS data analysis tool that relies on a set of RESTful APIs to fetch and process data. The initial network layer implementation might be quite straightforward:
```swift
// MARK: - UnstableApiService.swift (Initial Version)
import Foundation
import Combine

// A typical but non-resilient API service layer
final class UnstableApiService {
    private let baseURL = URL(string: "http://localhost:8080")!

    enum ApiError: LocalizedError {
        case networkError(URLError)
        case serverError(statusCode: Int)
        case decodingError(Error)
        case unknown

        var errorDescription: String? {
            switch self {
            case .networkError(let urlError):
                return "Network unavailable: \(urlError.localizedDescription)"
            case .serverError(let statusCode):
                return "Server error with status code: \(statusCode)"
            case .decodingError:
                return "Failed to decode the response."
            case .unknown:
                return "An unknown error occurred."
            }
        }
    }

    func fetchData(endpoint: String) -> AnyPublisher<Data, ApiError> {
        let url = baseURL.appendingPathComponent(endpoint)
        return URLSession.shared.dataTaskPublisher(for: url)
            .tryMap { data, response in
                guard let httpResponse = response as? HTTPURLResponse else {
                    throw ApiError.unknown
                }
                guard (200...299).contains(httpResponse.statusCode) else {
                    // The backend service returned an error status directly
                    throw ApiError.serverError(statusCode: httpResponse.statusCode)
                }
                return data
            }
            .mapError { error -> ApiError in
                if let urlError = error as? URLError {
                    // Network-level errors, such as timeouts or connection failures
                    return .networkError(urlError)
                } else if let apiError = error as? ApiError {
                    return apiError
                } else {
                    return .decodingError(error)
                }
            }
            .eraseToAnyPublisher()
    }
}
```
In the ViewModel, we would call it like this, handling loading and error states:
```swift
// MARK: - DataViewModel.swift (Coupled with network fragility)
import Foundation
import Combine

@MainActor
final class DataViewModel: ObservableObject {
    @Published var content: String = "Fetching data..."
    @Published var hasError: Bool = false
    @Published var errorMessage: String = ""

    private let apiService = UnstableApiService()
    private var cancellables = Set<AnyCancellable>()

    func loadData() {
        self.content = "Fetching..."
        self.hasError = false
        // Every click initiates a new request, regardless of the backend's state
        apiService.fetchData(endpoint: "/unstable-data")
            .receive(on: DispatchQueue.main)
            .sink(receiveCompletion: { [weak self] completion in
                if case .failure(let error) = completion {
                    self?.hasError = true
                    self?.errorMessage = error.localizedDescription
                    self?.content = "Failed to load data."
                }
            }, receiveValue: { [weak self] data in
                self?.content = String(data: data, encoding: .utf8) ?? "Could not decode string."
            })
            .store(in: &cancellables)
    }
}
```
This code works perfectly under ideal network conditions. But in a real-world project, a backend microservice might start frequently returning 500 errors or timing out due to high load, deployment issues, or downstream dependency failures. What happens to our macOS app then? Every time the user clicks the refresh button, the app stubbornly sends a request to the ailing service, waits for a long timeout (60 seconds by default for URLSession), and finally displays an error in the UI. This experience is disastrous. The user perceives the app as sluggish and unresponsive.
Worse, this barrage of retry requests can exacerbate the backend service’s collapse, creating a “retry storm.”
The traditional client-side solution is to implement more complex logic within ApiService: exponential backoff retries, request queuing, or even a rudimentary circuit breaker. But this quickly bloats the networking code, complicates state management, and is difficult to test. Business logic and network resilience logic become tightly coupled. The pitfall here is that we’ve allowed a pure network infrastructure problem to leak into the application layer code.
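To make that coupling concrete, here is a minimal sketch of just the backoff-delay arithmetic such a hand-rolled retry layer needs (shown in JavaScript, the language of the test server we use later; the function name and constants are illustrative):

```javascript
// backoff.js -- illustrative only: the delay schedule for client-side
// exponential backoff with an upper cap. A real retry layer would also
// need jitter, retry budgets, and per-endpoint breaker state on top.
function backoffDelayMs(attempt, baseMs = 500, capMs = 30000) {
  // attempt 0 -> baseMs, then the delay doubles each attempt, capped at capMs
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// The schedule for five attempts: 500, 1000, 2000, 4000, 8000 ms
const schedule = [0, 1, 2, 3, 4].map(a => backoffDelayMs(a));
console.log(schedule.join(','));
```

And this is before adding request deduplication or breaker state, each of which compounds the complexity living inside the app.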
A superior approach is to decouple network resilience policies from the application code and delegate them to a specialized component. In backend microservice architectures, service mesh sidecar proxies like Envoy or Linkerd are designed for precisely this purpose. We can absolutely apply this pattern to the client-side, especially in a controlled environment like macOS.
Our architectural vision is this: run a local Envoy instance as a proxy. All outbound API requests from the SwiftUI app are sent to the local Envoy, which is responsible for forwarding them to the actual backend service. When Envoy detects consecutive failures from the backend, it will automatically “trip the circuit”—for a period, all subsequent requests to that service will be immediately rejected by Envoy instead of being sent over the network. This achieves “fail-fast” behavior, protects the backend, and dramatically improves the user experience.
```mermaid
graph TD
    A[SwiftUI App] -- API Request --> B{"Local Envoy Proxy<br/>localhost:10000"}
    B -- Forwards Request --> C["Flaky RESTful API<br/>localhost:8080"]
    subgraph "Circuit Breaker Logic (Outlier Detection)"
        B
    end
    C -- "5xx Error / Timeout" --> B
    B -- Monitors Failures --> B
    B -- Trips the breaker --> B
    A -- Subsequent API Request --> B
    B -- Immediately returns 503 --> A
```
Choosing Envoy over implementing a circuit breaker library in Swift is based on several pragmatic considerations:
- Separation of Concerns: SwiftUI code only needs to care about business logic. Network robustness is declaratively defined in Envoy’s YAML configuration. The application code requires almost no changes.
- Language Agnostic: Resilience policies are decoupled from the programming language. If the team has other client tools written in different languages, they can reuse the same Envoy configuration.
- Production-Grade Reliability: Envoy is battle-tested in large-scale production environments. Its circuit breaker feature (known as “Outlier Detection” in Envoy) is far more mature and powerful than anything we could implement ourselves.
- Dynamic Configuration: Envoy’s configuration can be updated dynamically without recompiling and redeploying the client application.
Next, we’ll implement this architecture step-by-step.
Step 1: Simulate an Unstable Backend Service
To validate our solution, we first need a backend service with controllable behavior. We’ll use Node.js and Express to create a simple API that can be configured to return 503 errors for a certain percentage of requests.
```javascript
// MARK: - flaky-server.js
const express = require('express');
const app = express();
const port = 8080;

let requestCount = 0;
// 2 out of every 3 requests will fail
const FAILURE_RATE = 2 / 3;

app.get('/unstable-data', (req, res) => {
  requestCount++;
  console.log(`[${new Date().toISOString()}] Received request #${requestCount}`);
  if (Math.random() < FAILURE_RATE) {
    console.log(`  -> Responding with 503 Service Unavailable.`);
    res.status(503).send('Service is intentionally unavailable.');
  } else {
    console.log(`  -> Responding with 200 OK.`);
    const data = {
      message: `Success on attempt #${requestCount}`,
      timestamp: new Date().toISOString()
    };
    res.status(200).json(data);
  }
});

app.get('/stable-data', (req, res) => {
  console.log(`[${new Date().toISOString()}] Received request to stable endpoint.`);
  res.status(200).json({ message: "This endpoint is always reliable." });
});

app.listen(port, () => {
  console.log(`Flaky server listening on port ${port}`);
  console.log(`Failure rate for /unstable-data is set to ${FAILURE_RATE * 100}%`);
});
```
Run `node flaky-server.js` in your terminal to start this service.
Step 2: Configure and Run the Envoy Proxy
Envoy’s configuration is centered around a YAML file. We will create an envoy.yaml file to define the listener, routes, and most importantly, the circuit breaker policy.
```yaml
# MARK: - envoy.yaml
# A production-grade Envoy configuration for a local client-side proxy

# Admin interface, exposed so we can inspect status at localhost:9901
admin:
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 9901

static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000  # The SwiftUI app will connect to this port
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  # Route all traffic to our defined backend service cluster
                  cluster: backend_service
                  # Set a reasonable timeout for the route
                  timeout: 5s
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: backend_service
    connect_timeout: 2s
    type: STRICT_DNS
    # Note: to reach the host machine from inside a Docker container, use host.docker.internal.
    # If Envoy runs directly on the host, use localhost instead.
    load_assignment:
      cluster_name: backend_service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: host.docker.internal
                port_value: 8080  # Points to our unstable Node.js service
    # --- Core circuit breaker (outlier detection) configuration ---
    outlier_detection:
      # Number of consecutive 5xx responses required to eject a host.
      # Tune this against the service's normal error rate; for a critical
      # service a value of 1 is usually too sensitive, so we use 2 here.
      consecutive_5xx: 2
      # How often Envoy runs its ejection analysis sweep.
      # 10s reacts promptly without checking too frequently.
      interval: 10s
      # Base duration a host stays ejected. The actual ejection time is
      # base_ejection_time * number of ejections: 30s the first time,
      # 60s the second, and so on.
      base_ejection_time: 30s
      # Maximum percentage of hosts in the cluster that may be ejected.
      # With a single upstream host this must be 100 or nothing can ever
      # be ejected; backend clusters typically use 10-20% to avoid
      # cascading capacity loss.
      max_ejection_percent: 100
      # Chance (in percent) that a host flagged by the success-rate
      # detector is actually ejected. 0 disables success-rate ejection
      # entirely; we rely on the consecutive-5xx detector instead.
      enforcing_success_rate: 0
      # A separate detector for consecutive "gateway" failures (502, 503,
      # 504), useful for telling service errors apart from connectivity
      # problems.
      consecutive_gateway_failure: 2
      enforcing_consecutive_gateway_failure: 100
```
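To build intuition for what this configuration asks Envoy to do, the following is a deliberately simplified model of the consecutive-5xx detector (illustrative only; real outlier detection also runs periodic sweeps on `interval`, tracks success-rate statistics, and caps the maximum ejection time):

```javascript
// A toy model of consecutive-5xx ejection, mirroring the YAML above:
// consecutive_5xx: 2, base_ejection_time: 30s, and an ejection duration
// that grows with the number of times the host has been ejected.
class ToyOutlierDetector {
  constructor({ consecutive5xx = 2, baseEjectionMs = 30000 } = {}) {
    this.consecutive5xx = consecutive5xx;
    this.baseEjectionMs = baseEjectionMs;
    this.failureStreak = 0;   // consecutive 5xx responses seen so far
    this.ejectionCount = 0;   // how many times this host has been ejected
    this.ejectedUntil = 0;    // epoch ms; 0 means never ejected
  }

  isEjected(nowMs) {
    return nowMs < this.ejectedUntil;
  }

  // Record an upstream response; returns true if it trips the breaker.
  record(statusCode, nowMs) {
    if (statusCode >= 500) {
      this.failureStreak += 1;
      if (this.failureStreak >= this.consecutive5xx) {
        this.ejectionCount += 1;
        this.ejectedUntil = nowMs + this.baseEjectionMs * this.ejectionCount;
        this.failureStreak = 0;
        return true;
      }
    } else {
      this.failureStreak = 0; // any success resets the streak
    }
    return false;
  }
}
```

Two consecutive 503s eject the host for 30 seconds; a repeat offense after recovery ejects it for 60 seconds, matching the `base_ejection_time * number_of_ejections` rule described in the comments.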
Running Envoy with Docker is the most convenient option:

```shell
docker run --rm -it -p 10000:10000 -p 9901:9901 \
  -v "$(pwd)/envoy.yaml:/etc/envoy/envoy.yaml" envoyproxy/envoy:v1.27.0
```

- `-p 10000:10000` maps our listener port.
- `-p 9901:9901` maps Envoy's admin port, which is useful for checking status (for example, `http://localhost:9901/clusters` reports per-host outlier-detection state).
- `-v` mounts our local `envoy.yaml` into the container.
Step 3: Modify the SwiftUI Network Layer to Use the Proxy
Now, the only change needed in our application code is the `baseURL` in `ApiService`. We need to point it at our local Envoy proxy instead of directly at the backend service.
```swift
// MARK: - ResilientApiService.swift (Final Version)
import Foundation
import Combine

// An API service layer that achieves resilience through an Envoy proxy
final class ResilientApiService {
    // *** The Key Change ***
    // Point the request destination at the local Envoy proxy instead of the backend service
    private let baseURL = URL(string: "http://localhost:10000")!

    // ApiError definition remains unchanged
    enum ApiError: LocalizedError {
        // ... (same as before) ...
    }

    // The fetchData method signature and implementation are almost identical
    func fetchData(endpoint: String) -> AnyPublisher<Data, ApiError> {
        let url = baseURL.appendingPathComponent(endpoint)
        // URLSession configuration and usage are exactly the same.
        // The application layer is unaware of the proxy's existence.
        return URLSession.shared.dataTaskPublisher(for: url)
            .tryMap { data, response in
                guard let httpResponse = response as? HTTPURLResponse else {
                    throw ApiError.unknown
                }
                // The statusCode here may now come from Envoy (e.g., 503) or the real backend
                guard (200...299).contains(httpResponse.statusCode) else {
                    throw ApiError.serverError(statusCode: httpResponse.statusCode)
                }
                return data
            }
            .mapError { error -> ApiError in
                if let urlError = error as? URLError {
                    return .networkError(urlError)
                } else if let apiError = error as? ApiError {
                    return apiError
                } else {
                    return .decodingError(error)
                }
            }
            .eraseToAnyPublisher()
    }
}
```
The ViewModel’s code requires absolutely no changes. It just needs to use the new ResilientApiService. This is the immense benefit of separation of concerns.
Step 4: Observe the Circuit Breaker in Action
Now, let’s put all the pieces together:
- Run `flaky-server.js`.
- Run the Envoy Docker container configured with `outlier_detection`.
- Run our macOS SwiftUI application.
Scenario 1: Before the Breaker Opens
Click the refresh button continuously.
- Node.js Service Logs: You’ll see requests coming in, some succeeding with a 200 OK, and some failing with a 503 Service Unavailable.
- SwiftUI App: The UI will intermittently display successfully fetched data or a 503 error message. The response may feel slow, as each failure actually hits the backend.
Scenario 2: The Breaker Trips
When you click rapidly enough to cause Envoy to observe 2 consecutive 5xx errors (based on our consecutive_5xx config), the circuit breaker “trips.”
- Envoy Logs: You'll see an entry similar to this, indicating that the upstream host has been ejected:
  `[...][debug][upstream] [...|...|...] outlier_detection: ejecting host [IPv4:172.17.0.1:8080] for 30000ms`
- SwiftUI App: Clicking refresh now makes the UI display a 503 error immediately. This 503 is generated by Envoy itself, signifying "no healthy upstream"; the request is never forwarded to the backend Node.js service. The user experience shifts from "lag, then error" to "instant error", a qualitative improvement.
- Node.js Service Logs: During the 30-second ejection period (`base_ejection_time`), the Node.js service receives no requests for `/unstable-data`. It's given a chance to recover.
Scenario 3: The Breaker Closes
After 30 seconds, Envoy automatically returns the backend host to the healthy pool. This resembles the "half-open" state of a classic circuit breaker, though Envoy does not gate recovery behind a single probe request: live traffic simply starts flowing to the host again while the failure counters keep running.
- If subsequent requests succeed, normal traffic flow continues and the breaker is effectively closed again.
- If the backend fails again, the host is ejected once more, and the next ejection period is longer (`base_ejection_time` multiplied by the number of ejections).
This complete feedback loop—from detecting failure, to isolating the fault, to automatically recovering—is handled transparently by Envoy outside the application. The purity of the SwiftUI application code is preserved to the greatest extent possible.
```swift
// MARK: - Final SwiftUI View
import SwiftUI

struct ContentView: View {
    // The ViewModel implementation requires no changes
    @StateObject private var viewModel = DataViewModel()

    var body: some View {
        VStack(spacing: 20) {
            Text("Client-Side Circuit Breaker Demo")
                .font(.title)

            ScrollView {
                Text(viewModel.content)
                    .font(.body)
                    .padding()
                    .frame(minHeight: 100, alignment: .topLeading)
                    .background(Color.secondary.opacity(0.1))
                    .cornerRadius(8)
            }

            if viewModel.hasError {
                Text(viewModel.errorMessage)
                    .foregroundColor(.red)
                    .lineLimit(2)
            }

            Button(action: {
                viewModel.loadData()
            }) {
                Text("Fetch Unstable Data")
                    .padding()
                    .background(Color.blue)
                    .foregroundColor(.white)
                    .cornerRadius(10)
            }
        }
        .padding()
        .frame(width: 400, height: 300)
        .onAppear {
            viewModel.loadData()
        }
    }
}
```
The final UI code and ViewModel remain simple and predictable. All the complexity has been elegantly encapsulated at the infrastructure level.
Limitations and Extensions
This architecture of applying the sidecar proxy pattern to the client-side is not a silver bullet. Its primary use case is for desktop applications (macOS, Windows, Linux), where managing a local Envoy process is straightforward. For iOS or Android, implementing a similar mechanism would require technologies like Network Extensions or VPN Profiles, significantly increasing complexity and maintenance overhead.
Furthermore, this solution introduces an additional operational component—managing the lifecycle and configuration of the local Envoy instance. For tools distributed within a corporate environment, this can be handled via startup scripts or MDM (Mobile Device Management) solutions.
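For macOS specifically, a per-user launchd agent is one lightweight way to manage that lifecycle. The sketch below assumes a native Envoy binary at `/usr/local/bin/envoy` and a config at `/usr/local/etc/envoy/envoy.yaml`; both paths, and the label, are assumptions to adjust for your install:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.example.local-envoy</string>
    <!-- Launch Envoy with our local proxy configuration -->
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/envoy</string>
        <string>-c</string>
        <string>/usr/local/etc/envoy/envoy.yaml</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <!-- Restart the proxy automatically if it ever exits -->
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>
```

Saved under `~/Library/LaunchAgents/` and loaded with `launchctl`, this keeps the proxy running across logins and crashes without any action from the user.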
Our current implementation has only scratched the surface of Envoy’s capabilities. Based on this architecture, we can easily extend it with more advanced features without touching the Swift code:
- Automatic Retries: By adding a `retry_policy` to Envoy's route configuration, we can implement automatic retries with exponential backoff for failed requests (e.g., 503s or network timeouts).
- Rate Limiting: Using Envoy's local or global rate-limiting filters, we can prevent the application from inadvertently launching a DDoS attack on an API due to frequent user actions.
- Observability: Envoy can produce a wealth of metrics, access logs, and distributed tracing data. We can feed this data into Prometheus and Grafana to gain deep insights into client-side API call behavior, which is crucial for troubleshooting issues that only occur in specific user environments.
- Traffic Mirroring: A small fraction of production traffic can be duplicated and sent to a backend service in a staging environment for live regression testing, all completely transparent to the user.
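As a sketch of the first extension, a retry policy attaches to the same `route:` block that already carries our 5s timeout in `envoy.yaml`; the values below are illustrative, not tuned recommendations:

```yaml
# Inside the route: block of envoy.yaml, alongside `cluster` and `timeout`
retry_policy:
  # Retry on 5xx responses, connection failures, and reset streams
  retry_on: "5xx,connect-failure,reset"
  num_retries: 2
  # Give each attempt its own, tighter deadline than the overall 5s route timeout
  per_try_timeout: 2s
  # Exponential backoff between attempts
  retry_back_off:
    base_interval: 0.25s
    max_interval: 2s
```

One interaction to keep in mind: failed retry attempts also count toward the outlier detector's `consecutive_5xx`, so an aggressive retry policy will trip the breaker sooner.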