Concurrent Programming
Network servers often handle many clients simultaneously. This chapter explains how Corosio supports concurrency using C++20 coroutines and the strand pattern for safe shared state access.
Why Concurrency?
Sequential programs execute one operation at a time. When a sequential program waits for a network response, it sits idle—wasting CPU cycles that could do useful work.
Latency Hiding
Network operations take time: a DNS lookup, a connection handshake, waiting for data to arrive. During these waits, a concurrent program can handle other clients, process other requests, or run background tasks.
Consider a web server. If it handles one request at a time, every client waits for all previous clients to finish. With concurrency, the server overlaps requests—while one client’s request waits for a database response, another client’s response is being sent.
Throughput
Concurrency increases throughput. A single-threaded server handling one connection at a time might manage 100 requests per second. The same server with concurrency might handle 10,000—not because any single request is faster, but because the server overlaps waiting time with useful work.
Concurrency vs Parallelism
These terms are related but distinct:
- Concurrency: Managing multiple tasks, potentially interleaved, making progress on each. A single CPU can run concurrent tasks by switching between them.
- Parallelism: Actually executing multiple tasks simultaneously on multiple CPUs.
Coroutines provide concurrency. Combined with multiple threads, they can also achieve parallelism. For I/O-bound workloads, concurrency alone often provides sufficient performance.
The Problem of Shared State
When multiple operations run concurrently, they may access shared data. Without synchronization, this leads to data races—bugs that are subtle, intermittent, and hard to reproduce.
Race Conditions
A race condition occurs when program behavior depends on the timing of operations:
```cpp
int counter = 0;

// Task 1        // Task 2
++counter;       ++counter;

// Both read 0, both write 1
// Expected: 2, Actual: 1 (data race)
```

The ++counter operation isn’t atomic—it reads, modifies, then writes. If two tasks interleave, both may read the old value before either writes.
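The standard fix is to make the read-modify-write itself atomic. A minimal sketch with std::atomic (standard C++, independent of Corosio):

```cpp
#include <atomic>

std::atomic<int> counter{0};

void task() { counter.fetch_add(1); } // run from two tasks concurrently

// Each fetch_add is one indivisible read-modify-write step,
// so two concurrent increments always yield 2.
```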
The Read-Modify-Write Hazard
The pattern read → modify → write is a classic source of races:
```cpp
if (resource_available)         // Read
{
    resource_available = false; // Write
    use_resource();
}
```

If two tasks check resource_available simultaneously, both may see true and proceed to use the resource—violating the intended mutual exclusion.
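An atomic exchange closes the gap by fusing the read and the write into one step. A sketch using std::atomic (standard C++; use_resource is the placeholder from the example above):

```cpp
#include <atomic>

void use_resource(); // placeholder from the example above

std::atomic<bool> resource_available{true};

void try_use()
{
    // exchange() writes false and returns the previous value as one
    // atomic step, so at most one task observes true and proceeds.
    if (resource_available.exchange(false))
        use_resource();
}
```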
Traditional Solutions
The traditional approach to safe concurrent access uses threads and mutexes.
Threads and Their Costs
Operating system threads provide parallelism but have costs:
| Cost | Details |
|---|---|
| Memory | Each thread needs a stack (often 1 MB+ per thread) |
| Creation | Creating a thread involves kernel calls |
| Context switches | Switching between threads is expensive (save/restore registers, cache effects) |
A server with 10,000 connections can’t afford 10,000 threads.
Mutexes and Critical Sections
A mutex (mutual exclusion) protects shared data by allowing only one thread to hold it at a time:
```cpp
std::mutex m;
int counter = 0;

void increment()
{
    std::lock_guard lock(m);
    ++counter; // Safe: only one thread at a time
}
```
The region between lock acquisition and release is a critical section.
Deadlock
When tasks acquire multiple locks, they risk deadlock:
```cpp
// Thread 1            // Thread 2
lock(mutex_a);         lock(mutex_b);
lock(mutex_b); // waits   lock(mutex_a); // waits

// Both wait forever
```
Preventing deadlock requires careful lock ordering—a maintenance burden as code evolves.
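When two locks genuinely must be held together, std::scoped_lock (standard C++, not Corosio-specific) acquires them with a built-in deadlock-avoidance algorithm, so callers need not agree on an ordering:

```cpp
#include <mutex>

std::mutex mutex_a;
std::mutex mutex_b;

void update_both()
{
    // Locks both mutexes via a deadlock-avoidance algorithm;
    // the acquisition order no longer matters at the call site.
    std::scoped_lock lock(mutex_a, mutex_b);
    // ...modify data guarded by both locks...
}
```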
Why Mutexes Are Error-Prone
Mutex-based code has problems:
- Every access site must remember to lock
- Holding a lock while calling other code risks deadlock
- Forgetting a lock causes subtle bugs discovered in production
- Performance suffers from contention
Corosio offers a better approach for I/O-bound code: coroutines with strands.
The Event Loop Model
Instead of threads waiting on blocking calls, the event loop model uses a single thread processing events as they arrive.
Single-Threaded Concurrency
An event loop processes one event at a time:
```cpp
while (!stopped)
{
    wait_for_event(); // Blocks until I/O completes
    handle_event();   // Run the handler
}
```
Events might be: "data arrived on socket X," "timer expired," "new connection ready to accept." Each handler runs to completion before the next event is processed.
Non-Blocking I/O
Traditional I/O operations block: read() waits until data arrives. Non-blocking I/O returns immediately if no data is available, allowing the program to check other sockets or do other work.
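As a minimal POSIX sketch (standard fcntl/read calls, not Corosio API), switching a descriptor to non-blocking mode looks like this:

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

void poll_socket(int fd)
{
    // Put the descriptor into non-blocking mode.
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);

    char buf[1024];
    ssize_t n = read(fd, buf, sizeof(buf));
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
    {
        // No data yet; the thread is free to service other sockets.
    }
}
```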
The event loop combines non-blocking I/O with OS notifications (select, poll, epoll, kqueue, IOCP) to efficiently wait for events across many connections.
Run-to-Completion Semantics
Each event handler runs without interruption. If you’re processing a message, no other handler for your data structures runs until you finish. This provides implicit synchronization—no need for locks within single-threaded event handling.
The Reactor Pattern
Corosio uses the reactor pattern: register interest in I/O events, wait for events, dispatch handlers. The io_context::run() method implements this loop.
The reactor is efficient because it waits for any of many events simultaneously, rather than polling each socket individually.
C++20 Coroutines
A coroutine is a function that can suspend and resume execution. Unlike threads, coroutines don’t block the thread when waiting—they yield control to a scheduler.
Language Mechanics
C++20 adds three keywords:
| Keyword | Purpose |
|---|---|
| `co_await` | Suspend until an operation completes |
| `co_return` | Complete the coroutine with a value |
| `co_yield` | Produce a value and suspend (for generators) |
Using any of these keywords makes a function a coroutine.
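For instance, a generator uses co_yield, and the keyword alone is what makes the function a coroutine. A minimal sketch using C++23’s std::generator (shown for illustration; Corosio code itself uses co_await and co_return):

```cpp
#include <generator> // C++23

std::generator<int> counts(int from, int to)
{
    for (int i = from; i <= to; ++i)
        co_yield i; // suspends here, handing one value to the consumer
}

// Usage: for (int v : counts(1, 3)) consumes 1, 2, 3,
// resuming the coroutine once per value.
```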
Suspension Points as Yield Points
When a coroutine hits co_await, it may suspend. The thread is free to run other coroutines or handle other events. When the awaited operation completes, the coroutine resumes—possibly on a different thread.
```cpp
capy::task<void> handle_client(corosio::socket sock)
{
    char buf[1024];
    auto [ec, n] = co_await sock.read_some(
        capy::mutable_buffer(buf, sizeof(buf)));
    // Suspends here until data arrives

    if (ec)
        co_return; // Exit on error

    // Process data...
}
```
Between co_await and resumption, no code in this coroutine runs. Other coroutines can make progress.
Coroutines vs Threads
| Property | Threads | Coroutines |
|---|---|---|
| Scheduling | Preemptive (OS can interrupt anytime) | Cooperative (explicit yield at `co_await`) |
| Memory | Fixed stack (often 1 MB+) | Minimal frame (as needed) |
| Creation cost | Expensive (kernel call) | Cheap (allocation) |
| Context switch | Expensive (kernel, cache) | Cheap (save/restore frame) |
Why Coroutines Excel for I/O
I/O-bound programs spend most time waiting. Coroutines make waiting cheap:
- Thousands of suspended coroutines use minimal memory
- Resumption is just a function call
- No kernel involvement until actual I/O
A single thread can manage thousands of concurrent connections using coroutines.
Executor Affinity
A coroutine has affinity to an executor—its resumptions go through that executor. This matters for thread safety.
What Affinity Means
When a coroutine suspends, it remembers which executor should resume it. The I/O completion notification posts the resumption to that executor, not necessarily the thread that started the operation.
Resuming Through the Right Executor
```cpp
capy::run_async(ioc.get_executor())(my_coroutine());
// my_coroutine resumes through ioc's executor
```
If io_context::run() is called from one thread, resumptions happen on that thread. With multiple threads calling run(), resumptions happen on whichever thread is available.
The Affine Awaitable Protocol
Corosio operations implement the affine awaitable protocol. When you co_await an I/O operation, it captures your executor and resumes through it. This happens automatically—you don’t need explicit dispatch calls.
See Affine Awaitables for details.
Strands: Synchronization Without Locks
A strand guarantees that handlers posted to it don’t run concurrently. Even with multiple threads, strand operations execute one at a time.
```
Thread A ──┐
Thread B ──┤   ┌────────┐
Thread C ──┼──▶│ Strand │──▶ Sequential execution
Thread D ──┘   └────────┘

  Multiple          No concurrent
  threads           handlers
```
Sequential Execution Guarantees
Handlers on the same strand never overlap. If handler A is running, handler B waits. This provides mutual exclusion without explicit locks.
Implicit vs Explicit Synchronization
With mutexes, synchronization is explicit—you lock before accessing shared data. With strands, synchronization is structural—all access goes through the strand.
```cpp
// Mutex approach: explicit locking at every access
std::mutex m;

void access_shared_data()
{
    std::lock_guard lock(m);
    // Access data
}

// Strand approach: structural serialization
// (Asio API shown to illustrate the pattern; Corosio expresses it
// through executor affinity, as described below)
auto strand = asio::make_strand(ioc);

void access_shared_data()
{
    asio::post(strand, [&] {
        // Access data - no lock needed
    });
}
```
When Strands Replace Mutexes
Strands work well when:
- Access is already through asynchronous handlers
- The critical section is the entire handler (not a small portion)
- You want to avoid deadlock risk

Strands work less well when:

- You need synchronization in synchronous code
- The critical section is a tiny portion of a large handler
- You need to wait for shared state to reach a condition
Strands in Corosio
While Corosio doesn’t expose a standalone strand class, the pattern applies through executor affinity. When a coroutine has affinity to an executor, sequential `co_await`s naturally serialize:
```cpp
capy::task<void> session(corosio::socket sock)
{
    // All code in this coroutine runs sequentially
    auto [ec, n] = co_await sock.read_some(buf);
    // No other code in this coroutine runs until above completes

    co_await sock.write_some(response);
    // Still sequential
}
```
With a single-threaded io_context, coroutines sharing that executor can safely access shared state without locks.
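For example, a plain counter shared by all sessions needs no mutex under that single-thread assumption. A sketch (do_session is a hypothetical per-session coroutine, not Corosio API):

```cpp
int session_count = 0; // shared, but only ever touched from one thread

capy::task<void> session(corosio::socket sock)
{
    ++session_count;            // safe: no other handler runs right now
    co_await do_session(sock);  // hypothetical per-session work
    --session_count;            // still safe after resumption
}
```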
Scaling Strategies
Different applications need different concurrency strategies.
Single-Threaded: One Thread, Many Coroutines
The simplest model: one thread calls io_context::run(), handling all events. Coroutines provide the concurrency; no additional threads are needed.
Advantages:
- No thread synchronization needed
- Deterministic behavior (easier debugging)
- Lower overhead

Limitations:

- Can’t use multiple CPU cores
- Long computation blocks all I/O
This model handles thousands of I/O-bound connections efficiently.
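Wiring it up is short. A minimal sketch, assuming a hypothetical root coroutine server_main that spawns further coroutines as needed:

```cpp
int main()
{
    corosio::io_context ioc;

    // server_main is a hypothetical root coroutine; it may spawn many
    // more coroutines, all serviced by this one thread.
    capy::run_async(ioc.get_executor())(server_main(ioc));

    ioc.run(); // the single thread processes every event here
}
```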
Multi-Threaded: Thread Pools
For CPU utilization or higher throughput:
```cpp
corosio::io_context ioc(4); // Hint: 4 threads

std::vector<std::thread> threads;
for (int i = 0; i < 4; ++i)
    threads.emplace_back([&ioc] { ioc.run(); });

for (auto& t : threads)
    t.join();
```
With multiple threads:

- Coroutines may run on any thread
- Same-coroutine code between `co_await`s never overlaps with itself
- Different coroutines can run simultaneously

For shared state across coroutines with multiple threads, use:

- External synchronization (mutex, atomic), as sketched after this list
- A dedicated single-thread executor for that state
- Message passing between coroutines
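As a sketch of the first option, a counter shared by coroutines running on a multi-threaded io_context can simply be atomic (total_bytes is an illustrative name):

```cpp
#include <atomic>
#include <cstddef>

std::atomic<std::size_t> total_bytes{0};

capy::task<void> count_bytes(corosio::socket sock)
{
    char buf[1024];
    auto [ec, n] = co_await sock.read_some(
        capy::mutable_buffer(buf, sizeof(buf)));
    if (!ec)
        total_bytes.fetch_add(n); // safe from any pool thread
}
```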
Patterns
Common patterns for structuring concurrent applications.
One Coroutine Per Connection
The simplest pattern: each client gets a coroutine.
```cpp
capy::task<void> accept_loop(
    corosio::io_context& ioc,
    corosio::acceptor& acc)
{
    for (;;)
    {
        corosio::socket peer(ioc);
        auto [ec] = co_await acc.accept(peer);
        if (ec)
            break;

        // Spawn independent coroutine for this client
        capy::run_async(ioc.get_executor())(
            handle_client(std::move(peer)));
    }
}
```

Each handle_client coroutine runs independently. The accept loop continues immediately after spawning.
This works well when:
- Connections are independent
- Memory per connection is reasonable
- You don’t need bounded concurrency
Worker Pools
For bounded resource usage, use a fixed pool of workers:
```cpp
struct worker
{
    corosio::socket sock;
    std::string buf;
    bool in_use = false;

    explicit worker(corosio::io_context& ioc) : sock(ioc) {}
};

// Preallocate workers
std::vector<worker> workers;
workers.reserve(max_workers);
for (int i = 0; i < max_workers; ++i)
    workers.emplace_back(ioc);

// Assign connections to free workers
```
Corosio’s tcp_server class implements this pattern—see TCP Server for details.
Pipelines
For multi-stage processing, chain coroutines:
```cpp
capy::task<void> pipeline(corosio::socket sock)
{
    auto message = co_await read_message(sock);
    auto result  = co_await process(message);
    co_await write_response(sock, result);
}
```
Each stage suspends independently, allowing other coroutines to run.
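As a sketch of one such stage, read_message might frame messages with a 4-byte big-endian length prefix. Everything here is illustrative: read_exactly is an assumed helper that loops over read_some until the requested byte count has arrived.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stage: read a 4-byte length header, then the body.
capy::task<std::string> read_message(corosio::socket& sock)
{
    unsigned char hdr[4];
    co_await read_exactly(sock, hdr, sizeof(hdr)); // assumed helper

    std::size_t len = (std::size_t(hdr[0]) << 24) | (hdr[1] << 16)
                    | (hdr[2] << 8) | hdr[3];

    std::string body(len, '\0');
    co_await read_exactly(sock, body.data(), len);
    co_return body;
}
```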
Common Mistakes
Blocking in Coroutines
Never block inside a coroutine:
```cpp
// WRONG: blocks the entire io_context
capy::task<void> bad()
{
    std::this_thread::sleep_for(1s); // Don't do this!
    co_return;
}

// RIGHT: use an async timer
capy::task<void> good(corosio::io_context& ioc)
{
    corosio::timer t(ioc);
    t.expires_after(1s);
    co_await t.wait();
}
```
Blocking calls (sleep, mutex lock, synchronous I/O) prevent other coroutines from running.
Dangling References in Async Code
Spawned coroutines must not hold references to destroyed objects:
```cpp
// WRONG: socket destroyed while coroutine runs
{
    corosio::socket sock(ioc);
    capy::run_async(ex)(use_socket(sock)); // Takes reference!
} // sock destroyed here, coroutine still running

// RIGHT: move socket into coroutine
{
    corosio::socket sock(ioc);
    capy::run_async(ex)(use_socket(std::move(sock)));
} // OK, coroutine owns the socket
```
A coroutine may outlive the scope that spawned it. Ensure captured data lives long enough.
Cross-Executor Access
Don’t access an object from a coroutine with different executor affinity:
```cpp
// Dangerous: timer created on ctx1, used from ex2
corosio::timer timer(ctx1);
capy::run_async(ex2)([&timer]() -> capy::task<void> {
    co_await timer.wait(); // Wrong executor!
}());
```
Keep I/O objects with the coroutines that use them.
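One safe arrangement, under the same assumptions as the examples above: create the timer inside the coroutine that awaits it, so the object and its awaiting coroutine share one executor.

```cpp
// Safer: the timer lives with the coroutine that uses it
capy::task<void> wait_a_bit(corosio::io_context& ioc)
{
    corosio::timer t(ioc); // same context the coroutine runs on
    t.expires_after(1s);
    co_await t.wait();
}
```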
Summary
Corosio’s concurrency model:
- Coroutines replace threads for I/O-bound work
- Executor affinity ensures resumption through the right executor
- Execution is sequential between suspension points within a coroutine
- The strand pattern serializes access to shared state
- Multiple threads scale throughput when needed
For most applications, single-threaded operation with multiple coroutines provides excellent performance with simple, race-free code.
Next Steps
- I/O Context — The event loop in detail
- Affine Awaitables — How affinity propagates
- Echo Server — Practical concurrency example