Asynchronous and Generic Data Processing Pipeline

3 min read · Sep 18, 2024

Asynchronous programming is an important way to improve throughput and reduce response times in request-reply systems. With IAsyncEnumerable<T> in C#, we can process data that arrives as a stream, or that a data source produces asynchronously, one element at a time as it becomes available. No thread has to sit blocked while waiting for the next element, so system resources are used more effectively.

GitHub repository: halilkocaoz/AsyncPipeline


What is IAsyncEnumerable?

IAsyncEnumerable<T> is an interface, available since C# 8.0, that represents a sequence whose elements are produced asynchronously. A consumer reads it with await foreach, awaiting each element instead of blocking until it is ready.

Both IEnumerable<T> and IAsyncEnumerable<T> yield their elements one at a time and in order. The fundamental difference is how the consumer waits for the next element: with IEnumerable<T>, MoveNext blocks the calling thread until the element is ready, whereas with IAsyncEnumerable<T>, MoveNextAsync is awaited, so the thread is free to do other work in the meantime. This makes IAsyncEnumerable<T> particularly useful for long-running operations or sources with inherent delays: instead of waiting for all the data to arrive, each element is processed as it becomes available from the source. An important point to note is that the benefit only appears when the source itself can provide data asynchronously.
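To make the difference concrete, here is a minimal sketch of producing and consuming an asynchronous stream. ReadNumbersAsync is a hypothetical source invented for illustration, and Task.Delay merely simulates a slow database or network call.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AsyncStreamDemo
{
    // Hypothetical data source: Task.Delay stands in for slow I/O.
    public static async IAsyncEnumerable<int> ReadNumbersAsync(int count)
    {
        for (var i = 1; i <= count; i++)
        {
            await Task.Delay(100); // no thread is blocked while we "wait" for data
            yield return i;        // hand the element to the consumer right away
        }
    }

    public static async Task Main()
    {
        // await foreach awaits MoveNextAsync for each element,
        // so every number is printed as soon as it is produced.
        await foreach (var number in ReadNumbersAsync(3))
        {
            Console.WriteLine($"Received {number} at {DateTime.Now:HH:mm:ss.fff}");
        }
    }
}
```

The timestamps should be roughly 100 ms apart, which shows that the consumer does not wait for the whole sequence before it starts working.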

Why Use IAsyncEnumerable?

We prefer IAsyncEnumerable<T> for long-running data processing tasks, because asynchronous iteration does not block other operations while it waits for the next element. It is especially useful for database reads, file reads and writes, and network I/O.

Data Processing Pipeline

By using a pipeline, we can make stream-based data processing more efficient by breaking large and complex tasks into smaller, more manageable steps. In the pipeline built in this article, each operation (multiplying numbers, adding, converting to text) works independently and asynchronously, which provides modularity, flexibility, and reusability. This can improve throughput, and it also makes the code easier to maintain and simplifies the development workflow.

Implementation

We will create a pipeline for data that is processed asynchronously. A pipeline needs data processing steps, so we describe a step with an interface, IAsyncPipelineStep, and build every concrete step on this abstraction.
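A step abstraction of this kind could look roughly like the sketch below. Only the name IAsyncPipelineStep comes from this article; the generic parameters, the ProcessAsync method name, and the CancellationToken parameter are assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.Threading;

// Assumed shape: one step turns a stream of TInput into a stream of TOutput.
public interface IAsyncPipelineStep<TInput, TOutput>
{
    // Hypothetical method name; the optional token lets a caller stop the stream early.
    IAsyncEnumerable<TOutput> ProcessAsync(
        IAsyncEnumerable<TInput> input,
        CancellationToken cancellationToken = default);
}
```

Taking and returning IAsyncEnumerable<T> keeps every step streaming: a step never has to buffer the whole input before it emits its first result.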

The pipeline uses this interface to understand its processing steps. Its method names are self-explanatory: Create creates a pipeline, AddStep adds a processing step to it, and ExecuteAsync runs it.

Despite its name, AddStep does not store the step in a collection; its job is to keep the method chain going, so that the calls made before and after it are linked together.
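One way to get this chaining behaviour is to let every AddStep call wrap the previous pipeline in a new one. The sketch below is an illustration under assumptions, not the repository's actual code: the Create, AddStep, and ExecuteAsync names come from this article, while the interface signature, the delegate-based composition, ToTextStep, and CountAsync are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Assumed step abstraction; the real signature may differ.
public interface IAsyncPipelineStep<TInput, TOutput>
{
    IAsyncEnumerable<TOutput> ProcessAsync(
        IAsyncEnumerable<TInput> input, CancellationToken cancellationToken = default);
}

public static class AsyncPipeline
{
    // An empty pipeline passes its input through unchanged.
    public static AsyncPipeline<T, T> Create<T>() =>
        new AsyncPipeline<T, T>((input, _) => input);
}

public sealed class AsyncPipeline<TInput, TOutput>
{
    private readonly Func<IAsyncEnumerable<TInput>, CancellationToken, IAsyncEnumerable<TOutput>> _run;

    internal AsyncPipeline(
        Func<IAsyncEnumerable<TInput>, CancellationToken, IAsyncEnumerable<TOutput>> run) => _run = run;

    // Nothing is stored in a list: the new pipeline wraps the old one,
    // and its output type becomes the output type of the added step.
    public AsyncPipeline<TInput, TNext> AddStep<TNext>(IAsyncPipelineStep<TOutput, TNext> step) =>
        new AsyncPipeline<TInput, TNext>((input, ct) => step.ProcessAsync(_run(input, ct), ct));

    // Returns the composed stream; elements flow once the caller enumerates it.
    public IAsyncEnumerable<TOutput> ExecuteAsync(
        IAsyncEnumerable<TInput> input, CancellationToken cancellationToken = default) =>
        _run(input, cancellationToken);
}

// Hypothetical sample step: converts each number to text.
public sealed class ToTextStep : IAsyncPipelineStep<int, string>
{
    public async IAsyncEnumerable<string> ProcessAsync(
        IAsyncEnumerable<int> input,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        await foreach (var n in input.WithCancellation(cancellationToken))
            yield return $"Value: {n}";
    }
}

public static class PipelineDemo
{
    // Hypothetical source that yields 1..count with a small delay.
    public static async IAsyncEnumerable<int> CountAsync(int count)
    {
        for (var i = 1; i <= count; i++)
        {
            await Task.Delay(10);
            yield return i;
        }
    }

    public static async Task Main()
    {
        var pipeline = AsyncPipeline.Create<int>().AddStep(new ToTextStep());
        await foreach (var text in pipeline.ExecuteAsync(CountAsync(3)))
            Console.WriteLine(text); // prints "Value: 1", "Value: 2", "Value: 3"
    }
}
```

Because AddStep returns a pipeline whose output type matches the step it was given, the compiler rejects a chain in which one step's output does not fit the next step's input.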

Data Processing Steps

As emphasized above, the data processing steps are defined independently of each other. Each concrete step implements IAsyncPipelineStep and performs its own task asynchronously.
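The three operations mentioned earlier (multiplying, adding, converting to text) could be sketched as follows. The class names MultiplyStep, AddNumberStep, and ToTextStep, the interface signature, and the CountAsync source are all hypothetical; the steps are composed by hand here so that the example stands on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Assumed step abstraction; the real signature may differ.
public interface IAsyncPipelineStep<TInput, TOutput>
{
    IAsyncEnumerable<TOutput> ProcessAsync(
        IAsyncEnumerable<TInput> input, CancellationToken cancellationToken = default);
}

// Multiplies every incoming number by a fixed factor.
public sealed class MultiplyStep : IAsyncPipelineStep<int, int>
{
    private readonly int _factor;
    public MultiplyStep(int factor) => _factor = factor;

    public async IAsyncEnumerable<int> ProcessAsync(
        IAsyncEnumerable<int> input,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        await foreach (var n in input.WithCancellation(cancellationToken))
            yield return n * _factor;
    }
}

// Adds a fixed amount to every incoming number.
public sealed class AddNumberStep : IAsyncPipelineStep<int, int>
{
    private readonly int _addend;
    public AddNumberStep(int addend) => _addend = addend;

    public async IAsyncEnumerable<int> ProcessAsync(
        IAsyncEnumerable<int> input,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        await foreach (var n in input.WithCancellation(cancellationToken))
            yield return n + _addend;
    }
}

// Converts every incoming number to text.
public sealed class ToTextStep : IAsyncPipelineStep<int, string>
{
    public async IAsyncEnumerable<string> ProcessAsync(
        IAsyncEnumerable<int> input,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        await foreach (var n in input.WithCancellation(cancellationToken))
            yield return $"Value: {n}";
    }
}

public static class StepsDemo
{
    // Hypothetical source that yields 1..count with a small delay.
    public static async IAsyncEnumerable<int> CountAsync(int count)
    {
        for (var i = 1; i <= count; i++)
        {
            await Task.Delay(10);
            yield return i;
        }
    }

    public static async Task Main()
    {
        // Each step consumes the stream produced by the previous one.
        var multiplied = new MultiplyStep(2).ProcessAsync(CountAsync(3));
        var added = new AddNumberStep(1).ProcessAsync(multiplied);
        var texts = new ToTextStep().ProcessAsync(added);

        await foreach (var text in texts)
            Console.WriteLine(text); // prints "Value: 3", "Value: 5", "Value: 7"
    }
}
```

No step knows about the others; each only sees an input stream and produces an output stream, which is what makes the steps reusable in different pipelines.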

Usage of the Implementation

In the usage example, two data processing pipelines, one for numbers and one for text, are started without awaiting either of them immediately, and then both are awaited until their operations complete.
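The pattern of starting both pipelines first and awaiting them together can be sketched like this. To keep the example self-contained, two hypothetical async iterators (NumberPipelineAsync and TextPipelineAsync) stand in for the streams that the pipelines' ExecuteAsync calls would return; the random delays simulate uneven processing times and rely on Random.Shared, which assumes .NET 6 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ConcurrentPipelinesDemo
{
    // Hypothetical stand-in for the output stream of a number pipeline.
    public static async IAsyncEnumerable<string> NumberPipelineAsync()
    {
        for (var i = 1; i <= 3; i++)
        {
            await Task.Delay(Random.Shared.Next(50, 150)); // simulated work
            yield return $"number: {i * 2}";
        }
    }

    // Hypothetical stand-in for the output stream of a text pipeline.
    public static async IAsyncEnumerable<string> TextPipelineAsync()
    {
        foreach (var word in new[] { "alpha", "beta", "gamma" })
        {
            await Task.Delay(Random.Shared.Next(50, 150)); // simulated work
            yield return $"text: {word.ToUpperInvariant()}";
        }
    }

    // Drains one stream, printing each element as soon as it arrives.
    public static async Task<int> ConsumeAsync(IAsyncEnumerable<string> stream)
    {
        var count = 0;
        await foreach (var item in stream)
        {
            Console.WriteLine(item);
            count++;
        }
        return count;
    }

    public static async Task Main()
    {
        // Start both consumers without awaiting, so they run concurrently.
        Task<int> numbers = ConsumeAsync(NumberPipelineAsync());
        Task<int> texts = ConsumeAsync(TextPipelineAsync());

        // Await both; the printed lines may interleave differently on every run.
        int[] counts = await Task.WhenAll(numbers, texts);
        Console.WriteLine($"Processed {counts[0]} numbers and {counts[1]} texts.");
    }
}
```

Awaiting the first task before starting the second would serialize the two pipelines, so the order of the calls matters more than it may seem.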

Each time you run this program, you will notice that the output is often printed in a different order. Even though text processing is started later, its output can interleave with the output of number processing. This was our fundamental goal: long-running data processing operations should not hold up each other or the operations that follow them.

The data processing structure I modularized with IAsyncEnumerable<T> demonstrates how asynchronous programming lets you perform long-running operations without blocking other tasks. With the pipeline approach, you can break complex data processing tasks into more manageable pieces. Together, these strategies make more efficient use of system resources and improve the developer experience.