Benchmarking FutureSDR

5 November 2021

The introductory video of FutureSDR already showed some quick benchmarks for the throughput of message- and stream-based flowgraphs. What was missing (not only for FutureSDR but for SDRs in general) were latency measurements, so I had a closer look at this issue.

While throughput can be measured rather easily (by piping a given amount of data through a flowgraph and measuring its execution time), latency is trickier. The state of the art is to do I/O measurements, where the flowgraph reads samples from an SDR, processes them, and loops them back. Using external hardware (i.e., a signal generator and an oscilloscope), one can measure the latency.

The drawback of this approach is obvious: it requires hardware and a non-trivial setup, and it is hard to automate and integrate into CI/CD.

An alternative is measuring latency by logging when a sample is produced in a source and received in a sink. The main requirement for this measurement is that the overhead must be minimal. Otherwise, one easily measures the performance of the logging or impacts the flowgraph in a way that its behavior is no longer representative for normal execution.

There are many possible solutions (like logging to stdout, a file, or a ramdisk) or using eBPF (uprobes or USDTs). However, they all introduce considerable overhead or require significant manual tuning and optimizations (e.g. logging to a ramdisk needs to be synchronized and should ideally use a binary format; uprobes and USDTs trigger a context switch to the kernel, etc.).

LTTng allows probing user space applications with minimal overhead. It uses one ring buffer per CPU to log custom events in a binary format. Furthermore, there are tools and libraries available to evaluate the traces.
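As a rough sketch, a typical LTTng userspace tracing session looks like the following. The session name is arbitrary, and the null_rand_latency provider matches the tracepoints used later in this post; the exact event names are an assumption on my part.

```shell
# create a tracing session and record only our userspace tracepoints
lttng create fsdr-latency
lttng enable-event --userspace 'null_rand_latency:*'
lttng start

# ... run the flowgraph under test ...

lttng stop
lttng destroy  # the binary trace stays on disk (default: ~/lttng-traces/)
```

The recorded trace can then be inspected or post-processed, for example with Babeltrace.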

The main drawback of LTTng is that it is only available on Linux and requires adding tracepoints to the code. The latter is easily possible for GNU Radio and FutureSDR by adding custom sources and sinks that provide these tracepoints.

The underlying idea is to define a granularity and issue TX/RX events whenever granularity samples have been produced or consumed in the source or sink. The events are then correlated in post-processing to calculate the latency.
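The post-processing step itself is simple: match each TX event to the RX event that carries the same granularity index and subtract the timestamps. A minimal sketch of this correlation in plain Rust (the event layout and names are mine, not the actual trace format):

```rust
use std::collections::HashMap;

// simplified event: (granularity index, timestamp in nanoseconds)
type Event = (u64, u64);

// match TX and RX events by granularity index and return the latencies
// of all RX events that have a matching TX event
fn latencies(tx: &[Event], rx: &[Event]) -> Vec<u64> {
    let tx_by_index: HashMap<u64, u64> = tx.iter().copied().collect();
    rx.iter()
        .filter_map(|&(index, t_rx)| tx_by_index.get(&index).map(|&t_tx| t_rx - t_tx))
        .collect()
}
```

With real traces, one would additionally key events by the block ID, since every source/sink pair produces its own event stream.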

The work function of the modified Null Source, for example, checks whether self.probe_granularity samples were produced and, if so, issues a tracepoints::null_rand_latency::tx event that is logged by LTTng.

async fn work(
    &mut self,
    _io: &mut WorkIo,
    sio: &mut StreamIo,
    _mio: &mut MessageIo<Self>,
    _meta: &mut BlockMeta,
) -> Result<()> {
    // zero the whole output buffer
    let o = sio.output(0).slice::<u8>();
    unsafe {
        ptr::write_bytes(o.as_mut_ptr(), 0, o.len());
    }

    // produce the items and check if a granularity boundary was crossed
    let before = self.n_produced / self.probe_granularity;
    let n = o.len() / self.item_size;
    sio.output(0).produce(n);
    self.n_produced += n as u64;
    let after = self.n_produced / self.probe_granularity;

    // if so, issue a TX event carrying the index of the boundary
    if before != after {
        tracepoints::null_rand_latency::tx(self.id.unwrap(), after);
    }
    Ok(())
}
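The boundary check at the heart of this work function can be isolated into a small helper. The sketch below (my own naming, not FutureSDR API) shows when an event fires and which index it carries:

```rust
// Returns Some(index) if producing `n` more items crosses a granularity
// boundary, where `produced` is the item count before this call.
// If several boundaries are crossed at once, only the latest index is
// reported, matching the single event issued by the work function above.
fn boundary_crossed(produced: u64, n: u64, granularity: u64) -> Option<u64> {
    let before = produced / granularity;
    let after = (produced + n) / granularity;
    if before != after {
        Some(after)
    } else {
        None
    }
}
```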

For GNU Radio, we created a Null Source in a similar manner.

I checked the overhead for FutureSDR executing a flowgraph (1) without LTTng tracepoints, (2) with disabled tracepoints, and (3) with enabled tracepoints. Adding the tracepoint and the check for whether granularity samples were produced adds ~4% overhead. Actual logging didn't introduce a sizable difference for a granularity of 32768 float samples.

The measurements were conducted following the methodology described in [1]. In short, we created a CPU set for measurements and orchestrated them through a Makefile. We allocated 3 cores together with their hyper-threads to this CPU set; on my system, these were “CPUs” 0, 1, 2, 6, 7, and 8.
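Assuming the cset tool, shielding such a CPU set can look like this (the CPU numbers are from my machine, and the benchmark script name is a placeholder; adapt both to your setup):

```shell
# move all other tasks (including kernel threads, where possible) off
# CPUs 0-2 and their hyper-thread siblings 6-8
sudo cset shield --cpu 0-2,6-8 --kthread=on

# run the benchmark inside the shielded set
sudo cset shield --exec -- ./run_benchmark.sh

# tear the shield down afterwards
sudo cset shield --reset
```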

CPU Topology

We evaluated flowgraphs with 6 parallel Pipes and a configurable number of Stages.

Flowgraph Topology

The Sources are modified Null Sources followed by Head blocks to limit the data that is produced, allowing a graceful shutdown. The blocks that form the Stages just copy floats from input to output buffers. To avoid fixed, boring schedules, they copy only up to 512 samples at a time, with the actual value uniformly distributed between 1 and 512 samples.
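The randomized copy in the Stage blocks can be sketched as follows in plain Rust. The RNG and the function signature are simplified stand-ins for the actual block implementation (a tiny xorshift keeps the sketch dependency-free; the slight modulo bias is ignored here):

```rust
// minimal xorshift RNG so the sketch has no external dependencies
struct Xorshift(u64);

impl Xorshift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

// copy a uniformly random number of samples (1..=512), bounded by the
// available input and output space; returns how many were copied
fn copy_chunk(rng: &mut Xorshift, input: &[f32], output: &mut [f32]) -> usize {
    let max = input.len().min(output.len()).min(512);
    if max == 0 {
        return 0;
    }
    let n = (rng.next() as usize % max) + 1;
    output[..n].copy_from_slice(&input[..n]);
    n
}
```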

For FutureSDR, we tested the Smol scheduler, which spawns one thread per CPU available to the process (in this case 6). The tasks (corresponding to blocks) are processed by a work-stealing scheduler that is unaware of the flowgraph topology. The Flow scheduler, in turn, has tasks associated with worker threads and processes them round-robin from upstream to downstream blocks, i.e., it exploits knowledge of the flowgraph topology.

Latency measurements with this setup produced the following results. Note that the error bars do not indicate noisy measurements but show the 5th and 95th percentiles of the latency distribution.

Latency of Sample Flowgraph

All tools and evaluation scripts are available here. Feel free to try it on your system with different parameters. For now, I would take these results with a grain of salt, but I believe that LTTng is a good option for latency measurements.

References

  1. Bastian Bloessl, Marcus Müller and Matthias Hollick, “Benchmarking and Profiling the GNU Radio Scheduler,” Proceedings of 9th GNU Radio Conference (GRCon 2019), Huntsville, AL, September 2019. [BibTeX, PDF and Details…]


Generic Blocks for Rapid Prototyping

14 September 2021

FutureSDR is still missing basically all blocks at this stage. Fortunately, people started contributing some of them, including blocks to add or multiply a stream with a constant. This block was implemented in a way that made it generic over the arithmetic operation. Thinking a bit further about the concept, we realized that it can be extended to arbitrary operations, creating blocks that are generic over function closures.

Meet our new blocks: Source, FiniteSource, Apply, Combine, Split, and Filter, all of which are generic over mutable closures. This can come in handy to quickly hack something together. Let me give you some examples.

Sources

Need a constant source that produces 123 as u32?

use futuresdr::blocks::Source;

let _ = Source::new(|| 123u32);

The Source block is generic over FnMut() -> A. It infers the output type (in this case u32) and creates the appropriate stream output.

Need a source that iterates again and again over a range or vector?

let mut v = (0..10).cycle();
let _ = Source::new(move || v.next().unwrap());

Notice how this closure is mutable (i.e., has state). One could just as well create a counter or implement a signal source that keeps the current phase as state:

let mut i = 0u32;
let _ = Source::new(move || { i += 1; i });

Sometimes, the function signature and, hence, the data type of the output might not be obvious. In this case, we can be more explicit:

let mut i = 0u32;
let _ = Source::new(move || -> u32 { i += 1; i });

Now, what about a finite source? One could, of course, add a Head block after the source and terminate after a given number of items. But this might not be ideal for all use cases. So we added a FiniteSource that returns Option<A> and stops once the closure returns None.

A vector source that terminates once it has output all items would be:

use futuresdr::blocks::FiniteSource;

let mut v = vec![1, 2, 3].into_iter();
let _ = FiniteSource::new(move || v.next());

Apply, Combine, Split

A similar concept can be realized for simple operations on streams. Need a block that clamps an f32 to the interval [-1, 1]?

use futuresdr::blocks::Apply;

let _ = Apply::new(|x: &f32| x.clamp(-1.0, 1.0));

Need a block that adds 42 to a u32 and returns the result as f32?

let _ = Apply::new(|x: &u32| *x as f32 + 42.0);

The Apply block is generic over FnMut(&A) -> B, i.e., any mutable closure that gets a reference to an item of type A in the input buffer and produces an item of type B that will be written to the output buffer.

Since input and output types can be different, we can implement a block that computes the magnitude of a complex number, for example.

let _ = Apply::new(|x: &Complex<f32>| x.norm());

Note that the closure, again, is mutable and can have state. This means we could very easily implement a single-pole IIR filter.

let mut state = 0f32;
let alpha = 0.1;
let _ = Apply::new(move |x: &f32| -> f32 {
    state = state * alpha + (1.0 - alpha) * *x;
    state
});

The Combine and Split blocks are conceptually similar, except that they have two inputs or two outputs, respectively. Combine is generic over FnMut(&A, &B) -> C to implement, for example, a block that adds two streams. Split is generic over FnMut(&A) -> (B, C) to implement, for example, a block that splits a complex number into real and imaginary parts. Examples for these blocks can be found in the corresponding integration tests.

Filter

A similar concept is used in the Filter block, which relaxes the fixed in–out relationship of the Apply block.

It is generic over FnMut(&A) -> Option<B> and allows filtering the input stream. If the closure returns Some(B), the value is written to the output buffer; if it returns None, nothing is written.

A stateless block that only copies even numbers would be:

use futuresdr::blocks::Filter;
let _ = Filter::new(|i: &u32| -> Option<u32> {
    if *i % 2 == 0 {
        Some(*i)
    } else {
        None
    }
});

A stateful block that only copies every other sample could look like this:

let mut output = false;
let _ = Filter::new(move |i: &u32| -> Option<u32> {
    output = !output;
    if output {
        Some(*i)
    } else {
        None
    }
});

Conclusion

I think these blocks are nice to quickly hack something together and can make up for quite a few missing blocks. Performance-wise, there might be drawbacks: the compiler would have to be really smart to figure out that it could use SIMD instructions when adding streams, for example.

Still, we think that these blocks show the bright side of using Rust. While it would be possible to implement similar blocks in other languages and other SDR frameworks, function closures and iterators are really fun with Rust.

We hope you give them a try :-)


Hello World!

27 August 2021

We are not perfect, but we are here :-) Yay! A lot of stuff is still very much in flux, but we think that the project has reached a state where it might be interesting for some.

After a lot of refactoring, we believe that the main components are in place and development is more fun: bugs and issues are now more local, i.e., one doesn't have to change bits across the whole code base to fix something :-)

FutureSDR implements several new concepts. We hope you have fun playing around with them. So happy hacking, and please get in touch with us on GitHub or Discord if you have questions, comments, or feedback.