Sync vs Async Blocks

2022-05-03

FutureSDR supports both sync and async blocks. Their only difference is the work() function, which is either a normal or an async function. Overall, a Block is an enum containing either a sync or an async block with the corresponding sync or async kernel that implements work(). This type structure already suggests that supporting both implementations leads to complexity, bloat, and code duplication.

pub trait AsyncKernel: Send {
    async fn work(&mut self, ...) -> Result<()> {
        Ok(())
    }
    ...
}

pub trait SyncKernel: Send {
    fn work(&mut self, ...) -> Result<()> {
        Ok(())
    }
    ...
}
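To give an idea of where the duplication ends up, here is a simplified sketch of the kind of wrapper the runtime has to carry around (illustrative only, not the actual FutureSDR definitions; arguments are elided as above and anyhow's Result is assumed for the error type):

use anyhow::Result;

// Illustrative sketch only: not the actual FutureSDR type structure.
// With two kernel flavors, the runtime needs an enum wrapper and has to
// branch on the variant every time it drives a block.
pub enum Block {
    Sync(Box<dyn SyncKernel>),
    Async(Box<dyn AsyncKernel>),
}

impl Block {
    pub async fn work(&mut self) -> Result<()> {
        match self {
            Block::Sync(k) => k.work(),        // plain function call
            Block::Async(k) => k.work().await, // awaited like any other future
        }
    }
}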

Obviously, it would be possible to use only async blocks, since one is not forced to .await anything inside an async function, i.e., any sync function could simply be made async. The reason both implementations exist is that async blocks implement the AsyncKernel trait, which defines async functions. This is an area where Rust is still in active development. Out of the box, it does not support async trait functions, which is why everybody resorts to the async_trait crate that enables them. The popularity of this crate shows how desperately people want this language feature.
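In practice, this means the AsyncKernel definition above is annotated with the crate's attribute macro, roughly like this (a simplified sketch, again assuming anyhow's Result):

use async_trait::async_trait;
use anyhow::Result;

#[async_trait]
pub trait AsyncKernel: Send {
    // The attribute macro turns this into a regular trait function that
    // returns a boxed, dynamically dispatched future.
    async fn work(&mut self) -> Result<()> {
        Ok(())
    }
}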

The reason that it is not in the language yet is, according to my understanding, that there is a performance penalty to using async trait functions. In short, one can think of an async function as a state machine with some local variables. If the compiler knows the concrete type, it can build complex, nested state machines at compile time. If it does not, due to dynamic dispatch of trait functions, it has to heap-allocate the state machine at runtime for every function call.
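A hand-written sketch of what the macro roughly expands the method into (not its literal output) makes the cost visible: every single call to work() type-erases and heap-allocates the future.

use std::future::Future;
use std::pin::Pin;
use anyhow::Result;

pub trait AsyncKernel: Send {
    // What `async fn work(&mut self) -> Result<()>` effectively becomes:
    // the state machine is boxed and dispatched dynamically on every call.
    fn work<'a>(&'a mut self) -> Pin<Box<dyn Future<Output = Result<()>> + Send + 'a>> {
        Box::pin(async { Ok(()) })
    }
}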

This sounded like a complete performance disaster – at least for work(), which is called over and over again. Therefore, we added support for sync blocks, which avoid this overhead.

Performance

Already back then, some quick tests suggested that the performance difference might not be that big. So the question is whether it is really worth having both block types. Today, we conducted some experiments to take a closer look.

The code and all scripts are available here. In short, the measurements consider three schedulers: a single-threaded scheduler (Smol-1), a multi-threaded scheduler (Smol-N), and an optimized, multi-threaded scheduler (Flow) that polls blocks in their natural order (upstream to downstream). We make six CPU cores available to the process and use six worker threads for the multi-threaded schedulers. The flowgraph consists of six independent subflows, each with a source that streams 200e6 32-bit floats into the flowgraph, followed by #Stages (x-axis) copy blocks, each copying a random number of samples (uniformly distributed in [1, 512]) in each call to work(). We create a sync and an async version of the otherwise identical copy block.
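To illustrate the benchmark blocks, the copy step inside each work() call looks roughly like this (a sketch, not the actual benchmark code; it assumes the rand crate and uses plain slices in place of the blocks' stream buffers):

use rand::Rng;

// Copies up to a uniformly drawn number of samples (1 to 512) from the
// input buffer to the output buffer and reports how many were copied.
fn copy_chunk(input: &[f32], output: &mut [f32]) -> usize {
    let max = rand::thread_rng().gen_range(1..=512);
    let n = input.len().min(output.len()).min(max);
    output[..n].copy_from_slice(&input[..n]);
    n
}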

Execution Time of Flowgraphs

The blocks do not do any DSP and only copy small chunks of samples. The performance is, therefore, mainly determined by the overhead of the runtime and the potential overhead of the async block. Yet, the differences are minor.

Conclusion

This suggests that it is not worth supporting sync implementations, at least not for now. And in the future, I expect things to only get better. There are ongoing discussions about how Rust should handle async trait functions. Maybe a more efficient approach will be found, which would further improve the situation.

In retrospect, one could see this as a failed premature optimization. But it is also interesting to see the effect and quantify its impact on performance. As time permits, I will go ahead and remove sync blocks, so we get back to a more minimal runtime.


Full ZigBee SDR Receiver in the Browser

2022-03-02

Some months ago, we showed a complete SDR waterfall plot running in the browser. It interfaced an RTL-SDR from within the browser, using cross-compiled drivers. In short, this requires compiling the driver to WebAssembly (Wasm) using Emscripten and a shim that maps libusb to WebUSB calls.

Signal processing was implemented with FutureSDR. It even supported wgpu custom buffers for platform-independent GPU acceleration. Wgpu is really awesome. It supports all major platforms, using their native backends: Linux/Android (→ Vulkan), Windows (→ DX12), macOS/iOS (→ Metal), and Wasm (→ WebGPU).

What was missing was a real, non-trivial SDR application. We were curious what was possible, so we developed a ZigBee receiver and cross-compiled it to Wasm. Furthermore, since the RTL-SDR does not work in the 2.4GHz band and does not provide the required bandwidth, we cross-compiled the HackRF driver in a similar fashion.

To test the receiver, we generated ZigBee frames with Scapy and sent them using an ATUSB IEEE 802.15.4 USB Adapter.

import time
from scapy.all import *

# linux: include/uapi/linux/if_ether.h
ETH_P_IEEE802154 = 0x00f6

i = 0
while True:
    fcf = Dot15d4FCS()
    data = Dot15d4Data(dest_panid=0x47d0, dest_addr=0x0000, src_panid=0x47d0, src_addr=0xee64)
    frame_data = fcf/data/f"FutureSDR {i}"
    frame_data.show()
    sendp(frame_data, iface='monitor0', type=ETH_P_IEEE802154)
    time.sleep(0.4)
    i += 1

Turns out, this actually works, and the 4Msps of the ZigBee receiver can be processed in real time in the browser. We host a demo on our website, which is hard-coded to ZigBee channel 26 @ 2.48GHz. The receiver is, however, also part of the examples. It works just as well as a native binary outside the browser, using SoapySDR to interface the hardware.

At the moment, the FutureSDR receiver uses only one thread, which is, however, separate from the HackRF RX thread spawned by the driver. Compiled in release mode, FutureSDR uses around 20% CPU on an Intel i7-8700K. See the demo video here:

And as usual, everything just works on the phone. This is not an Android application with cross-compiled drivers. It runs the whole ZigBee receiver in the Google Chrome browser that is shipped with my phone. Really fascinating what is possible in the browser these days…

Phone Setup

If you have ideas for cool applications, feel free to reply to one of the Twitter threads :-)


Slab Buffers

2022-03-01

Buffers are at the heart of every SDR runtime. GNU Radio, for example, is famous for its double-mapped circular buffers. In short, they use the MMU to map the same physical memory twice, back-to-back in the virtual address space of the process. This arrangement allows implementing a ring buffer on top that always presents the available read/write space as consecutive memory, similar to a C array. The figure below shows how a buffer, consisting of physical memory areas A and B, would be mapped.

Double-Mapped Circular Buffer

Using these buffers, blocks can assume that data is always in linear, consecutive memory. In contrast to normal circular buffers, they do not have to worry about wrapping. This simplifies DSP implementations, in particular for algorithms that consider multiple samples to produce output (e.g., a FIR filter). Furthermore, samples in linear memory allow the use of vectorized instructions (provided by SIMD extensions), which can make a big difference [1, 2].
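To make this concrete, here is an illustrative sketch (not the actual FutureSDR implementation) of why reads never have to wrap: since the second mapping mirrors the first, a slice that starts near the end of the physical buffer simply continues into the mirrored copy.

// Illustrative sketch, not the actual FutureSDR buffer code.
// `map` is the doubly-mapped region: 2 * capacity bytes of virtual memory
// backed by the same `capacity` bytes of physical memory.
struct DoubleMappedReader<'a> {
    map: &'a [u8],      // length == 2 * capacity
    capacity: usize,
    read_offset: usize, // always kept in [0, capacity)
}

impl<'a> DoubleMappedReader<'a> {
    // `available` is at most `capacity`. Even if `read_offset + available`
    // crosses the end of the physical buffer, the returned slice is
    // contiguous in virtual memory thanks to the second mapping.
    fn readable(&self, available: usize) -> &[u8] {
        debug_assert!(available <= self.capacity);
        &self.map[self.read_offset..self.read_offset + available]
    }
}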

Given these advantages, double-mapped circular buffers were also adopted by FutureSDR. (There is now also a separate crate for them, in case you want to roll your own SDR application without a framework or runtime.) These buffers work well on Linux, Android, Windows, and macOS. FutureSDR, however, also targets platforms that do not allow memory mapping (WebAssembly/Wasm) or do not have an MMU in the first place.

read more

Benchmarking FutureSDR

2021-11-05

The introductory video of FutureSDR already showed some quick benchmarks for the throughput of message- and stream-based flowgraphs. What was missing (not only for FutureSDR but for SDRs in general) were latency measurements. I, therefore, took a closer look at this issue.

While throughput can be measured rather easily (by piping a given amount of data through a flowgraph and measuring its execution time), latency is trickier. The state of the art is to do I/O measurements, where the flowgraph reads samples from an SDR, processes them, and loops them back. Using external hardware (i.e., a signal generator and an oscilloscope), one can measure the latency.

The drawback of this approach is obvious: it requires hardware and a non-trivial setup, and it is hard to automate and integrate into CI/CD.

An alternative is to measure latency by logging when a sample is produced in a source and received in a sink. The main requirement for this measurement is that the overhead must be minimal. Otherwise, one easily ends up measuring the performance of the logging itself, or one impacts the flowgraph in a way that makes its behavior no longer representative of normal execution.
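A minimal sketch of such logging (illustrative, not the instrumentation used for the actual measurements): the source and the sink each push raw timestamps into a preallocated vector, and everything else happens after the flowgraph has terminated. How the two vectors are wired into the blocks is omitted here.

use std::time::{Duration, Instant};

// Illustrative sketch, not the actual measurement code. The hot path is a
// single Instant::now() plus a push into already-reserved memory.
struct LatencyLog {
    produced: Vec<Instant>, // filled by the source
    received: Vec<Instant>, // filled by the sink
}

impl LatencyLog {
    fn with_capacity(n: usize) -> Self {
        Self {
            produced: Vec::with_capacity(n),
            received: Vec::with_capacity(n),
        }
    }

    // Evaluated only after the flowgraph has finished.
    fn latencies(&self) -> Vec<Duration> {
        self.produced
            .iter()
            .zip(&self.received)
            .map(|(p, r)| *r - *p)
            .collect()
    }
}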

read more

Generic Blocks for Rapid Prototyping

2021-09-14

FutureSDR is still missing basically all blocks at this stage. Fortunately, people started contributing some of them, including blocks to add or multiply a stream with a constant. These blocks were implemented in such a way that they are generic over the arithmetic operation. Thinking a bit further about the concept, we realized that it can be extended to arbitrary operations, creating blocks that are generic over function closures.

Meet our new blocks: Source, FiniteSource, Apply, Combine, Split, and Filter, all of which are generic over mutable closures. This can come in handy to quickly hack something together. Let me give you some examples.

Sources

Need a constant source that produces 123 as u32?

use futuresdr::blocks::Source;

let _ = Source::new(|| 123u32);

The Source block is generic over FnMut() -> A. It infers the output type (in this case u32) and creates the appropriate stream output.

Need a source that iterates again and again over a range or vector?

let mut v = (0..10).cycle();
let _ = Source::new(move || v.next().unwrap());
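The other blocks follow the same closure-based pattern. As a rough example (a sketch; check the current API for the exact signature), an Apply block that scales every sample could look like this:

use futuresdr::blocks::Apply;

// Maps each input sample to one output sample via the closure.
let _ = Apply::new(|x: &f32| -> f32 { x * 2.0 });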
read more

Hello World!

2021-08-27

We are not perfect, but we are here :-) Yay! A lot of stuff is still very much in flux, but we think that the project has reached a state where it might be interesting for some.

After a lot of refactoring, we believe that the main components are in place and development is more fun: we can now work on bugs and issues that are more local, i.e., one doesn't have to change bits across the whole code base to fix something :-)

FutureSDR implements several new concepts. We hope you have some fun playing around with them. So happy hacking, and please get in touch with us on GitHub or Discord if you have questions, comments, or feedback.