
RISC-V Vector Extension: Between standardization and tailor-made accelerators


Interviews |
By Alexander Neumann



On 15 April 2026, Elektor is hosting a conference on RISC-V and its increasing significance for embedded and IoT systems. Ahead of the event, we spoke with one of the speakers, Thang Minh Tran. He is the founder, CEO, and CTO of Simplex Micro, a semiconductor company developing RISC-V-based CPU and vector processor IP for AI, ML, edge AI, and high-performance computing applications. His presentation explains in a practical way what embedded developers can already use productively with the RISC-V Vector Extension (RVV 1.0) today – and which technologies they should prepare for in the next generation of systems.

Alexander Neumann: Which parts of the RISC-V Vector Extension are truly production-ready today – and where do developers still encounter limitations in toolchains or debugging environments?

Thang Minh Tran: The core Vector Extension 1.0 is stable and shipping in real silicon, so that part is ready. But one important reality is that many vendors do not stop at standard RVV. They add custom extensions on top of it to accelerate specific workloads like AI, crypto, or signal processing. This is both a strength and a challenge. It is a strength because companies can optimize deeply for their own applications, but it is a challenge because toolchains and debugging tools must support not only standard RVV but also these custom instructions. So production readiness depends not only on RVV itself, but also on how well the ecosystem supports vendor-specific additions.

Neumann: How should developers decide whether migrating to RVV makes sense compared to optimizing existing NEON/DSP codebases?

Thang Minh Tran: For long-term strategy: unlike NEON, RVV is vector-length agnostic (VLA), meaning that moving to the next level of performance requires no change in software. For raw performance, RVV supports chaining, a concept that does not exist in NEON/DSP, so RVV can improve performance with fewer instructions. When thinking about migration, custom extensions must be part of the equation. RVV gives portability and scalability, but custom extensions give differentiation. If your product needs unique acceleration – maybe low-latency AI, special filtering, or deterministic execution – then custom extensions may give performance that fixed NEON/DSP cannot match. But you must also consider maintenance cost. Custom instructions can create vendor lock-in if not standardized.

So the question becomes: do you want portability first, or do you want architectural control and optimization? For long-term strategy, RVV plus carefully designed custom extensions can provide both, but it requires more architectural discipline. Note that with either NEON or RVV, saving and restoring wide vector registers on an interrupt is especially difficult.
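The vector-length-agnostic point can be sketched in portable C. In the sketch below, `vsetvl_sketch` and the `vlmax` parameter are illustrative stand-ins for RVV's `vsetvl` instruction and the hardware's native vector length, not a real API:

```c
#include <stddef.h>

/* Illustrative stand-in for RVV's vsetvl instruction: the hardware reports
 * how many elements it will process this pass, up to its native VLMAX. */
static size_t vsetvl_sketch(size_t remaining, size_t vlmax) {
    return remaining < vlmax ? remaining : vlmax;
}

/* Vector-length-agnostic SAXPY: the same loop runs unchanged on a core with
 * 128-bit or 1024-bit vectors -- only the vlmax the hardware reports differs. */
void saxpy_vla(size_t n, float a, const float *x, float *y, size_t vlmax) {
    size_t i = 0;
    while (i < n) {
        size_t vl = vsetvl_sketch(n - i, vlmax);
        for (size_t j = 0; j < vl; ++j)   /* models one vector instruction */
            y[i + j] += a * x[i + j];
        i += vl;
    }
}
```

With n = 5 and a simulated vlmax of 4, the loop splits the work into a 4-element pass plus a 1-element tail on its own; the software never hard-codes a vector width, which is the portability that NEON's fixed 128-bit registers cannot offer.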

Neumann: How far along is the “scalar-vector-matrix” paradigm? Are clear industry standards already emerging, or are we still at the beginning of consolidation?

Thang Minh Tran: Scalar and vector are standardized and stable. Matrix is still forming. Depending on the matrix size, RVV can already perform well on small and medium matrix multiplications. But in practice, many companies are not waiting for full matrix standardization. They are already building custom matrix-like extensions to solve immediate AI needs. This means the market is partially standardized and partially experimental at the same time. We are in a hybrid phase. Standards are coming, but innovation is happening in parallel through custom extensions. Over time, some of these ideas will migrate into formal standard extensions.
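The small-matrix case maps naturally onto RVV's vector loops. A minimal portable-C sketch (the function name is illustrative) shows the layout a vector unit exploits:

```c
#include <stddef.h>

/* C[MxN] += A[MxK] * B[KxN], row-major. The inner j-loop is the natural
 * vector axis: each k-iteration broadcasts one element of A and does a
 * multiply-accumulate across a row strip of B -- on RVV hardware, each
 * such strip becomes one fused multiply-add (vfmacc-style) operation. */
void matmul_sketch(size_t M, size_t K, size_t N,
                   const float *A, const float *B, float *C) {
    for (size_t i = 0; i < M; ++i)
        for (size_t k = 0; k < K; ++k) {
            float a = A[i * K + k];
            for (size_t j = 0; j < N; ++j)   /* vectorizable strip */
                C[i * N + j] += a * B[k * N + j];
        }
}
```

For small and medium N this keeps the vector unit busy with plain RVV; it is exactly this kernel shape that dedicated matrix extensions aim to accelerate further for large, dense workloads.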

Neumann: What does this new architectural approach mean for compilers and scheduling strategies – does it disrupt old optimization models?

Thang Minh Tran: Standard RVV already requires dynamic handling of the vector length (VLA). Custom extensions increase complexity for compilers. When you add custom instructions, compilers must understand new opcodes, new data paths, and new scheduling rules. This can disrupt old optimization models because now the compiler must balance scalar code, vector code, and possibly matrix or custom functional units. In many cases, intrinsics or even hand-tuned kernels become necessary. So compilers become more modular and more extensible. The architectural flexibility is powerful, but the software stack must keep up.

Neumann: What pitfalls do you see when porting existing C/C++ workloads to RVV – especially with mixed scalar/vector kernels?

Thang Minh Tran: Understanding your code, RVV, and the microarchitecture is what gives the performance boost when porting code to RVV. RVV compilers are not yet at the mature stage, so hand-tuned kernels are still necessary. Another pitfall is over-optimizing for hardware before workload patterns are fully understood. Developers must design abstraction layers carefully so that standard RVV paths exist and custom extensions are optional acceleration paths. Otherwise, portability and maintainability suffer.
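One way to structure such an abstraction layer is a kernel-dispatch table in C. In this sketch, `dot_scalar` and `select_dot` are hypothetical names, and the `__riscv_v_intrinsic` guard (a predefined macro in compilers supporting the RVV intrinsics) marks where a real vector path would plug in:

```c
#include <stddef.h>

/* Portable scalar baseline -- always available, always correct. */
static void dot_scalar(size_t n, const int *a, const int *b, long *out) {
    long acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc += (long)a[i] * b[i];
    *out = acc;
}

typedef void (*dot_fn)(size_t, const int *, const int *, long *);

/* Kernel selection: vector or custom-extension paths are optional
 * acceleration layers; the scalar path is the guaranteed fallback. */
dot_fn select_dot(void) {
#if defined(__riscv_v_intrinsic)
    /* A real build would return an RVV-intrinsics implementation here. */
#endif
    return dot_scalar;
}
```

Because every kernel has a scalar fallback, the same codebase stays portable across cores with different vector or custom-extension support, which is exactly the maintainability concern raised above.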

Neumann: When do you expect widespread availability of SoCs that fully implement RVV 1.0, rather than just subsets?

Thang Minh Tran: RVV 1.0 will spread steadily, but one must consider the application type. For low-end, power-conscious applications, there is no need for full RVV 1.0. We talked about adding custom instructions to improve performance, but removing unnecessary instructions to save power is also important. Does your processor IP come with a toolchain that supports both adding and removing instructions? High-end and edge AI applications will most likely require custom extensions. So instead of expecting pure, identical RVV everywhere, we should expect a base standard plus differentiated layers on top. Widespread adoption does not mean identical implementation; it means a predictable baseline plus innovation space.

Neumann: What role will RVV and the upcoming Matrix Extension play in future edge AI chips – could RISC-V become the de facto standard here?

Thang Minh Tran: In edge AI, workloads are diverse and power budgets are tight. RVV gives flexible data-parallel acceleration. Matrix extensions improve dense neural network math, and a matrix extension that is part of the instruction set is much more powerful than the same functionality implemented as a separate hardware accelerator. Custom extensions allow targeting specific bottlenecks, for example low-latency memory handling, sparse operations, or deterministic scheduling. Together, this creates an architecture that can replace fixed NPUs in some segments. Eventually, these custom instructions could either become standard extensions or remain per-application custom extensions, and RVV could become the de facto standard. But “de facto standard” would also depend on:

  • compiler maturity + tuned libraries (CMSIS-NN equivalents, GEMM microkernels, quant kernels),
  • framework integration (TVM/XLA/Glow-style stacks), and
  • debug/profiling tools that let teams ship reliably.

For more information on Elektor’s online conference “RISC-V – Open Architecture for Embedded, AI, and Automotive” on 15 April, see the conference website. Register today! (Early-Bird discount is active until 20 March.)


Editor’s note: eeNews Europe is an Elektor International Media publication. 

