Compiler Engineer · Low-level systems · Class of '26

Aditya Trivedi.

I build the layer between language and silicon: compiler IR, parallel runtimes, and GPU offloading. GSoC '25 at Fortran-Lang, core LFortran contributor, author on four research papers (three published, one under review), and an early LLVM upstream contributor. Joining Qualcomm's ARM compiler team in July 2026, and chasing MLIR and ML compilers along the way.

↗ Focus: Compiler IR · Codegen
↗ Upstream: LLVM · LFortran
↗ Papers: 4 · HiPC, EuroPDP, FGCS
↗ GSoC: '25 · Fortran-Lang



Trajectory — where I am, where I'm headed

Now

Open-source compilers

LFortran core contributor · LLVM / ClangIR upstream · GSoC '25 with Fortran-Lang.

Qualcomm · ARM team

Joining as a Compiler Engineer on the ARM team — July 2026. Backend codegen & optimization.

Exploring

MLIR & ML compilers

Where I'm pointing my curiosity — tensor IR, accelerator codegen, the ML-compiler stack.

01 / Featured

SECTOR 01 · FLAGSHIP
2 transmissions

The work that defined the year.

TRANSMISSION

Google Summer of Code 2025 · Fortran-Lang · May → Sept

OpenMP 6.0 Infrastructure & GPU Offloading in LFortran

C++ · Fortran · OpenMP · CUDA · LLVM / ASR

LFortran is a from-scratch Fortran compiler targeting LLVM and C backends. My GSoC project built OpenMP deep enough to eventually enable GPU execution of Fortran — something no open-source Fortran compiler handles cleanly today.

The architectural decision: OpenMP handling was entangled with the DoConcurrentLoop path. I proposed and implemented a dedicated OMPRegion ASR node — giving OpenMP first-class representation in the IR tree. This unlocked structurally clean nested and hierarchical parallelism.
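To illustrate why a dedicated region node helps with nesting, here is a minimal sketch in C++. It is not LFortran's actual ASR definition: the names `Region`, `RegionKind`, `Clause`, `make_region`, and `nesting_depth` are hypothetical and chosen only for this example. The point it shows is that once each OpenMP construct is its own tree node, nested constructs become child nodes and hierarchy can be walked recursively.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical region kinds; the real ASR node may enumerate these differently.
enum class RegionKind { Parallel, Teams, Distribute, Do, Task, Target };

// Hypothetical clause representation: a name plus its argument as text.
struct Clause {
    std::string name;      // e.g. "num_threads"
    std::string argument;  // e.g. "4"
};

// One OpenMP region as a tree node. Nested constructs are child regions,
// so hierarchy lives in the IR shape rather than in loop-specific flags.
struct Region {
    RegionKind kind;
    std::vector<Clause> clauses;
    std::vector<std::unique_ptr<Region>> children;
};

// Build a region with an optional clause list.
std::unique_ptr<Region> make_region(RegionKind kind,
                                    std::vector<Clause> clauses = {}) {
    auto region = std::make_unique<Region>();
    region->kind = kind;
    region->clauses = std::move(clauses);
    return region;
}

// Depth of the deepest nesting chain rooted at `r` (a lone region has depth 1).
int nesting_depth(const Region& r) {
    int deepest = 0;
    for (const auto& child : r.children) {
        int d = nesting_depth(*child);
        if (d > deepest) deepest = d;
    }
    return 1 + deepest;
}
```

With this shape, a `target` region holding a `teams` region holding a `parallel` region is a three-level tree, and any later pass can recurse over `children` without special-casing loops.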

On that foundation, I implemented 13+ constructs and 8+ clauses across the thread, team, and task models, then extended the C backend to emit compilable host-device code for OpenMP Target Offloading on NVIDIA GPUs. A custom GPU emulator under 250 lines handles CI without physical hardware.
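The general idea behind hardware-free CI can be sketched as follows. This is an illustrative C++ example under my own assumptions, not the emulator shipped in LFortran: `ThreadContext`, `emulate_launch`, and `vector_add` are hypothetical names, only 1-D grids are modeled, and threads run one after another, so barriers and shared memory are not emulated.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-ins for CUDA's blockIdx / threadIdx / blockDim (1-D only).
struct ThreadContext {
    std::size_t block_idx;
    std::size_t thread_idx;
    std::size_t block_dim;
    // Same formula a CUDA kernel would use for its global index.
    std::size_t global_id() const { return block_idx * block_dim + thread_idx; }
};

using Kernel = std::function<void(const ThreadContext&)>;

// Run every (block, thread) pair sequentially on the host.
void emulate_launch(std::size_t grid_dim, std::size_t block_dim,
                    const Kernel& kernel) {
    for (std::size_t b = 0; b < grid_dim; ++b) {
        for (std::size_t t = 0; t < block_dim; ++t) {
            kernel(ThreadContext{b, t, block_dim});
        }
    }
}

// Example kernel: element-wise add with the usual bounds guard.
// Assumes b has at least as many elements as a.
std::vector<float> vector_add(const std::vector<float>& a,
                              const std::vector<float>& b) {
    std::vector<float> out(a.size());
    const std::size_t block_dim = 4;
    // Round up so a partial final block still covers the tail.
    const std::size_t grid_dim = (a.size() + block_dim - 1) / block_dim;
    emulate_launch(grid_dim, block_dim, [&](const ThreadContext& ctx) {
        const std::size_t i = ctx.global_id();
        if (i < out.size()) out[i] = a[i] + b[i];
    });
    return out;
}
```

Because the kernel body only sees index values, the same generated code can be checked for correct indexing and bounds handling on a CI machine with no GPU.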

13+

OpenMP constructs end-to-end

8+

Clauses: thread, team, task

<250

Lines: custom GPU emulator

12 wk

Documented compiler work

↗ Blog · 12 weeks ↗ GPU Thread ↗ Commits

TRANSMISSION

Open Source · LFortran · Sept 2024 → Sept 2025

LFortran Compiler — Core Contributor

C++ · Fortran · MPI · ISO_C_BINDING

Contributed to compiling POT3D, Predictive Science's MPI + OpenMP solar magnetic field solver used in real space-weather research. It was the 9th production-grade third-party code that LFortran compiled.

Built a pure Fortran MPI wrapper library using ISO_C_BINDING with 30+ subroutine implementations — eliminating C-wrapper overhead. Now lives in the fortran_mpi repo under the lfortran org. Separately: 50+ compiler issues resolved across OpenMP, OOP, structs, and strings.

50+

Compiler issues resolved

30+

MPI subroutines, pure Fortran

0.95×

Compile time vs GFortran

0.75×

Runtime vs GFortran

↗ POT3D Blog ↗ fortran_mpi

Also upstream

Beyond the two flagship efforts, smaller upstream contributions: a single merged ClangIR x86 rdtsc / rdtscp builtins PR (#180714) in the LLVM monorepo, plus ongoing Flang OpenMP work.

02 / Experience

SECTOR 02 · TRAJECTORY
2022 → 2026

Where I've shipped.

Jul 2026 · incoming

Qualcomm India — Compiler Engineer

ARM Compiler Team

C++ · LLVM · ARM / AArch64 codegen · backend optimization

Joining the ARM compiler team — backend codegen, optimization passes, and toolchain work across Qualcomm's ARM platforms.
Bringing low-level LLVM & IR experience into production compiler engineering at scale.

2026

LLVM / ClangIR — Upstream Contributor

x86 builtins · IR lowering

C++ · ClangIR · MLIR · LLVM

Merged PR #180714 into the LLVM monorepo — rdtsc / rdtscp x86 timestamp-counter builtins in ClangIR.
Active Flang OpenMP contributions upstream.

May → Sept 2025

Google Summer of Code 2025 · Fortran-Lang

Compiler Developer

C++ · Fortran · OpenMP · CUDA · ASR/IR

Designed the OMPRegion ASR node — IR abstraction giving OpenMP first-class representation, enabling scalable OpenMP 6.0 work and GPU codegen.
13+ constructs, 8+ clauses — thread, team, and task models.
Extended C-backend for OpenMP Target Offloading on NVIDIA GPUs. [Discussion]
Custom GPU emulator (<250 lines) for hardware-free CI.

Sept 2024 → Sept 2025

LFortran — Compiler Development Engineer

Core Contributor

C++ · Fortran · MPI · Python

Helped compile POT3D solar-physics codebase — 0.95× compile, 0.75× runtime vs GFortran. [Blog]
Built pure Fortran MPI wrappers via ISO_C_BINDING, 30+ subroutines. [fortran_mpi]
50+ issues resolved: OpenMP, OOP, structs, string handling.

2022 → 2026

Indian Institute of Technology, Jodhpur

B.Tech, Computer Science · CGPA 8+

Graduation · May 2026

HPC research with Prof. Dip Sankar Banerjee on dynamic graph algorithms.
Four papers, three years, one problem class.

03 / Research

SECTOR 03 · RESEARCH
4 papers

Four papers, one problem.

Maintaining a Maximal Independent Set on dynamic graphs as edges are inserted and deleted — at billion-edge scale — with parallel GPU and multi-core compute.

Best measured speedups over the sequential baseline: 15.64× on insertions and 10.57× on deletions.
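For readers new to the problem, here is a small sequential sketch of what "maintaining" an MIS means. It is a textbook-style local repair written for illustration, not the parallel algorithm from the papers; `DynamicMIS` and its member names are hypothetical. After an edge update, only the endpoints and their neighbours need to be re-examined instead of recomputing the set from scratch.

```cpp
#include <set>
#include <vector>

// Sequential sketch of MIS maintenance under edge updates (illustrative only).
struct DynamicMIS {
    std::vector<std::set<int>> adj;  // adjacency sets
    std::vector<bool> in_mis;        // membership flags

    // With no edges, every vertex belongs to the unique MIS.
    explicit DynamicMIS(int n) : adj(n), in_mis(n, true) {}

    bool has_mis_neighbor(int v) const {
        for (int w : adj[v]) {
            if (in_mis[w]) return true;
        }
        return false;
    }

    void insert_edge(int u, int v) {
        adj[u].insert(v);
        adj[v].insert(u);
        if (!(in_mis[u] && in_mis[v])) return;  // independence still holds
        in_mis[v] = false;                      // evict one endpoint
        // Neighbours of v may have lost their only MIS neighbour; re-admit them.
        for (int w : adj[v]) {
            if (!in_mis[w] && !has_mis_neighbor(w)) in_mis[w] = true;
        }
    }

    void delete_edge(int u, int v) {
        adj[u].erase(v);
        adj[v].erase(u);
        // If one endpoint was the other's only MIS neighbour, the other joins.
        if (in_mis[u] && !has_mis_neighbor(v)) {
            in_mis[v] = true;
        } else if (in_mis[v] && !has_mis_neighbor(u)) {
            in_mis[u] = true;
        }
    }

    // Valid MIS: a vertex is in the set exactly when no neighbour is.
    bool is_valid() const {
        for (int v = 0; v < static_cast<int>(adj.size()); ++v) {
            if (in_mis[v] == has_mis_neighbor(v)) return false;
        }
        return true;
    }
};
```

Each update touches only the neighbourhood of the changed edge, which is the property that makes the problem attractive for parallel and GPU execution at large scale.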

2024 · Origin

IEEE HiPC · SRS

Fast MIS on Incremental Graphs

Aditya Trivedi, P. Nijhara, D.S. Banerjee

2025 · Extends

EuroMicro PDP

Fast Maximal Independent Sets on Dynamic Graphs

P. Nijhara, Aditya Trivedi, D.S. Banerjee

2025 · Refines

IEEE HiPC · SRS

Fast and Accurate MIS on Dynamic Graphs

A.H. Singh, Aditya Trivedi, N. Sharma, A. Pandey, D.S. Banerjee

2025 · Unifies

Elsevier FGCS · Under Review

ParMIS: Fast & Unified MIS Maintenance for Large-Scale Dynamic Graphs

P. Nijhara, Aditya Trivedi, A.H. Singh, N. Sharma, A. Pandey, D.S. Banerjee

04 / Projects

SECTOR 04 · CATALOG
6 objects

Systems work, bottom-up.

PRJ·001 · ARCHITECTURE
MIPS Pipeline Simulator
C++ · Computer Architecture
Cycle-accurate 5-stage MIPS pipeline with hazard detection, forwarding, and branch prediction, modeling stalls and data dependencies at the microarchitecture level.
↗ Source

PRJ·002 · CONCURRENCY
Multi-Threaded Web Crawler
Go · Concurrency · Channels
Concurrent crawler with a worker pool, bounded queues, and graceful backpressure, tuned for throughput without overwhelming target hosts.
↗ Source

PRJ·003 · HPC · LLM
Parallel LLM Inference on RISC-V
C · RISC-V · OpenMP · SIMD
Optimized transformer inference on a RISC-V target, with vectorized kernels and threading delivering a 3.42× speedup over the baseline.
↗ Source

PRJ·004 · ML · PRIVACY
Federated Fraud Detection
Python · Federated Learning
Privacy-preserving fraud model trained across distributed nodes without centralizing sensitive transaction data, with federated aggregation end-to-end.
↗ Source

PRJ·005 · FULL-STACK
WorkHubPro
React · Node · PostgreSQL
Full-stack workspace and project-management platform with auth, role-based access, and real-time collaboration features.
↗ Source

PRJ·006 · OPEN SOURCE
fortran_mpi
Fortran · MPI · ISO_C_BINDING
Pure-Fortran MPI wrapper library, 30+ subroutines, eliminating C-wrapper overhead. Maintained under the LFortran organization.
↗ Source

05 / Stack

SECTOR 05 · INSTRUMENTS
5 bands

The tools I reach for.

01 · Languages

C · C++ · Fortran · Rust · Go · Kotlin · Python

02 · Parallel & HPC

OpenMP · CUDA · MPI · SIMD · Pthreads

03 · Compilers & IR

LLVM · MLIR · ClangIR · ASR / IR design · Codegen

04 · Architectures

AArch64 / ARM · ARMv9 · RISC-V · x86

05 · Exploring

MLIR · ML Compilers · Tensor IR · Accelerators

06 / Contact

Let's talk compilers.

Not job hunting — joining Qualcomm's ARM compiler team in July 2026. But always open to community conversations and open-source collaboration. If you work on MLIR, ML compilers, LLVM, or parallel runtimes — or just want to talk IR design — reach out.

Channel open · transmitting

01 · MAIL · Email
02 · CODE · GitHub
03 · NET · LinkedIn
04 · DOC · Résumé