Compiler Engineer · Low-level systems · Class of '26
I build the layer between language and silicon: compiler IR, parallel runtimes, and GPU offloading. GSoC '25 at Fortran-Lang, core LFortran contributor, four published papers, and early upstream contributions to LLVM. Joining Qualcomm's ARM compiler team in July 2026 and chasing MLIR & ML compilers along the way.
Trajectory — where I am, where I'm headed
01 / Featured
Google Summer of Code 2025 · Fortran-Lang · May → Sept
C++ · Fortran · OpenMP · CUDA · LLVM / ASR
LFortran is a from-scratch Fortran compiler targeting LLVM and C backends. My GSoC project built OpenMP support deep enough to eventually enable GPU execution of Fortran, something no open-source Fortran compiler handles cleanly today.
The key architectural decision: OpenMP handling was entangled with the DoConcurrentLoop path, so I proposed and implemented a dedicated OMPRegion node in ASR (LFortran's Abstract Semantic Representation). This gave OpenMP a first-class representation in the IR tree and made nested and hierarchical parallelism structurally clean.
On that foundation I implemented 13+ constructs and 8+ clauses across the thread, team, and task models. I then extended the C backend to emit compilable host-device code for OpenMP target offloading on NVIDIA GPUs. A custom GPU emulator of under 250 lines lets CI exercise this path without physical hardware.
Open Source · LFortran · Sept 2024 → Sept 2025
C++ · Fortran · MPI · ISO_C_BINDING
Helped LFortran compile POT3D, Predictive Science's MPI + OpenMP solar magnetic field solver used in real space-weather research. It was the 9th production-grade third-party code LFortran had ever compiled.
Built a pure-Fortran MPI wrapper library on ISO_C_BINDING with 30+ subroutine implementations, eliminating the overhead of a separate C wrapper layer. It now lives in the fortran_mpi repo under the lfortran org. Separately, I resolved 50+ compiler issues across OpenMP, OOP, structs, and strings.
Beyond these two flagship efforts, I have made smaller upstream contributions: one merged PR in the LLVM monorepo for the x86 rdtsc / rdtscp builtins in ClangIR (#180714), plus ongoing Flang OpenMP work.
02 / Experience
03 / Research
ParMIS: Fast & Unified MIS Maintenance for Large-Scale Dynamic Graphs
Maintaining a maximal independent set (MIS) on dynamic graphs as edges are inserted and deleted, at billion-edge scale, using parallel GPU and multi-core computation.
Best measured speedups over the sequential baseline: 15.64× on insertions and 10.57× on deletions.
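To show what "maintaining" an MIS means, here is a minimal sequential sketch. It is an illustration under stated assumptions and does not describe the ParMIS algorithm or its baseline. The class name `DynamicMIS` is hypothetical, and so is its tie-break rule, which evicts the higher-id endpoint when an inserted edge joins two MIS members. Each update repairs only the vertices the changed edge can affect, so the set stays both independent and maximal.

```python
from collections import defaultdict


class DynamicMIS:
    """Hypothetical sequential sketch (not ParMIS): keeps a maximal
    independent set valid under single edge insertions and deletions.
    Assumes a simple undirected graph on vertices 0..n-1."""

    def __init__(self, num_vertices):
        self.n = num_vertices
        self.adj = defaultdict(set)
        # With no edges every vertex is independent, so all start in the MIS.
        self.in_mis = [True] * num_vertices

    def _can_join(self, v):
        # v may enter the MIS only if no neighbour is already a member.
        return not any(self.in_mis[u] for u in self.adj[v])

    def insert_edge(self, u, v):
        if u == v:
            raise ValueError("self-loops are not supported in this sketch")
        self.adj[u].add(v)
        self.adj[v].add(u)
        if self.in_mis[u] and self.in_mis[v]:
            # Independence violated: evict one endpoint (arbitrary tie-break).
            loser = max(u, v)
            self.in_mis[loser] = False
            # Neighbours of the evicted vertex may now lack an MIS neighbour;
            # admit them greedily in id order to restore maximality.
            for w in sorted(self.adj[loser]):
                if not self.in_mis[w] and self._can_join(w):
                    self.in_mis[w] = True

    def delete_edge(self, u, v):
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        # An endpoint that just lost its only MIS neighbour may now join.
        for w in (u, v):
            if not self.in_mis[w] and self._can_join(w):
                self.in_mis[w] = True

    def members(self):
        return {v for v in range(self.n) if self.in_mis[v]}
```

For example, inserting edge (0, 1) into an empty 4-vertex graph evicts vertex 1 and leaves {0, 2, 3}. Sequentially, each update is a local repair around one edge. A parallel version must resolve many such conflicts at once, which is where GPU and multi-core execution come in.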
04 / Projects
05 / Stack
06 / Contact
Not job hunting: I'm joining Qualcomm's ARM compiler team in July 2026. But I'm always open to community conversations and open-source collaboration. If you work on MLIR, ML compilers, LLVM, or parallel runtimes, or just want to talk IR design, reach out.