{"id":20830,"date":"2026-06-24T22:56:20","date_gmt":"2026-06-24T22:56:20","guid":{"rendered":"https:\/\/www.monmouth.edu\/career-development\/job\/npu-kernel-engineer-2\/"},"modified":"2026-06-25T12:00:43","modified_gmt":"2026-06-25T16:00:43","slug":"npu-kernel-engineer-2","status":"publish","type":"ch_job","link":"https:\/\/www.monmouth.edu\/career-development\/job\/npu-kernel-engineer-2\/","title":{"rendered":"NPU Kernel Engineer"},"content":{"rendered":"<p><strong>Junior NPU Kernel\/Operator Engineer<\/strong><\/p>\n<p><strong>Role<\/strong><\/p>\n<p>We are looking for a Junior NPU Kernel\/Operator Engineer to develop and optimize deep learning operators for a custom AI accelerator \/ NPU. The role focuses on kernel\/operator implementation, performance tuning, and correctness validation across a broad range of neural network workloads.<\/p>\n<p>This is a good fit for candidates with strong C\/C++ and Python skills who are interested in hardware-aware software optimization. Prior NPU experience is helpful but not required.<\/p>\n<p><strong>Responsibilities<\/strong><\/p>\n<ul>\n<li>Implement and optimize NPU operators such as normalization, reduction, transpose, reshape, gather\/scatter, quant\/dequant, and fused elementwise kernels.<\/li>\n<li>Tune kernels for memory bandwidth, SRAM usage, data reuse, DMA latency, bank conflicts, and compute utilization.<\/li>\n<li>Validate operator correctness against PyTorch, NumPy, or framework reference results.<\/li>\n<li>Benchmark performance on simulator or silicon.<\/li>\n<li>Debug correctness, precision, memory layout, and performance issues. 
<\/li>\n<li>Work with compiler, runtime, hardware, and model teams.<\/li>\n<li>Document operator behavior, tensor layout, tiling strategy, and performance results.<\/li>\n<\/ul>\n<p><strong>Requirements<\/strong><\/p>\n<ul>\n<li>BS\/MS in CS, EE, Computer Engineering, or a related field.<\/li>\n<li>Strong C\/C++ and Python programming skills.<\/li>\n<li>Basic understanding of tensor computation and neural network operators.<\/li>\n<li>Familiarity with basic computer architecture concepts such as memory hierarchy, bandwidth, latency, cache\/SRAM, and parallelism.<\/li>\n<li>Good debugging and problem-solving skills.<\/li>\n<\/ul>\n<p><strong>Preferred<\/strong><\/p>\n<ul>\n<li>Experience with any of the following:\n<ul>\n<li>CUDA, Triton, OpenCL, TVM, MLIR, Halide<\/li>\n<li>SIMD, DSP, embedded C\/C++, GPU, NPU, FPGA, or HPC programming<\/li>\n<li>Compiler\/runtime development<\/li>\n<\/ul>\n<\/li>\n<li>Understanding of tiling, vectorization, memory access optimization, or mixed precision.<\/li>\n<li>Experience with FP32, FP16, BF16, INT8, or other numerical 
formats.<\/li>\n<\/ul>\n","protected":false},"featured_media":0,"template":"","meta":{"_ch_employer_id":"14029","_ch_external_id":"11157659","_ch_location_state":"","_ch_location_city":"","_ch_is_ocr_job":"","_ch_expiration_date":"2026-07-25","_ch_apply_link":"https:\/\/monmouth.joinhandshake.com\/jobs\/11157659\/share_preview"},"ch_stakeholder":[],"ch_class_year":[],"ch_job_category":[86],"ch_career_skill":[],"ch_industry":[65],"class_list":["post-20830","ch_job","type-ch_job","status-publish","hentry","ch_job_category-full-time","ch_industry-science-technology-innovation"],"_links":{"self":[{"href":"https:\/\/www.monmouth.edu\/career-development\/wp-json\/wp\/v2\/ch_job\/20830","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.monmouth.edu\/career-development\/wp-json\/wp\/v2\/ch_job"}],"about":[{"href":"https:\/\/www.monmouth.edu\/career-development\/wp-json\/wp\/v2\/types\/ch_job"}],"wp:attachment":[{"href":"https:\/\/www.monmouth.edu\/career-development\/wp-json\/wp\/v2\/media?parent=20830"}],"wp:term":[{"taxonomy":"ch_stakeholder","embeddable":true,"href":"https:\/\/www.monmouth.edu\/career-development\/wp-json\/wp\/v2\/ch_stakeholder?post=20830"},{"taxonomy":"ch_class_year","embeddable":true,"href":"https:\/\/www.monmouth.edu\/career-development\/wp-json\/wp\/v2\/ch_class_year?post=20830"},{"taxonomy":"ch_job_category","embeddable":true,"href":"https:\/\/www.monmouth.edu\/career-development\/wp-json\/wp\/v2\/ch_job_category?post=20830"},{"taxonomy":"ch_career_skill","embeddable":true,"href":"https:\/\/www.monmouth.edu\/career-development\/wp-json\/wp\/v2\/ch_career_skill?post=20830"},{"taxonomy":"ch_industry","embeddable":true,"href":"https:\/\/www.monmouth.edu\/career-development\/wp-json\/wp\/v2\/ch_industry?post=20830"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}