{"id":20831,"date":"2026-06-24T22:49:32","date_gmt":"2026-06-24T22:49:32","guid":{"rendered":"https:\/\/www.monmouth.edu\/career-development\/job\/autonomous-driving-multimodal-model-algorithm-engineer\/"},"modified":"2026-06-25T12:00:45","modified_gmt":"2026-06-25T16:00:45","slug":"autonomous-driving-multimodal-model-algorithm-engineer","status":"publish","type":"ch_job","link":"https:\/\/www.monmouth.edu\/career-development\/job\/autonomous-driving-multimodal-model-algorithm-engineer\/","title":{"rendered":"Autonomous Driving Multimodal Model Algorithm Engineer"},"content":{"rendered":"<p>Black Sesame Technologies is building high-performance AI algorithms and self-developed chips for intelligent driving and beyond. As an <strong>Autonomous Driving Multimodal Model Algorithm Engineer<\/strong>, you will work on next-generation multimodal AI models for autonomous driving, including Vision-Language Models, Vision-Language-Action Models, and World Models.<\/p>\n<p>You will collaborate with perception, prediction, planning, data, simulation, and deployment teams to integrate multimodal models with existing BEV perception, two-stage E2E, and one-stage E2E autonomous driving systems.<\/p>\n<p>We are looking for candidates with hands-on experience in <strong>one or more<\/strong> of the following areas: Vision-Language Models, Vision-Language-Action Models, World Models.<\/p>\n<p><strong>Responsibilities<\/strong><\/p>\n<p><strong>Multimodal Model Development for Autonomous Driving<\/strong><\/p>\n<ul>\n<li>Work on one or more multimodal modeling directions for autonomous driving, including VLM-based scene understanding, VLA-style planning-oriented modeling, and World Model-based future prediction.\u00a0<\/li>\n<li>Develop and optimize models that reason over multi-camera images, BEV features, map elements, object\/lane instances, occupancy, trajectories, ego-motion, and driving context.\u00a0<\/li>\n<li>Explore model architectures that connect perception, prediction, 
planning, and decision-making in two-stage and one-stage E2E autonomous driving systems. \u00a0<\/li>\n<li>Collaborate with BEV perception and planning teams to improve representation quality, temporal consistency, long-tail robustness, and planning relevance.\u00a0<\/li>\n<\/ul>\n<p><strong>Vision-Language and Vision-Language-Action Modeling<\/strong><\/p>\n<ul>\n<li>Develop VLM-based methods for driving scene understanding, open-vocabulary perception, risk reasoning, corner-case analysis, and interpretable autonomy.\u00a0<\/li>\n<li>Adapt and extend open-source multimodal architectures such as LLaVA, Qwen-VL, InternVL, MiniCPM-V, OpenVLA, or similar models for autonomous driving scenarios.\u00a0<\/li>\n<li>Research VLA-style models that map multimodal driving context, navigation intent, and high-level instructions to trajectories, actions, or planning representations.\u00a0<\/li>\n<li>Align visual, BEV, map, object, lane, occupancy, trajectory, and language representations for driving-specific tasks.\u00a0<\/li>\n<li>Build supervised fine-tuning, instruction-tuning, and efficient adaptation pipelines for driving-relevant multimodal tasks.\u00a0<\/li>\n<\/ul>\n<p><strong>World Model and Future Prediction<\/strong><\/p>\n<ul>\n<li>Build world-model-based approaches for future BEV, occupancy, object motion, lane evolution, traffic interaction, and ego-conditioned scene rollout.\u00a0<\/li>\n<li>Explore generative and predictive modeling methods such as diffusion models, autoregressive transformers, latent dynamics models, video prediction, and BEV prediction.\u00a0<\/li>\n<li>Use learned world models for scenario generation, counterfactual reasoning, long-tail case mining, planning evaluation, and closed-loop analysis.\u00a0<\/li>\n<li>Work with simulation and data teams to improve safety-critical scenario discovery and model-based evaluation.\u00a0<\/li>\n<\/ul>\n<p><strong>Efficient Adaptation and Deployment<\/strong><\/p>\n<ul>\n<li>Apply efficient fine-tuning 
and adaptation methods such as LoRA, QLoRA, Adapter, Prompt Tuning, Prefix Tuning, or other PEFT techniques.\u00a0<\/li>\n<li>Develop multimodal feature alignment modules, including projection heads, query adapters, cross-attention modules, tokenization strategies, and representation converters.\u00a0<\/li>\n<li>Optimize model architecture, latency, memory footprint, and compute cost for automotive deployment.\u00a0<\/li>\n<li>Apply distillation, quantization, pruning, sparse computation, and efficient attention methods where appropriate.\u00a0<\/li>\n<li>Collaborate with chip, compiler, runtime, and deployment teams to adapt multimodal models to in-house automotive AI hardware.\u00a0<\/li>\n<\/ul>\n<p><strong>Research, Evaluation, and Iteration<\/strong><\/p>\n<ul>\n<li>Track the latest research in VLM, VLA, World Models, BEV perception, E2E driving, robotics foundation models, generative simulation, and multimodal learning.\u00a0<\/li>\n<li>Design evaluation metrics for reasoning quality, grounding accuracy, temporal consistency, prediction quality, planning relevance, and safety-critical scenarios.\u00a0<\/li>\n<li>Perform systematic failure analysis and drive data\/model iteration based on real-world autonomous driving cases.\u00a0<\/li>\n<li>Contribute to patents, technical reports, internal research platforms, and conference or journal publications.\u00a0<\/li>\n<\/ul>\n<p><strong>Qualifications<\/strong><\/p>\n<ul>\n<li>MS or PhD in Computer Science, Electrical Engineering, Robotics, Artificial Intelligence, or a related field.\u00a0<\/li>\n<li>Strong background in deep learning, computer vision, multimodal learning, robotics, or autonomous driving.\u00a0<\/li>\n<li>Hands-on experience in <strong>one or more<\/strong> of the following areas:\u00a0\n<ul>\n<li>Vision-Language Models, multimodal large models, or open-source VLM adaptation\u00a0<\/li>\n<li>Vision-Language-Action models, robotics foundation models, or action-conditioned 
modeling\u00a0<\/li>\n<li>World models, generative prediction, latent dynamics modeling, or future scene simulation\u00a0<\/li>\n<li>BEV perception, multi-view 3D perception, or end-to-end autonomous driving\u00a0<\/li>\n<li>Motion prediction, planning, trajectory generation, or closed-loop evaluation\u00a0<\/li>\n<\/ul>\n<\/li>\n<li>Practical experience with open-source multimodal architectures such as LLaVA, Qwen-VL, InternVL, MiniCPM-V, OpenVLA, BLIP-style models, Flamingo-style models, or similar systems.\u00a0<\/li>\n<li>Solid understanding of multimodal feature alignment, including vision-language alignment, cross-modal attention, visual tokenization, projection layers, query-based fusion, or embedding-space alignment.\u00a0<\/li>\n<li>Experience with efficient fine-tuning or adaptation methods, such as LoRA, QLoRA, Adapter, Prompt Tuning, Prefix Tuning, supervised fine-tuning, or instruction tuning.\u00a0<\/li>\n<li>Proficient in PyTorch and capable of modifying, training, debugging, and evaluating deep learning models.\u00a0<\/li>\n<li>Familiar with transformer architectures, attention mechanisms, temporal modeling, and large-scale training.\u00a0<\/li>\n<li>Experience with multimodal data, such as camera, radar, LiDAR, IMU, map, trajectory, language, or structured driving data.\u00a0<\/li>\n<li>Strong engineering ability in Python; C++\/CUDA\/TensorRT experience is a plus.\u00a0<\/li>\n<li>Comfortable with Git, Docker, Linux, distributed training, and collaborative development workflows.\u00a0<\/li>\n<li>Strong communication skills and ability to work across perception, planning, data, simulation, and deployment teams.\u00a0<\/li>\n<\/ul>\n<p><strong>Preferred Qualifications<\/strong><\/p>\n<ul>\n<li>Experience adapting or fine-tuning VLM\/VLA models such as LLaVA, Qwen-VL, InternVL, MiniCPM-V, OpenVLA, or similar architectures.\u00a0<\/li>\n<li>Experience with Hugging Face Transformers, PEFT, DeepSpeed, FSDP, vLLM, SGLang, TensorRT-LLM, or similar 
training\/inference frameworks.\u00a0<\/li>\n<li>Experience building multimodal instruction datasets, driving-scene QA datasets, grounding datasets, scene-reasoning datasets, or planner-oriented supervision signals.\u00a0<\/li>\n<li>Experience aligning multimodal model representations with BEV features, object queries, lane instances, occupancy grids, map vectors, trajectories, or planner inputs.\u00a0<\/li>\n<li>Experience with autonomous driving architectures such as BEVFormer, DETR\/DINO, MapTR\/MapQR, occupancy networks, diffusion planners, trajectory transformers, or similar models.\u00a0<\/li>\n<li>Experience with world models, generative models, video prediction, future BEV prediction, occupancy forecasting, learned simulation, or closed-loop evaluation.\u00a0<\/li>\n<li>Experience with efficient adaptation of large models, including LoRA\/QLoRA, distillation, quantization, pruning, sparse attention, or lightweight adapter design.\u00a0<\/li>\n<li>Experience deploying deep learning models on automotive SoCs, ASICs, GPUs, or edge AI accelerators.\u00a0<\/li>\n<li>Publications or strong project experience in CVPR, ICCV, ECCV, NeurIPS, ICLR, ICML, CoRL, ICRA, IROS, RSS, or related autonomous driving and robotics venues.\u00a0<\/li>\n<li>Strong ability to convert research ideas into robust production systems.<\/li>\n<li>Experience with AI agent tools and basic harness engineering, including building evaluation scripts, task runners, automated workflows, tool-use pipelines, and reproducible testing environments for model or agent 
development.<\/li>\n<\/ul>\n","protected":false,"featured_media":0,"template":"","meta":{"_ch_employer_id":"14029","_ch_external_id":"11157633","_ch_location_state":"","_ch_location_city":"","_ch_is_ocr_job":"","_ch_expiration_date":"2026-07-25","_ch_apply_link":"https:\/\/monmouth.joinhandshake.com\/jobs\/11157633\/share_preview"},"ch_stakeholder":[],"ch_class_year":[],"ch_job_category":[86],"ch_career_skill":[],"ch_industry":[65],"class_list":["post-20831","ch_job","type-ch_job","status-publish","hentry","ch_job_category-full-time","ch_industry-science-technology-innovation"],"_links":{"self":[{"href":"https:\/\/www.monmouth.edu\/career-development\/wp-json\/wp\/v2\/ch_job\/20831","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.monmouth.edu\/career-development\/wp-json\/wp\/v2\/ch_job"}],"about":[{"href":"https:\/\/www.monmouth.edu\/career-development\/wp-json\/wp\/v2\/types\/ch_job"}],"wp:attachment":[{"href":"https:\/\/www.monmouth.edu\/career-development\/wp-json\/wp\/v2\/media?parent=20831"}],"wp:term":[{"taxonomy":"ch_stakeholder","embeddable":true,"href":"https:\/\/www.monmouth.edu\/career-development\/wp-json\/wp\/v2\/ch_stakeholder?post=20831"},{"taxonomy":"ch_class_year","embeddable":true,"href":"https:\/\/www.monmouth.edu\/career-development\/wp-json\/wp\/v2\/ch_class_year?post=20831"},{"taxonomy":"ch_job_category","embeddable":true,"href":"https:\/\/www.monmouth.edu\/career-development\/wp-json\/wp\/v2\/ch_job_category?post=20831"},{"taxonomy":"ch_career_skill","embeddable":true,"href":"https:\/\/www.monmouth.edu\/career-development\/wp-json\/wp\/v2\/ch_career_skill?post=20831"},{"taxonomy":"ch_industry","embeddable":true,"href":"https:\/\/www.monmouth.edu\/career-development\/wp-json\/wp\/v2\/ch_industry?post=20831"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}