At Automotive World 2026, Imagry CTO Ilan Shaviv described “Generative Autonomy” as intelligence that enables real-world driving through real-time camera vision - without relying on HD maps, LiDAR, or rule-based inference. He makes clear that “camera-only” and “No HD Maps” are not slogans but architectural decisions, and explains the logic behind reducing localization requirements in production from HD-map precision to a navigational level. The interview also outlines practical considerations for developers, including hardware requirements (TOPS), input/output interfaces, integration points within the OEM software stack, and boundaries of responsibility - providing concrete guidance on where the system sits within a vehicle architecture. Finally, through its data collection → training → deployment loop and its experience passing NCAP evaluations, Imagry illustrates how this technical philosophy translates into operational validation.
Ilan Shaviv Prior to joining Imagry, Ilan worked for 28 years at RAFAEL Advanced Defense Systems Ltd. in various positions, the most recent being Chief System Architect for a groundbreaking, innovative multi-disciplinary project. He holds a BSc (1994, Summa Cum Laude), an MSc (2000), and a PhD (2008) in aerospace engineering from the Technion (Israel’s science and technology research university, among the world’s top ten), specializing in guidance and state estimation. Ilan made significant contributions to the field, earning two national Department of Defense awards as well as the Rafael Award of Excellence.
Defining “Generative Autonomy”
In one sentence, how would you define Generative Autonomy? Is it closer to end-to-end learning, imitation learning, or a modular stack? And what exactly is the “generative” element you mean by that term?
Shaviv With “Generative Autonomy” - real-time intelligence that drives machines in the real world - we teach machines to see and react like humans, remove dependencies on HD maps and LiDAR, and build software that runs on today’s vehicles. It is closer to a modular stack. By “generative” we mean that the system improves over time based on experience and supervised learning.
You describe your approach as “Generative Autonomy.” For a developer audience, could you clarify what is actually generative at runtime? For example, are you generating trajectories, behaviors/policies, or something else?
Shaviv We use ‘Generative’ to indicate that we apply AI to real-world applications based on real-time interpretation of visual camera feeds.
Why Camera-Only
Why do you believe camera-only is the most practical approach today? How far can camera-only really go - without maps, LiDAR, cloud dependency, or radar? Under what ODD assumptions does it work well, and what remains difficult?
Shaviv Existing road infrastructure is built for humans who drive with their eyes. In the past, autonomy stacks leaned more on additional sensors partly because computing wasn’t strong enough to extract and act on rich visual information fast enough - but that’s no longer the case. Tesla’s Ashok Elluswamy has also recently made a similar point.
Radars and LiDARs are also expensive, and in practice their useful range for decision-making tops out at around a few hundred meters; even when they detect something far out, the semantic information you get back can be limited. Cameras, on the other hand, can read signs, interpret traffic lights, and understand lane markings and road semantics.
In ODDs where humans can drive comfortably, a vision-only approach can work well. When humans struggle - heavy rain, dense fog, or very dark conditions - vision-only will also degrade, but augmenting it with infrared cameras can give the autonomous system an advantage. And from a cost perspective, camera-only is significantly less expensive.
What “No HD Maps” Actually Means
Does “No HD maps” mean no maps at all, or only no HD maps (but using lightweight maps/rules)? And how do you handle localization in deployment?
Shaviv It means that we do not need (or want) any pre-mapped data in the system. We also do not use any rule-based reasoning. Instead, we use the cameras to capture the surrounding environment in real time, identify and classify the objects within it using our dedicated and distributed neural networks, and plan the vehicle’s motion accordingly. This behavior is just like a human driver who travels to a place they have never visited, rents a car, and drives there without any new training. Of course, adjustments can be made to adapt behavior to local conditions - for example, allowing right turns on red lights in California, or left-side driving in Japan.
Let me clarify: there are maps for navigation (which we do use) and HD maps (which we don’t). Their functions and characteristics are different.
HD maps: They are intended to enable the vehicle to understand the immediate structure of the surrounding environment in a geofenced area. We do this through our perception stack in real time, which provides location independence. HD maps also require continuous pre-mapping at ~5 cm (or better) resolution, because road conditions can change without warning due to construction, weather damage, unusual obstacles, and more. They typically require a data link (cloud connectivity) to keep the information in the vehicle updated, which is expensive and not always guaranteed - and it can introduce additional cyber vulnerability compared to an Imagry “self-contained” system. A perception stack is still required to validate HD-map information, and if there is a conflict, you must decide which input to trust. Comparing the two inputs can also add latency, which is undesirable.
Navigation maps: These provide directions on how to get from point A to point B. Many third-party apps already do this well, so there is no need to reinvent the wheel. We aim to replace the human driver - and humans typically drive with a navigation map (Google Maps, TomTom, Waze, etc.), so we do too.
When you say “no HD maps,” what level of localization do you rely on in deployment (e.g., lane-level, road-level, or relative positioning)? Additionally, which sensors or signals contribute to that localization?
Shaviv No HD maps means no need for precise (5 cm resolution) localization. We need localization at a navigational level, e.g., "turn at the one after the next intersection, or at the first possible opportunity". Basically, we need localization at a level that enables a navigation app (say, Google Maps) to work. The road topology (e.g., the number of lanes) is perceived online by the perception stack. This understanding, along with the general navigational directions, enables the motion planning stack to create the path to command the vehicle. For example, if the navigational directions indicate the vehicle needs to turn right, the motion planning stack starts to move the vehicle to the right lane (if it wasn’t already there) so that when the right turn appears, the vehicle can take it - just like a human working with a navigation app that tells the driver to turn right in about 300 m. In that case, as a human, I would turn right when I can, regardless of whether the turn appeared after 250 m or 350 m.
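To make this division of labor concrete, here is a minimal sketch - hypothetical type and function names, not Imagry’s API - of how a coarse navigation hint and the lane topology perceived online could be combined to pick a target lane well before the turn actually appears:

```python
# Minimal sketch (hypothetical names, not Imagry's API): combining a coarse,
# app-level navigation hint with online-perceived lane topology.
from dataclasses import dataclass
from enum import Enum


class Turn(Enum):
    LEFT = "left"
    RIGHT = "right"
    STRAIGHT = "straight"


@dataclass
class NavHint:
    """Coarse direction from a navigation app, e.g. 'turn right in ~300 m'."""
    turn: Turn
    distance_m: float  # approximate only; no HD-map precision implied


@dataclass
class PerceivedTopology:
    """Road structure inferred online by the perception stack."""
    lane_count: int
    ego_lane_index: int  # 0 = leftmost lane


def target_lane(hint: NavHint, topo: PerceivedTopology) -> int:
    """Choose the lane to be in so the upcoming turn can be taken whenever
    it actually appears (250 m or 350 m makes no difference)."""
    if hint.turn is Turn.RIGHT:
        return topo.lane_count - 1   # move toward the rightmost lane
    if hint.turn is Turn.LEFT:
        return 0                     # move toward the leftmost lane
    return topo.ego_lane_index       # keep the current lane


# Example: "turn right in about 300 m" on a three-lane road, ego in the middle lane
print(target_lane(NavHint(Turn.RIGHT, 300.0),
                  PerceivedTopology(lane_count=3, ego_lane_index=1)))  # -> 2
```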
Where Imagry Sits in the Vehicle Stack
What does “runs on standard hardware” mean in concrete terms? Could you share an example reference configuration (as an illustration)?
Shaviv We are “hardware agnostic,” meaning that the cameras and computing system can be chosen by the OEM / Tier-1 supplier. The computing system does need to supply a minimum number of TOPS (Tera Operations Per Second), which is about 150 for a passenger vehicle and 300 for an M3-category bus. We can recommend NVIDIA’s DRIVE Orin and standard 2.5-megapixel cameras.
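As a rough illustration of those figures - placeholder names, not an Imagry sizing tool - a compute check against the quoted TOPS floors might look like this:

```python
# Illustrative compute-sizing helper (placeholder names, not an Imagry tool),
# reflecting the approximate TOPS floors quoted above.
MIN_TOPS = {
    "passenger_vehicle": 150,  # Tera Operations Per Second
    "m3_bus": 300,
}


def meets_compute_floor(platform_tops: float, vehicle_class: str) -> bool:
    """Return True if a candidate SoC clears the quoted compute floor."""
    return platform_tops >= MIN_TOPS[vehicle_class]


# An Orin-class SoC is commonly quoted at roughly 254 INT8 TOPS:
print(meets_compute_floor(254, "passenger_vehicle"))  # True
print(meets_compute_floor(254, "m3_bus"))             # False
```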
Could you share one concrete example of your vehicle interface, what you take as inputs (e.g., camera streams, vehicle signals) and what you output (e.g., objects/lanes/drivable space, trajectory, or control commands)? If possible, a high-level indication of typical update rate or latency would also be helpful.
Shaviv A concrete example is our vehicle interface. On the input side, we ingest the video stream from eight cameras at 30 Hz, along with IMU inertial readouts, wheel-speed signals, steering angle, and the navigation direction. On the output side, we directly issue throttle, brake, and steering commands, and we also output turn-signal commands.
What happens between those two ends is essentially two layers. First, with 360-degree visual coverage, we build a real-time understanding of the scene - road topology and other road users such as vehicles and pedestrians. Second, based on what we observe in the immediate surroundings, we generate a path that negotiates the environment and nearby agents, while staying aligned with the higher-level navigation intent.
I think laying it out this way helps developers place our solution correctly within their own software stack and system architecture.
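For developers who think in terms of types, a hedged sketch of that boundary could look like the following - the field names and units are illustrative assumptions, not Imagry’s actual interface definition:

```python
# Hedged sketch of the interface boundary described above; the types, field
# names, and units are illustrative assumptions, not Imagry's actual API.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SensorInputs:
    """Per-cycle inputs, nominally arriving at 30 Hz."""
    camera_frames: List[bytes]                  # eight camera images per cycle
    imu_accel_mps2: Tuple[float, float, float]  # inertial accelerations
    imu_gyro_radps: Tuple[float, float, float]  # angular rates
    wheel_speeds_mps: List[float]               # wheel-speed signals
    steering_angle_rad: float                   # current steering angle
    nav_direction: str                          # coarse intent, e.g. "turn_right"


@dataclass
class ControlOutputs:
    """Per-cycle actuation commands issued back to the vehicle."""
    throttle: float       # normalized 0.0 .. 1.0
    brake: float          # normalized 0.0 .. 1.0
    steering_rad: float   # commanded steering angle
    turn_signal: str      # "left" | "right" | "off"
```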
Where does Imagry integrate within the customer vehicle software architecture? Please explain the layer/interface, with simple input/output examples.
Shaviv Currently, we aim to integrate into the OEM’s design. In the SDV (Software Defined Vehicle) era, it’s plausible that the customer could download our software directly.
What is your final output (e.g., objects, lanes, drivable space, trajectory, etc.)? And where is the responsibility boundary in case of issues - what is “Imagry’s responsibility” vs. what remains with the OEM/Tier-1?
Shaviv In practice, our output comes in three layers. First, we produce a full, real-time understanding of the immediate surroundings - both the road topology and the objects around the vehicle. Second, we generate a course of action: a path that allows the vehicle to safely traverse what it’s observing in that moment. Third, we translate that path into control commands that are sent to the vehicle so it can execute the maneuver.
What we don’t do is provide route-level navigation from point A to point B. Just like a human driver, that part comes from a different application layer - Google Maps, Waze, TomTom, and so on. We consume the navigation intent, but we don’t generate the route.
As for responsibility boundaries, that’s ultimately a legal question and it’s still being defined - country by country.
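Expressed as pseudocode-style Python - all names here are placeholders, not Imagry’s implementation - the three output layers form a fixed-rate loop:

```python
# Placeholder sketch of the three layers (perceive -> plan -> control) as a
# fixed-rate loop; none of these names are Imagry's actual implementation.
import time
from typing import Any, Callable


def drive_loop(read_sensors: Callable[[], Any],
               perceive: Callable[[Any], Any],
               plan: Callable[[Any], Any],
               to_controls: Callable[[Any], Any],
               send_controls: Callable[[Any], None],
               hz: float = 30.0) -> None:
    """Run one perceive/plan/control cycle per period, nominally at 30 Hz."""
    period = 1.0 / hz
    while True:
        start = time.monotonic()
        inputs = read_sensors()
        scene = perceive(inputs)            # layer 1: real-time scene understanding
        path = plan(scene)                  # layer 2: a safe path for this moment
        send_controls(to_controls(path))    # layer 3: throttle/brake/steering commands
        # sleep out the remainder of the cycle to hold the target rate
        time.sleep(max(0.0, period - (time.monotonic() - start)))
```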
How does the real-world learning loop work in practice? (collection → selection → labeling → retraining → release gating → fleet monitoring) What happens at each stage, and how often does the cycle run?
Shaviv You have accurately defined the learning loop we use. For every new site introduced to the system, all other sites benefit from the new ODD (Operational Design Domain) driving experience.
After the vehicle exhibits acceptable driving behavior in a new ODD, data continues to be collected over the entire lifetime of the vehicle. Only when new situations arise in which the vehicle needs help do we perform another learning cycle.
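A schematic of that trigger logic, using the stage names from the question - the function bodies are trivial stand-ins, not Imagry’s pipeline - could look like this:

```python
# Schematic of the loop from the question: collection -> selection -> labeling
# -> retraining -> release gating -> fleet monitoring. Every function body is
# a trivial stand-in (not Imagry's pipeline), kept minimal so the flow is clear.
def select(logs): return logs                    # pick the informative drives
def label(logs): return logs                     # automated annotation (stand-in)
def retrain(dataset): return {"trained_on": dataset}  # new model candidate
def passes_gate(model): return True              # release gating / validation (stand-in)
def deploy(model): print("deployed")             # push to the fleet (stand-in)


def learning_cycle(fleet_logs, needs_help):
    """Run another cycle only when the fleet hit situations it needs help with."""
    novel = [log for log in fleet_logs if needs_help(log)]
    if not novel:
        return False                             # acceptable behavior: keep monitoring
    candidate = retrain(label(select(novel)))    # selection -> labeling -> retraining
    if passes_gate(candidate):                   # release gating
        deploy(candidate)                        # back to fleet monitoring
        return True
    return False


# Example: a routine drive triggers nothing; a flagged situation starts a cycle.
flag = lambda s: "help" in s
print(learning_cycle(["routine drive"], needs_help=flag))                # False
print(learning_cycle(["needs help: new roundabout"], needs_help=flag))   # prints "deployed", then True
```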
How are responsibilities divided between the customer and Imagry regarding data ownership, labeling standards, and retraining responsibilities?
Shaviv We do everything in-house, applying our own, patented, sophisticated automation of the annotation tools.
Ilan Shaviv, CTO, and Ruth Bridger, Marketing Director.
From Data Loop to Validation (NCAP)
If “global, out of the box” is true, what work is actually needed when deploying to a new country/city? Is it close to zero, or is there a minimum deployment procedure you still require?
Shaviv Our system is trained from day one to be generalized; that is, no location-specific information is coded into it. That said, the system is limited to the examples it has seen from our extensive database (collected from autonomous driving on public roads in multiple countries since 2019). However, just like a human driver, it can operate in new areas with a heightened level of caution until it becomes familiar and more confident. The amount of time the system needs to reach a fully confident level of autonomous driving depends on these factors:
Is the autonomous driving software installed on the same platform? If not, the camera configuration (height and orientation) may not be the same, which would require some adjustments on the network side.
How different is the second site’s ODD from the current site, or from what the system has learned thus far?
When you mention safety and NCAP, what does that specifically refer to? Which program/organization, which tests/metrics, and what was the scope (vehicle type, speed range, and function coverage)?
Shaviv NCAP stands for the New Car Assessment Program. It is a standardized safety evaluation framework that rates vehicles based on crash performance and their ability to help prevent accidents. While the original NCAP was introduced by the U.S. National Highway Traffic Safety Administration (NHTSA) in 1978, we align with the stricter European program, Euro NCAP (founded in 1997), which is widely regarded as among the most comprehensive.
To date, we believe we are the only company to have passed NCAP-style evaluations using an autonomous bus. (Our bus autonomy stack is based on the same core system used for passenger vehicles, with additional capabilities specific to bus operations - such as handling bus stops - which are not relevant to passenger cars.)
In our case, the evaluations focused on braking performance - specifically, the ability to brake safely and comfortably for passengers - across 90 different scenarios, with vehicle speeds in the 30 - 60 km/h range. These scenarios include obstacles on the road, slow-moving vehicles, pedestrians, and situations such as an occluded child suddenly entering the roadway. For a quick look at several of these scenarios as executed in practice, you can see the highlight video.