The Byzantine Generals Problem and AI Autonomous Cars

The Byzantine Generals Problem, a condition of a computer system in which imperfect information exists about whether a component has failed, can be taught with a puzzle exercise.

By Lance Eliot, the AI Trends Insider

[Ed. Note: For readers interested in Dr. Eliot’s ongoing business analyses about the advent of self-driving cars, see his online Forbes column:]

Let’s examine the topic of things that work only intermittently, which as you’ll soon see is a crucial topic for intelligently designing and building AI systems, especially for self-driving autonomous cars.

First, a story to illuminate the matter.

My flashlight was only working intermittently, so I shook it to get the bulb to shine, hoping to cast some steady light. One moment the flashlight had a nice strong beam and the next moment it was faded and not of much use. At times, the light emanating from the flashlight would go on-and-off or it would dip so close to being off that I would shake it vigorously and generally the light would momentarily revive.

We were hiking in the mountains as part of our Boy Scout troop’s wilderness-survival preparations and I was an adult Scoutmaster helping to make sure that none of the Scouts got hurt during the exercise. At this juncture, it was nearly midnight and the moon was providing just enough natural light that the Scouts could somewhat see the trail we were on. We had been instructed to not use flashlights since the purpose of this effort was to gauge readiness for surviving in the forests without having much in-hand other than the clothes on your back.

There were some parts of the trail that meandered rather close to a sheer cliff and I figured that adding some artificial light to the situation would be beneficial.

Yes, I was violating the instructions about not using a flashlight, but I was also trying to abide by the even more important principle of making sure that none of the Scouts got injured or perished during this exercise.

Turns out that I had taken along an older flashlight that was at the bottom of my backpack and mainly there for emergency situations.

At camp, I had plenty of newer flashlights and had brought tons of batteries as part of my preparation for this trip.

While watching the Scouts as they trudged along the trail, I was mentally trying to figure out what might be wrong with the flashlight that I had been trying to use periodically during this hike.

Could it be that the batteries were running low?

If so, there wasn’t much that I could do about it now that I was out on the trek.

Or, it could be that the bulb and the internal flashlight mechanism were loose and at times disconnected, thus it would shift around as I was hiking on this rather bumpy and rutty trail. If a loose wire or connection was the problem, I could likely fix the flashlight right away, perhaps even doing so as we were in the midst of hiking.

Turns out we soon finished the hike and reached camp, and I opted to quickly replace the flashlight with one of my newer ones that worked like a charm.

Problem solved.

I’m guessing that you’ve probably had a similar circumstance with a flashlight, wherein sometimes it wants to work and sometimes not.

Of course, this kind of intermittent performance is not confined to flashlights.

Various mechanical contraptions can haunt us with intermittent performance, whether it might be a flashlight, a washing machine, a hair dryer, etc.

When I was younger, I had a rather beat-up older car that was in a bit of disrepair, and it seemed to have one aspect or another that would go wrong without any provocation. At some times, the engine would start and run fine, while at other times the car refused to start or, once underway, might suddenly conk out. I had taken the ill-behaving beast to a car mechanic and his advice was simple: get rid of the old car and get a new one. That didn’t help my situation since at the time I could not afford a new car and was trying to do what I could to keep my existing car running as best as practical (with spit and baling wire, as they say).

Let’s shift now to another topic, which you’ll see relates to this notion of things that work intermittently.

Valuable Lesson For Students

When I was a university professor, I used to have my computer science students undertake an in-class exercise that was surprising to them and unexpected for a computer science course.

Indeed, when I first tried the exercise, students would balk and complain that it was taking time away from focusing on learning about software development. I assured them that they would realize the value of the short effort if they just gave it a chance.

I’m happy to say that not only did the students later on indicate it was worth the half hour or so of class time, it became one of the more memorable exercises out of many of the computer science classes that they were required to take.

I would have each student pair up with another student. It could be a fellow student that they already knew, or someone that they had not yet met in the class. The two members of each pair would sit facing each other across a table, and a barrier placed on the table prevented each of them from seeing the surface of the table on the other side.

I would give each person a jigsaw puzzle containing about a dozen pieces.

The pieces of the jigsaw puzzle were rather irregular in shape and size. The pieces also had various colors. Thus, one piece of the jigsaw puzzle might be blue and a shape that was oval, while another piece might be red and was the shape of a rectangle.

One member of each respective pair would get the jigsaw puzzle already assembled and thus “solved” in terms of being put together.

The other person of the pair would get the jigsaw pieces in a bag, all loose and not assembled in any manner. The person with the assembled jigsaw puzzle was supposed to verbally instruct the other member of the pair to take the pieces out of the bag and assemble the puzzle. The rules included that you could not hold up the pieces and show them to the other person, which prevented, for example, the person with the fully assembled jigsaw puzzle from just holding it up for the other person to see.

You were now faced with trying to get the other person to assemble the jigsaw puzzle via verbal instructions only.

This kind of exercise is often done in business schools and it is intended to highlight the nature of communications, social interaction, human behavior, and problem solving. There are plenty of lessons to be had, even though it is a rather quick and easy exercise. For anyone that hasn’t done this before, it usually makes a strong impression on them.

Since the computer science students were rather overconfident and felt that they could readily do such a simple exercise, they launched into the effort right away.

They saw no need to first consider how to solve the problem; instead, they just started talking.

That’s what I anticipated they might do (I had purposely refrained from offering any tips or suggestions on how to proceed), and they fell right into the trap.

A student in a given pair might say to the other one, find the red piece and put it up toward the right as it will be the upper right corner piece for the puzzle. Now, take the blue piece and put it toward the lower left as it will be the lower left corner piece. And so on. This would be similar to solving any kind of jigsaw puzzle, often starting by finding the edges and putting those into place, and then once the edge or outline is completed you might work towards the middle area assembly.

This puzzle-solving approach would normally be just fine, but there was a twist. The assembled puzzle was not necessarily the same as the disassembled one that the other person had. In some cases, the shapes of the pieces were the same and went into the same positions of the assembled puzzle, but the colors of the pieces were different. In other cases, the shapes were different and the colors were the same.

Here’s what would happen when the students launched into the matter.

The person with the assembled puzzle would tell the other one to put the red piece in the upper right, but it turns out that the other person’s red piece didn’t go there and was intended to go, say, in the lower left corner position. Since neither member of the pair could see the other person’s pieces, they had no immediate way of realizing that they were each working with slightly different puzzle pieces.

When the person trying to assemble their puzzle was unable to do so, it would lead to frustration for them and the person trying to instruct them, each becoming quite exasperated. Banter was quite acrid at times. I told you to put the red piece in the upper right corner, didn’t you do as I told you? Yes, I put it there, but things aren’t working out and it doesn’t seem to go there. Well, I’m telling you that the red piece must go there. Etc.

Given that many of the computer science students were perfectionists, it made things even more frustrating for them and they were convinced that the other person was a complete dolt. The person with the disassembled pieces was sure that the other person was an idiot and could not properly explain how the puzzle was assembled. The person with the assembled pieces was sure that the other person trying to assemble the puzzle was refusing to follow instructions and was being obstinate and a jerk.

When I revealed the matter by lifting the barrier, there were some students that said the exercise was unfair. They had assumed that each of them was getting the same puzzle as the person on the other side of the barrier (I never stated this to be the case, though it certainly would seem “logical” to assume it). It was unfair, they loudly crowed, and insisted that the exercise was senseless and quite upsetting.

I asked how many of the students took the time at the start to walk through the nature of the pieces that they had in-hand.

None did so.

They all just jumped right away into trying to “solve the problem” of assembling the pieces. I pointed out that if they had begun by inspecting the pieces and talking with each other about what they had, it would likely have been a faster and more reliable path to “solving the problem” than skipping straight into it.

What was interesting is that some of the students at times were sure during the exercise that the other student was purposely trying to be difficult and possibly even lying. If you told the other student to put the red piece in the upper right corner, you had no way to know for sure that they did so. They might say they did, but you couldn’t see it with your own eyes. As such, when the puzzle pieces weren’t fitting together, the person giving instructions began to suspect that the other person was lying about what they were doing.

There were some students that even thought that perhaps I had arranged with one member of each pair to intentionally lie during the exercise. It was as though, before class started, I had somehow been able to secretly reach half of the class and tell them to make things difficult during the puzzle exercise and lie to the other person. Amazingly, even pairs of friends thought the same thing. Quite a conspiracy theory!

For more about conspiracies, see my article:

Introducing The Byzantine Generals Problem

Tying this tale of the puzzle solving to my earlier story about the intermittent flashlight, the crux is that you might find yourself sometimes immersed into a system that has aspects that are not working as you imagined they would.

Is this because those other elements are purposefully doing so, or is it by happenstance?

In whatever manner it is occurring, what can you do to rectify the situation? Are you even ready in case a system that you are immersed into might suddenly begin to have such difficulties?

Welcome to the Byzantine Generals Problem.

The problem was first introduced in a 1982 paper, aptly entitled “The Byzantine Generals Problem,” that appeared in the Transactions on Programming Languages and Systems published by the esteemed Association for Computing Machinery (ACM). There are numerous variants of the now-classic problem and what to do about it.

It is a commonly described and taught problem in computer science classes and covers an important topic that anyone involved especially in real-time systems development should be aware of.

It has to do with fault-tolerance.

You might have a system that contains elements or sub-components that might at one point or another suffer a fault.

Upon suffering a fault, the element or sub-component might not make life easy by simply failing outright and stopping.

In a sense, if an element stops working completely, you are in an “easier” diagnostic situation in that you can perhaps declare that element or sub-component “dead” and no longer usable, versus the more tortuous route of having an element or sub-component that kind of works but not entirely so.

What can be particularly trying is a situation of an element or sub-component that intermittently works.

In that case, you need to figure out how to handle something that might or might not work when you need it to work. If my flashlight had not worked at all, I would have assumed that the batteries were dead or that the wiring was bad, and I would not have toyed with the flashlight at all, figuring it was beyond hope. But, since the flashlight was nearly working, I was hopeful of trying to deal with the faults and see what could be done.

You could of course declare outright that any element or sub-component that falters is considered “dead” and therefore you will henceforth pretend that it is. In the case of my flashlight, due to the intermittent nature of it, I might have just put it back into my backpack and decided that it was not worth playing around with. Sure, it did still kind of work, but I might have decided to declare it dead and finished.

The downside there is that I’ve given up on something that still has some life to it, and therefore some practical value. Plus, there is an outside chance that it might opt to start working correctly, doing so all of the time and no longer be intermittent. And, there’s a chance that I might be able to play with it and get it to work properly, even if it won’t happen to do so of its own accord.

For the flashlight, I wasn’t sure what was the source of the underlying problem. Was it the batteries? Was it the wiring? Was it the bulb? This can be another difficulty associated with faults in a system. You might not know or readily be able to discern where the fault exists. You might know that overall the system isn’t working as intended, but the specific element or sub-component that is causing the trouble might be hidden or buried within lots of other elements or sub-components.

With my fussy car that I had when I was younger, if the engine wouldn’t start, I had no ready means of knowing where the fault was. I took in the car and the mechanic changed the starter. This seemed to help and the car ran for about a week. It then refused to start again. I took the car to the auto mechanic a second time and this time he changed the spark plugs. This helped for a few days. Unfortunately, infuriatingly, it stopped running again. Inch by inch, I was being tortured by elements of the car that would experience a fault (in this case, fatal faults rather than intermittent ones).

Intermingling Of Faults

One fault can at times intermingle with another fault.

This makes things doubly challenging.

It’s usually easiest (and often naive) to think that you can find “the one” element that is causing the difficulty and then deal with that element only.

In real life, it is often the case that you end up with several elements or sub-components at fault. If the starter for my car is intermittently working, and also if my spark plugs are intermittently working, it can be dizzying and maddening since they might function or not function in a wide variety of combinatorial circumstances. Just when I think the problem is the starter, it works fine, and yet maybe then the car still won’t start. When I then think that the problem is the spark plugs, maybe it starts but then later it doesn’t due to the bad starter.
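To make the combinatorics concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that the starter and the spark plugs misbehave independently of each other and that each one works on a given attempt with a made-up probability (0.8 and 0.7). The function names `outcome_table` and `probability_all_work` are hypothetical, and none of the numbers come from real diagnostic data.

```python
from itertools import product


def outcome_table(reliability):
    """List every works/fails combination with its probability.

    reliability maps a component name to the (assumed) probability that
    the component works on a single attempt.
    """
    names = list(reliability)
    table = []
    for states in product([True, False], repeat=len(names)):
        probability = 1.0
        for name, works in zip(names, states):
            probability *= reliability[name] if works else 1.0 - reliability[name]
        table.append((dict(zip(names, states)), probability))
    return table


def probability_all_work(reliability):
    """Probability that every component works on the same attempt."""
    return sum(p for states, p in outcome_table(reliability) if all(states.values()))


if __name__ == "__main__":
    car = {"starter": 0.8, "spark_plugs": 0.7}  # illustrative values only
    for states, p in outcome_table(car):
        print(states, round(p, 2))
    print("car starts:", round(probability_all_work(car), 2))
```

With these assumed numbers the car starts on only about 56 percent of attempts, and three different combinations of faults all produce the same symptom (the car won’t start), which is why swapping one part at a time can feel like guesswork.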

You can have what are considered “error avalanches” that cascade through a system and are due to one or more elements or sub-components that are suffering faults.

Remember too that a fault does not imply that the element or sub-component won’t work at all.

The faulty element can perform its function in a half-baked way. If the batteries in the flashlight were low on energy, they were perhaps only able to provide enough of a charge to light the flashlight part of the time. They apparently weren’t completely depleted of their charge, since the flashlight was at least still partially able to light up.

A fault can be even more devious, albeit inadvertently so: rather than functioning in a half-baked way, the element might provide false or misleading results, not necessarily because it is purposely trying to lie to you. The students that were telling each other which piece to use for the puzzle were genuinely trying to express what to do. None of them were purposely trying to lie and get the other person confused. They were each being truthful as best they could in the circumstance.

I am not ruling out that an element or sub-component might intentionally lie. I am merely emphasizing that a fault in an element or sub-component can cause it to “lie” without any such intent, though it might indeed be that the element or sub-component is purposely lying.

A student in the puzzle exercise could have intentionally chosen to lie, which maybe the person might do to get the other person upset.

I’ve known some professors that tried this tactic as a variant to the puzzle solving problem, infusing the added complication of truth or lies detection into it. The professor would offer points to the students to purposely distort or “lie” about the puzzle assembly, and the other student needed to try and ferret out the truth versus what was a lie (you’ve perhaps seen something similar to Jimmy Fallon’s popular skits involving lying versus telling the truth with celebrities that he has on his nighttime talk show).

About Byzantine Fault Tolerance

Byzantine Fault Tolerance (BFT) is the notion that you need to design a system to be able to contend with so-called Byzantine faults. These are faults that do not necessarily involve an element or sub-component going entirely dead (known as fail-stop); instead, the fault might allow the element or sub-component to keep functioning but in a half-baked way, or it might do worse and actually “lie” or distort whatever it is supposed to do.

And this can occur to any of the elements or sub-components, at any time, and intermittently. It can occur to only one element or sub-component at a time or might encompass multiple elements or sub-components that are each flickering between functioning properly and not.
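As a rough illustration of these categories, the following Python sketch models a component that can be healthy, fail-stop, half-baked, or “lying,” and that can slip into a faulty mode only some of the time. Everything here is hypothetical: the `FaultMode` names, the size of the noise, and the fixed offset used for a “lie” are assumptions chosen only to make the distinctions visible.

```python
import random
from enum import Enum


class FaultMode(Enum):
    HEALTHY = "healthy"
    FAIL_STOP = "fail-stop"  # goes dead: produces no output at all
    DEGRADED = "degraded"    # half-baked: roughly right but noisy
    LYING = "lying"          # plausible-looking output that is wrong


def read_component(true_value, mode, rng):
    """Return what a component in the given mode would report."""
    if mode is FaultMode.FAIL_STOP:
        return None
    if mode is FaultMode.DEGRADED:
        return true_value + rng.gauss(0.0, 5.0)  # assumed noise level
    if mode is FaultMode.LYING:
        return true_value + 50.0  # assumed, consistently wrong offset
    return true_value


def read_intermittent(true_value, fault_mode, fault_probability, rng):
    """A component that is healthy most of the time and faulty now and then."""
    faulty_now = rng.random() < fault_probability
    mode = fault_mode if faulty_now else FaultMode.HEALTHY
    return read_component(true_value, mode, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    readings = [read_intermittent(10.0, FaultMode.LYING, 0.3, rng) for _ in range(10)]
    print(readings)
```

Notice that only the fail-stop case announces itself (the reading is missing). The degraded and lying cases both hand back a number that looks perfectly ordinary, which is what makes them hard to catch.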

Why is this known as the Byzantine Generals Problem?

In the original 1982 setup of this intriguing “thought experiment” problem, the researchers proposed that you might have military generals in the Byzantine army that are trying to take a city or fort. Suppose that the generals will need to coordinate their attack and will be coming at the city or fort from different angles. The timing of the attack has to be done just right. They need to attack at the same time to effectively win the battle.

We’ll pretend that the generals can only communicate a simplistic message that says either to attack or to retreat. If you were a general, you would wait to see what the other generals have to say. If they are saying to attack, you would presumably attack too. If they say to retreat, you would presumably retreat too. The generals are not able to directly communicate with each other (because they didn’t have cell phones in those days, ha!), and instead they use their respective lieutenants to pass messages among the generals.

You can likely guess that the generals are our elements or sub-components of a system, and we can consider the lieutenants to be elements too, though one way to treat the lieutenants in this allegory is as messengers rather than purely as traditional elements of the system. I don’t want to make this too messy and long here, so I’ll keep things simpler. One aspect though to keep in mind is that a fault might occur not just in the functional items of interest, but it might also occur in the communicating of their efforts. The starter in my car might work perfectly and it is only the wire that connects it to the rest of the engine that has the fault (it’s the messenger that is at fault). That kind of thing.

Suppose that one or more of the generals is a traitor. To undermine the attack, the traitorous general(s) might send an attack message to some of the generals and simultaneously send a retreat message to others. This could then induce some of the generals into attacking and yet they might not be sufficient in numbers to win and take the city or fort. Those generals attacking might get wiped out. The loyal generals would be considered non-faulty, and the traitorous generals would be considered “faults” in terms of how they are functioning.

There are all kinds of proposed solutions to dealing with the Byzantine Generals Problem.

You can mathematically describe the situation and then try to show a mathematical solution, along with providing handy rules-of-thumb about it. For example, depending upon how you describe and restrict the nature of the problem, you could say that in certain situations as long as fewer than a third of the participants are traitors you can provide a method to deal with the traitorous acts (this comes from a mathematical formulation of n > 3t, wherein t is the number of traitors and n is the number of generals). With exactly a third being traitors, such as one traitor among three generals, the method no longer holds.
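For readers who like to see the n > 3t rule in action, here is a small Python sketch of the recursive “oral messages” procedure from the 1982 paper, usually written OM(m). It is a simplified teaching version built on several assumptions of my own: generals are numbered integers, a traitor follows one fixed strategy (telling even-numbered receivers to attack and odd-numbered receivers to retreat), and a tied vote falls back to retreat. The function names are hypothetical.

```python
from collections import Counter

ATTACK, RETREAT = "attack", "retreat"


def max_traitors_tolerated(n):
    """Largest t that still satisfies n > 3t."""
    return (n - 1) // 3


def majority(values):
    """Most common value; a tie falls back to RETREAT as the default order."""
    ranked = Counter(values).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return RETREAT
    return ranked[0][0]


def send(sender, receiver, value, traitors):
    """Loyal senders pass the value along; traitors send conflicting orders."""
    if sender in traitors:
        # One assumed traitor strategy: split the receivers by even/odd id.
        return ATTACK if receiver % 2 == 0 else RETREAT
    return value


def om(m, commander, lieutenants, value, traitors):
    """Return the order each lieutenant settles on after OM(m)."""
    received = {l: send(commander, l, value, traitors) for l in lieutenants}
    if m == 0:
        return received
    # Each lieutenant relays what it was told to the others using OM(m - 1).
    relayed = {}
    for j in lieutenants:
        others = [l for l in lieutenants if l != j]
        relayed[j] = om(m - 1, j, others, received[j], traitors)
    # Each lieutenant votes over its own copy plus everything relayed to it.
    return {
        i: majority([received[i]] + [relayed[j][i] for j in lieutenants if j != i])
        for i in lieutenants
    }


if __name__ == "__main__":
    # Four generals, lieutenant 3 is a traitor: n > 3t holds.
    print(om(1, 0, [1, 2, 3], ATTACK, traitors={3}))
    # Three generals, lieutenant 2 is a traitor: n > 3t fails.
    print(om(1, 0, [1, 2], ATTACK, traitors={2}))
```

In the four-general run, the two loyal lieutenants both settle on the loyal commander’s order to attack. In the three-general run, the lone loyal lieutenant hears “attack” from the commander and “retreat” from the traitor, cannot tell who is lying, and ends up on the default of retreating even though the commander was loyal.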

I use the Byzantine Generals Problem to bring up the broader notion of Byzantine Fault Tolerance, namely that anyone involved in the design and development of a real-time system needs to be planning for the emergence of faults within the real-time system, beyond just assuming they will encounter “dead” or fail-stop faults, and must design and develop the real-time system to cope with faults of an intermittent nature and faults that can at times tell the truth or lie.

Byzantine Generals Problem And AI Autonomous Cars

What does this have to do with AI self-driving autonomous cars?

At the Cybernetic AI Self-Driving Car Institute, we are developing AI software for self-driving cars. The AI for a self-driving car is a real-time system and has hundreds upon hundreds if not thousands of elements or sub-components. Some estimates suggest that the software for a self-driving car might amount to well over 250 million lines of code (though lines of code is a problematic metric).

The auto makers and tech firms crafting such complex real-time systems need to make sure they are properly taking into account the nature of Byzantine Fault Tolerance.

Bluntly, an AI self-driving car is a real-time system that involves life-or-death matters and must be able to contend with faults of a wide variety and that can happen at the worst of times. Keep in mind that an AI self-driving car could ram into a wall or crash into another car, any of which might happen because the AI system itself suffered an internal fault and the fault-tolerance was insufficient to safely keep the self-driving car from getting into a wreck.

For my article about the safety aspects of AI self-driving cars, see:

For another article of mine covering AI safety aspects, see:

I’d like to clarify and introduce the notion that there are varying levels of AI self-driving cars. The topmost level is considered Level 5. A Level 5 self-driving car is one that is being driven by the AI and there is no human driver involved. For the design of Level 5 self-driving cars, the automakers are even removing the gas pedal, the brake pedal, and steering wheel, since those are contraptions used by human drivers. The Level 5 self-driving car is not being driven by a human and nor is there an expectation that a human driver will be present in the self-driving car. It’s all on the shoulders of the AI to drive the car.

For self-driving cars less than a Level 5, there must be a human driver present in the car. The human driver is currently considered the responsible party for the acts of the car. The AI and the human driver are co-sharing the driving task. In spite of this co-sharing, the human is supposed to remain fully immersed into the driving task and be ready at all times to perform the driving task. I’ve repeatedly warned about the dangers of this co-sharing arrangement and predicted it will produce many untoward results.

For my overall framework about AI self-driving cars, see my article:

For the levels of self-driving cars, see my article:

For why AI Level 5 self-driving cars are like a moonshot, see my article:

For the dangers of co-sharing the driving task, see my article:

Let’s focus herein on the true Level 5 self-driving car. Many of the comments apply to the less than Level 5 self-driving cars too, but the fully autonomous AI self-driving car will receive the most attention in this discussion.

Here are the usual steps involved in the AI driving task:

  • Sensor data collection and interpretation
  • Sensor fusion
  • Virtual world model updating
  • AI action planning
  • Car controls command issuance
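The five steps above can be pictured as one pass through a loop. The Python sketch below is a deliberately toy rendering of that loop, not how any production self-driving stack is written: the function names, the idea that each sensor simply reports a list of object labels, and the throttle and brake numbers are all hypothetical. The fusion step in particular is a naive placeholder that merely merges whatever the sensors claim.

```python
from dataclasses import dataclass, field


@dataclass
class WorldModel:
    """Toy virtual world model: object label -> state."""
    objects: dict = field(default_factory=dict)


def collect_and_interpret(raw_readings):
    """Step 1: turn each sensor's raw output into a set of detected labels."""
    return {sensor: set(labels) for sensor, labels in raw_readings.items()}


def fuse(interpretations):
    """Step 2: naive fusion that merges every sensor's detections."""
    fused = set()
    for labels in interpretations.values():
        fused |= labels
    return fused


def update_world_model(model, fused):
    """Step 3: record what is believed to be around the car."""
    model.objects = {label: "present" for label in fused}
    return model


def plan_action(model):
    """Step 4: pick an action based on the world model."""
    return "brake" if "pedestrian" in model.objects else "cruise"


def issue_commands(action):
    """Step 5: translate the action into (made-up) control values."""
    commands = {
        "brake": {"throttle": 0.0, "brake": 1.0},
        "cruise": {"throttle": 0.3, "brake": 0.0},
    }
    return commands[action]


def drive_cycle(raw_readings, model):
    """Run one pass through all five steps."""
    interpretations = collect_and_interpret(raw_readings)
    fused = fuse(interpretations)
    update_world_model(model, fused)
    return issue_commands(plan_action(model))


if __name__ == "__main__":
    print(drive_cycle({"camera": ["pedestrian"], "radar": []}, WorldModel()))
```

The point of laying it out this way is that each step trusts the output of the step before it, so a fault that slips through early on flows all the way down to the car controls.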

Another key aspect of AI self-driving cars is that they will be driving on our roadways in the midst of human driven cars too. There are some pundits of AI self-driving cars that continually refer to a utopian world in which there are only AI self-driving cars on public roads. Currently there are more than 250 million conventional cars in the United States alone, and those cars are not going to magically disappear or become true Level 5 AI self-driving cars overnight.

Indeed, the use of human driven cars will last for many years, likely many decades, and the advent of AI self-driving cars will occur while there are still human driven cars on the roads. This is a crucial point since this means that the AI of self-driving cars needs to be able to contend with not just other AI self-driving cars, but also contend with human driven cars. It is easy to envision a simplistic and rather unrealistic world in which all AI self-driving cars are politely interacting with each other and being civil about roadway interactions. That’s not what is going to be happening for the foreseeable future. AI self-driving cars and human driven cars will need to be able to cope with each other.

For my article about the grand convergence that has led us to this moment in time, see:

See my article about the ethical dilemmas facing AI self-driving cars:

For potential regulations about AI self-driving cars, see my article:

For my predictions about AI self-driving cars for the 2020s, 2030s, and 2040s, see my article:

Dealing With Faults In AI Autonomous Car Systems

Returning to the Byzantine Fault Tolerance matter, let’s consider the various aspects of an AI self-driving car and how it needs to be designed and developed to contend with a myriad of potential faults.

Let’s start with the sensors.

An AI self-driving car has numerous sensors, including cameras, radar, ultrasonic, LIDAR, and other sensory devices. Any of those sensors can experience a fault. The fault might involve the sensor going “dead” and into a fail-stop state. Or, the fault might cause the sensor to report only partial data or only a partial interpretation of the data collected by the sensor. Worse still, the fault could mean that the sensor is “lying” about the data or its interpretation of the data.

When I use the word “lying” it is not intended herein to imply necessarily that someone has been traitorous and gotten the sensor to purposely lie about what data it has or the interpretation of the data. I’m herein instead suggesting that the sensor might provide fabricated data, or provide real data that has been changed to falsely represent the original data, or provide an interpretation of the data that originally would have said one thing but instead says something completely contrary. This could occur by happenstance due to the nature of the fault.

Those could also of course be purposeful and intentional “lies” in that suppose a nefarious person has hacked into the AI self-driving car and forced the sensors to internally tell falsehoods.

Or, maybe the black-hat hacker has planted a computer virus that causes the sensors to tell falsehoods. The virus might not even be forcing the sensors to do so and instead be working as a man-in-the-middle attack that takes whatever the sensors report, blocks the messages, substitutes its own messages of a contrary nature, and sends them along. It could be that the AI self-driving car has been attacked by an outsider, or it could be that an insider who aided the development of the AI self-driving car implanted a virus that would become engaged at some future time.
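One common defense against a message being swapped out in transit is to attach a keyed authentication tag to each sensor report. The Python sketch below uses the standard library’s `hmac` and `hashlib` modules to do so. The message layout, the sequence number, and the function names `sign_reading` and `verify_reading` are assumptions made for illustration, and the sketch sidesteps the hard practical questions of how keys are provisioned and protected inside a car.

```python
import hashlib
import hmac
import json


def sign_reading(key, sensor_id, seq, payload):
    """Serialize a sensor report and attach an HMAC-SHA256 tag."""
    body = json.dumps(
        {"sensor": sensor_id, "seq": seq, "payload": payload}, sort_keys=True
    ).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def verify_reading(key, message):
    """Return True only if the tag matches the body under the shared key."""
    expected = hmac.new(key, message["body"], hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["tag"])


if __name__ == "__main__":
    key = b"demo-key-not-for-real-use"
    message = sign_reading(key, "camera_front", 17, {"pedestrian": True})
    print(verify_reading(key, message))  # genuine message

    # A man-in-the-middle rewrites the report but cannot recompute the tag.
    forged = dict(message, body=message["body"].replace(b"true", b"false"))
    print(verify_reading(key, forged))
```

Keep in mind what this does and does not buy you. It helps detect a message altered between the sensor and the rest of the AI system, but it does nothing about a sensor that is itself faulty or compromised, since such a sensor will happily sign its own falsehoods.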

For more about the computer security aspects, see my article:

For the need to have resiliency, see my article:

For the potential of cryptojacking, see my article:

Overall, the AI needs to protect itself from itself.

The AI developers should have considered beforehand the potential for faults occurring with the various elements and sub-components of the AI system. There should have been numerous checks-and-balances included within the AI system to try and detect the faults. Besides detecting the faults, there needs to be systematic ways in which the faults are then dealt with.

In the case of the sensors, pretend that one of the cameras is experiencing a fault. The camera is still partially functioning. It is not entirely “dead” or at a fail-stop status. The images are filled with noise and it makes the images occluded or confused looking. The internal system software that deals with this particular camera does not realize that the camera is having troubles. The troubles come and go, meaning that at one moment the camera is providing pristine and accurate images, while the next moment it does not.
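Here is one way the software watching over that camera might notice an on-again, off-again problem. This Python sketch assumes that some upstream routine already produces a per-frame noise score between 0 and 1 (how that score is computed is outside the sketch), and it keeps a short rolling history so that a few clean frames do not immediately erase the memory of recent bad ones. The class name and all thresholds are hypothetical.

```python
from collections import deque


class IntermittentFaultMonitor:
    """Flag a sensor as suspect when too many recent frames were noisy."""

    def __init__(self, window=10, noise_threshold=0.5, max_bad_fraction=0.2):
        self.recent_bad = deque(maxlen=window)  # True for each noisy frame
        self.noise_threshold = noise_threshold
        self.max_bad_fraction = max_bad_fraction

    def observe(self, noise_score):
        """Record one frame's noise score and return the current status."""
        self.recent_bad.append(noise_score > self.noise_threshold)
        return self.status()

    def status(self):
        if not self.recent_bad:
            return "unknown"
        bad_fraction = sum(self.recent_bad) / len(self.recent_bad)
        return "suspect" if bad_fraction > self.max_bad_fraction else "healthy"


if __name__ == "__main__":
    monitor = IntermittentFaultMonitor(window=5)
    for score in [0.1, 0.9, 0.1, 0.9, 0.1]:  # pristine, noisy, pristine, ...
        print(score, monitor.observe(score))
```

In the example run the final frame is pristine, yet the camera is still reported as suspect because two of the last five frames were noisy. A check that looked only at the latest frame would have declared the camera fine at exactly the wrong moment.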

Let’s say we’ve previously put in place a Machine Learning (ML) component that has been trained to detect pedestrians. After having scrutinized thousands and thousands of street scenes with pedestrians, the Machine Learning algorithm using an Artificial Neural Network has gotten pretty good at picking out the shape of a pedestrian in even crowded street scenes. It does so with rather high reliability.

The Machine Learning component gets fed a lousy camera image that has been populated with lots of static and noise, due to the subtle fault in the camera. This has made the portion that has a pedestrian in it very hazy and fuzzy, and the ML is unable to detect a pedestrian to any significant probability. The ML reports this to the sensor fusion portion of the AI system.

We now have a situation wherein a pedestrian exists in the street ahead of us, but the interpretation of the camera scene has indicated that there is not a pedestrian there. Is the Machine Learning component lying? In this case, it has done its genuine job and concluded that there is not a pedestrian there. I suppose we would say it is not lying per se. If it had been implanted with a computer virus that caused it to intentionally ignore the presence of a pedestrian and misreport as such to the rest of the AI, we might then consider that to be a lie.

For more about Machine Learning and AI self-driving cars, see my article:

For Federated Machine Learning, see my article:

For Ensemble Machine Learning, see my article:

For my article about the detection of pedestrians, see:

One should be asking why the system element that drives the camera has not yet detected that the camera has a fault. Furthermore, we might expect the ML element to be suspicious of images that have static and noise, though of course that could be happening a lot of the time in a more natural manner that has nothing to do with faults. Presumably, once the interpretation reaches the sensor fusion portion of the AI system, the sensor fusion will try to triangulate the accuracy and “honesty” of the interpretation by comparing it against the other sensors, including other cameras, radar, LIDAR, and the like.

You could liken the various sensors to the generals in the Byzantine Generals Problem. The sensor fusion must try to ferret out which of the generals (the sensors) are being truthful and which are not, though it is not quite so straightforward as a simple attack versus retreat kind of message. Instead, the matter is much more complex involving where objects are in the surrounding area and whether those objects are near to the AI self-driving car, or whether they pose a threat to the self-driving car, or whether the self-driving car poses a threat to them. And so on.
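
To make the analogy concrete, here is a minimal sketch of the simplest possible cross-check: each sensor “votes” on whether a pedestrian is present, the majority wins, and any sensor that disagrees with the majority is singled out for closer scrutiny. The sensor names, confidence values, and the 0.5 threshold are made up for illustration, and a real sensor fusion component would weigh far more than a yes-or-no vote.

```python
from dataclasses import dataclass


@dataclass
class SensorReport:
    sensor: str             # hypothetical sensor name
    pedestrian_prob: float  # reported confidence in [0, 1]


def fuse_pedestrian_reports(reports, threshold=0.5):
    """Majority vote on 'pedestrian present'; returns (decision, dissenting sensors)."""
    votes = {r.sensor: r.pedestrian_prob >= threshold for r in reports}
    yes_votes = sum(votes.values())
    # Ties resolve toward "present", the safer assumption for a moving vehicle.
    present = 2 * yes_votes >= len(votes)
    dissenters = sorted(name for name, vote in votes.items() if vote != present)
    return present, dissenters


reports = [
    SensorReport("camera_front", 0.08),  # the noisy camera misses the pedestrian
    SensorReport("radar", 0.91),
    SensorReport("lidar", 0.87),
]
present, dissenters = fuse_pedestrian_reports(reports)
# present is True and dissenters is ["camera_front"]
```

A vote like this only helps when the faulty sensors are in the minority, and a dissenting sensor is not necessarily a faulty one. It is merely a candidate for a closer look.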

The sensor fusion then reports to the virtual world model update component of the AI system. The virtual world model updater code would place a marker in the virtual world indicating where the pedestrian is standing, though if the sensors misreported the presence of the pedestrian and the sensor fusion did not catch the fault, the virtual world model would now misrepresent the world around it. The AI action planner would then not realize a pedestrian is nearby.

The AI action planner might not issue car control commands to maneuver the car away from the pedestrian. The pedestrian might get run over by the AI self-driving car, all stemming from a subtle fault in a camera. This is a fault that the AI system, had it been better designed and constructed, should have been able to catch. There should have been other means established to deal with a potentially faulty sensor.
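
The chain of events can be sketched in a few lines. This is a deliberately oversimplified illustration, with hypothetical function names and an assumed 30-meter braking distance, of how a missed detection flows from the fusion output into the virtual world model and on to the action planner without any stage raising an alarm.

```python
def update_world_model(world, pedestrian_positions):
    """Return a copy of the world model with pedestrian markers replaced."""
    updated = dict(world)
    updated["pedestrians"] = list(pedestrian_positions)
    return updated


def plan_action(world, braking_distance_m=30.0):
    """Brake if any marked pedestrian is within the (assumed) braking distance."""
    if any(distance <= braking_distance_m for distance in world["pedestrians"]):
        return "brake"
    return "continue"


world = {"pedestrians": []}

# Healthy case: fusion reports a pedestrian 18 meters ahead.
healthy_action = plan_action(update_world_model(world, [18.0]))

# Faulty case: the missed detection reaches the model as an empty list.
faulty_action = plan_action(update_world_model(world, []))
```

In the healthy case the planner brakes. In the faulty case it carries on, and nothing downstream of the fusion step has any way of knowing that the empty list is wrong.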

I mentioned earlier that you should avoid falling into the mental trap of assuming that there will be just one fault at a time. Recall that my old car had the starter that seemed to come and go and also the spark plugs that were working intermittently. Thus, there were really two items at fault, each of which reared its ugly head from time to time (though not necessarily at the same time).

Suppose a camera on an AI self-driving car experiences a subtle fault, which is intermittent. Imagine that the sensor fusion component of the AI software also has a fault, a subtle one, occurring from time to time. These two might arise completely independently of each other. They might also happen to arise at the same time. The AI system needs to be shaped in a manner that it can handle multiple faults from multiple elements, across the full range of elements and subcomponents, where those faults occur in subtle ways at varying times.
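
A bit of arithmetic shows why overlapping faults cannot be waved away. As a sketch, assume the two faults are statistically independent and that each is active for some fraction of the time. The rates used below are invented purely for illustration.

```python
def fault_overlap(p_camera, p_fusion):
    """Probabilities for two independent intermittent faults at a given instant."""
    both = p_camera * p_fusion
    at_least_one = 1.0 - (1.0 - p_camera) * (1.0 - p_fusion)
    return both, at_least_one


# Illustrative, made-up rates: camera faulty 1% of the time, fusion 2%.
both, at_least_one = fault_overlap(0.01, 0.02)
```

With these made-up rates, both faults are active together about 0.02 percent of the time, while at least one of them is active nearly 3 percent of the time. If the faults share a common cause, such as heat or vibration, the independence assumption breaks down and the overlap could be far more frequent.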

For ghosts that might appear, see my article:

For the difficulty of debugging AI self-driving cars, see my article:

For my article about the dangers of irreproducibility, see:

For the nature of uncertainty and probabilities in AI self-driving cars, see my article:


The tale of the Byzantine Generals Problem serves as a helpful reminder that modern-day real-time systems need to be built with fault tolerance.

Some AI developers who came from a university or research lab might not have been particularly concerned with fault tolerance, since they were devising experimental systems to explore new advances in AI.

When shifting such AI systems into everyday use, it is crucial that fault tolerance be baked into the very fabric of the AI system.

We are going to have the emergence of AI self-driving cars that will be on our streets and will be operating fully unattended by a human driver. We rightfully should expect that fault tolerance has been given a top priority for these real-time systems that are controlling multi-ton vehicles. Without proper and appropriate fault tolerance, the AI self-driving car you see coming down the street could go astray due to a subtle fault in some hidden area of the AI.

An error avalanche could allow the fault to cascade to the point that the AI self-driving car gets into an untoward incident and human lives are jeopardized.

One of the greatest emperors of the Byzantines was Justinian I, and it is claimed that he said that the safety of the state was the highest law.

For those AI developers involved in designing and building AI self-driving car systems, I hope that you will abide by Justinian’s advice and dutifully include Byzantine Fault Tolerance, or the equivalent thereof, so that safety receives the highest attention in your AI system.

Consider that an order by Roman Law, per the Codex Justinianus.

Copyright 2019 Dr. Lance Eliot

This content is originally posted on AI Trends.

This UrIoTNews article is syndicated from AI Trends.