Topic: AI generative adversarial viruses (Read 112 times)

member
Activity: 691
Merit: 51
December 20, 2023, 07:40:49 AM
#1
Generative AI as we have it today is a problem, and it is going to get much worse. In AI, adversarial examples are inputs that are designed to make the AI produce the wrong answer or the wrong result. For example, an adversarial example could be a picture that is obviously a horse to any person, yet the AI decides that it is something like a tornado or a scorpion or a map of Texas.
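
To make that concrete, here is a minimal sketch of how such an input might be crafted with the fast gradient sign method; the classifier, the image tensor, and the label below are hypothetical placeholders, not any particular system.

Code:
import torch
import torch.nn.functional as F

def fgsm_adversarial(classifier, image, true_label, eps=0.03):
    # image: float tensor in [0, 1] of shape (1, 3, H, W); true_label: shape (1,)
    image = image.detach().clone().requires_grad_(True)
    loss = F.cross_entropy(classifier(image), true_label)
    loss.backward()
    # Nudge every pixel slightly in the direction that increases the loss.
    perturbed = image + eps * image.grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()

The perturbed picture typically still looks like a horse to a person, yet the classifier may now assign it to a completely unrelated class.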

For generative AI, adversarial examples could be much worse. An AI model C could take an input z and return that same z as its output, and such an input z could arise naturally. Suppose that we have two AI models A and B. The AI model A could take an input x and return the output y, while the AI model B could take the input y and return the output x. This means that x is a fixed point of the composition BA, since B(A(x)) = x. We could also have more fixed points for more complicated compositions of AI systems.
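
As a rough sketch of what this looks like in code, one can iterate the composition BA and check whether the output ever stops changing; model_a and model_b below are hypothetical stand-ins for real generative models (say, text-to-image and image-to-text), not any particular API.

Code:
def find_fixed_point(model_a, model_b, x0, max_iters=100):
    # Iterate x -> B(A(x)) and return an x with B(A(x)) == x if one is reached.
    x = x0
    for _ in range(max_iters):
        y = model_a(x)       # A : x -> y
        x_next = model_b(y)  # B : y -> x'
        if x_next == x:      # x is a fixed point of the composition BA
            return x
        x = x_next
    return None  # no fixed point reached within the iteration budget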

Stable fixed points in mathematics arise naturally all the time through iteration. For example, the contraction mapping theorem states that every contraction f from a non-empty complete metric space to itself has a unique fixed point. To reach this fixed point, one starts with an arbitrary point and iteratively applies the contraction to it; the resulting sequence of points converges to the stable fixed point. In linear algebra, the dominant eigenvector is (up to normalization) the fixed point of the operation of applying a matrix, and it is attained by starting with a random vector and repeatedly applying the matrix to that vector until it stabilizes (dividing by the norm at each step so that the vector does not blow up). If P is a finite poset, f is a monotone function from P to itself, and there is some p where f(p) is comparable with p, then f has a fixed point; you can find it by starting at that p and iteratively applying f until the chain stabilizes. Fixed points are often unavoidable. For example, Brouwer's fixed point theorem says that any continuous function from the cube to itself has a fixed point. I have myself observed several kinds of machine learning models producing such fixed points. This is a common occurrence.
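
As a small numerical illustration (my own toy example, nothing specific to AI models), iteration finds these fixed points quickly:

Code:
import numpy as np

# Contraction mapping: f(x) = cos(x) is a contraction near its fixed point,
# so iterating from any starting point converges to the x with cos(x) = x.
x = 0.5
for _ in range(100):
    x = np.cos(x)
print(x)  # ~0.739085

# Power iteration: repeatedly applying a matrix and renormalizing converges
# to the dominant eigenvector, the fixed point of that operation.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
v = np.random.rand(2)
for _ in range(100):
    v = A @ v
    v /= np.linalg.norm(v)  # divide by the norm so the vector does not blow up
print(v)  # dominant eigenvector of A (up to sign)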

So if compositions of generative AI models have an attractive fixed point, then they will keep on perpetuating the same inputs and outputs, and these fixed points may arise naturally. So what can be done about this? Can we train AI models to recognize fixed points and refuse to perpetuate them? Can we add randomness to make sure that fixed points do not occur? That may be possible, but fixed points behave like evolving creatures that probably cannot easily be patched away. If we train AI models to recognize and avoid fixed points, then the fixed points will naturally change and adapt so that they evade our adversarial training. One should think of these fixed points as viruses that infect AI systems and cause havoc.
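
For what it is worth, here is a crude external guard along those lines: it simply watches the outputs of a composed pipeline and refuses to keep going once it sees a repeat. This is my own hypothetical sketch (the name "pipeline" is a placeholder), not a trained defense, and I suspect an adaptive fixed point would route around it.

Code:
def run_with_fixed_point_guard(pipeline, x0, steps=20):
    # "pipeline" is a hypothetical composed model such as BA; outputs must be hashable.
    seen = {x0}
    x = x0
    for _ in range(steps):
        x = pipeline(x)
        if x in seen:
            raise RuntimeError("fixed point or cycle detected; refusing to perpetuate it")
        seen.add(x)
    return x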

I am a fan of replacing neural networks with systems that are more interpretable, more mathematical, and less prone to adversarial examples, and I have been working in this direction for cryptocurrency research, but it is unclear how well my kind of machine learning algorithms would stand up against generative adversarial fixed point viruses.

-Joseph Van Name Ph.D.