No hard feelings.
I'm a computer engineer with a degree in computer engineering. I'm trying to tell you that your intuitive notions about the difference between GPUs and CPUs, while good enough for basic comprehension, are not rigorous, and that in this case the difference matters.
There is a theorem in computational complexity (the field of mathematics that studies how fast any given kind of formal problem can be solved) called the speed-up theorem. Basically, this theorem says that any particular solution to a problem (an "algorithm") can be sped up by a linear factor. So, suppose you find an algorithm A that solves problem Z in time n^2, where n is the size of an instance of Z. This is called quadratic complexity: as n increases, the time required to solve the problem grows as the square of n. The speed-up theorem tells us that it is always possible to find some fixed factor, k, such that we can construct a new algorithm A' from A that solves an instance of Z in time n^2 / k. Note that this does not change the complexity class: A' still solves instances of Z in quadratic time. Intuitively, this makes sense. If I can solve a problem in time n^2 with one computer, then I should always be able to solve it in nearly n^2 / 2 with two computers. There are some very intractable problems where this becomes difficult (due to communication overhead between the machines), but such problems tend to be academic, i.e. specially constructed to make parallelization difficult.
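Written out in symbols (my notation, nothing beyond what's already stated above):

```latex
\[
  T_A(n) = n^2
  \quad\Longrightarrow\quad
  T_{A'}(n) = \frac{n^2}{k} \;\in\; \Theta(n^2)
  \qquad \text{for any fixed } k > 1.
\]
```

The constant k moves, but the growth rate does not; that is the whole content of the theorem.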
This is mostly not true anymore. GPGPUs (General-Purpose GPUs) can, in principle, do anything that CPUs can do, and the latest graphics cards are being released with GPGPU cores in them. GPU clock speeds are in the same class as CPU clock speeds, so they are not slow. GPUs are designed around "data parallelism" (a "data flow architecture"), which is why they are well suited to graphics applications, or to any application where you can easily parallelize the work by simply dividing up the work surface and handing more or less identical instructions to each work unit (rendering pipeline).
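To make "divide up the work surface, same instructions per unit" concrete, here is a toy sketch in Python. The tile size and the stand-in "shader" are made up for illustration; a real GPU runs thousands of these units in lockstep rather than a handful of OS processes:

```python
# Toy data parallelism: split the work surface into tiles and run the
# SAME instructions on every tile. This access pattern is what GPUs
# are built around.
from multiprocessing import Pool

def shade_tile(tile):
    """Apply one identical 'instruction stream' to every pixel in a tile."""
    x0, y0, x1, y1 = tile
    return [(x, y, (x * y) % 256)          # stand-in for a real shader
            for y in range(y0, y1)
            for x in range(x0, x1)]

if __name__ == "__main__":
    width, height, tile_size = 256, 256, 64
    tiles = [(x, y, x + tile_size, y + tile_size)
             for y in range(0, height, tile_size)
             for x in range(0, width, tile_size)]
    with Pool() as pool:                   # each worker = one "pipeline"
        shaded = pool.map(shade_tile, tiles)
    print(f"{len(tiles)} tiles shaded, {sum(len(t) for t in shaded)} pixels")
```

Notice that no tile depends on any other tile; that independence is exactly what makes the problem embarrassingly parallel.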
For any large search problem (and any non-trivial PoW must be a large search problem), GPUs will be superior. If I have to try 2^60 possibilities (about a billion billion) in order to solve a given PoW problem, I can try them one by one on a CPU, or I can parallelize the problem and have multiple CPUs or GPUs work on it at once. Suppose I have 128 GPUs available to me. Instead of one machine searching 2^60 possibilities, each GPU now only has to search 2^60 / 128 = 2^53 possibilities, a 128x reduction in search time. Factoring is just a large search problem.
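Here is a minimal sketch of that partitioning, with toy numbers: a 2^20 space, 8 workers, and a SHA-256 prefix target stand in for 2^60 possibilities, 128 GPUs, and a real PoW target.

```python
# Partitioned PoW-style search: W workers each scan 1/W of the space,
# so a 2^60 search becomes a 2^53 search per worker when W = 128 = 2^7.
# Toy values below so this finishes quickly.
import hashlib
from multiprocessing import Pool

SPACE = 2 ** 20          # toy search space (stand-in for 2^60)
WORKERS = 8              # stand-in for 128 GPUs
TARGET = "0000"          # hex prefix the hash must start with

def scan(bounds):
    """Scan one contiguous slice of the nonce space for a solution."""
    lo, hi = bounds
    for nonce in range(lo, hi):
        digest = hashlib.sha256(str(nonce).encode()).hexdigest()
        if digest.startswith(TARGET):
            return nonce, digest
    return None

if __name__ == "__main__":
    step = SPACE // WORKERS
    ranges = [(i * step, (i + 1) * step) for i in range(WORKERS)]
    with Pool(WORKERS) as pool:
        hits = [h for h in pool.map(scan, ranges) if h]
    print(hits[0] if hits else "no solution in range")
```

In practice you would cancel the remaining workers as soon as one finds a solution; the point here is only that each worker's slice is 1/8th (or, with 128 GPUs, 1/128th) of the whole space, which is the linear speed-up from the theorem above.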