How fast an FPGA will run depends heavily on the designer of the logic and on the tools that synthesize, map, and place and route the design. As such, the only absolute things you can state until a design is built and tested are the guaranteed specs from Xilinx. Those won't tell you how fast a design will run, only the specified timing guarantees of various elements. Most Bitcoiners already push FPGAs, GPUs etc. beyond specification, so you all know that the specification doesn't tell you everything.
For not so light reading look at
http://www.xilinx.com/support/documentation/data_sheets/ds162.pdf and you can follow along with some of what comes next. If your head starts hurting, do feel free to stop reading. I will mention a few things that might be of interest, but bear in mind the paragraph above.
Starting with the clock tree: the specification is 375MHz max for -2, 400MHz for -3, and 400MHz for -3N. That is the limit specification; most logic won't run as fast as that, because there is routing delay, and maybe multiple LUT delays, between registers.
So what is the actual difference between the grades? It's not the limit spec but a range that varies with a bunch of batch-related things. Xilinx don't make a -2 or a -3 as such; they make an XC6SLX150 die, which is then graded into speed grades. Die performance has a statistical link to where on a wafer (usually a large round thing with hundreds or thousands of dies on it) a die sits, to the processing (slight variations), and even to the starting raw wafer quality. Out of all that you get a pile of dies with performance that typically follows a statistical curve that looks like an inverted bathtub. Somewhere along that curve Xilinx has drawn some lines that create bands, and those bands are the speed grades.
Remember the profile shape: most dies will be close together on the big lumpy part of the curve, and somewhere at the top of that lump is the line that separates -2 from -3. Most of the -3 yield will be close to the top of the -2 yield area. It's then a matter of luck which end of the -2 grade you land in, but statistically near the middle is likely. So out of this, your -2 is highly likely to sit around the midpoint between the -2 limit and the -3 limit, and a -3 chip is likely to be right on the -3 limit.
I will also mention the -3N grade. It is a runt grade, and a very misleading one, that Xilinx created because they had a die yield issue with the memory controllers in S6. It should have been called -2N, because most of its guaranteed specs are the same as -2. I think the clock tree is virtually the only spec that matches -3. So our competitors selling -3N are probably not any faster than the -2 we are currently using.
OK, now that your head is spinning, let's make that slightly simpler. I will take one of the timing parameters from the datasheet, Tilo, which is the propagation delay through a LUT, your basic building block and very important to Bitcoin logic speed. A -3 has 0.21ns max, a -3N has 0.26ns max and a -2 has 0.26ns max. Notice the -3N time. So from what I said above, an average -2 is likely to be about 0.235ns (the midpoint of 0.26 and 0.21) and a -3 about 0.21ns. In reality they are probably closer, but let's work with these numbers. That says a -2 is roughly 90% of the speed of a -3 (0.21 / 0.235), and yes, the -3 is quicker. However, for that roughly 10% more speed you will typically pay 25-50% more for the chip.
Now for Bitcoiners there is something more of relevance to say on this, and it comes from knowledge gained over many years and not from a spec. Generally, faster grade chips will burn more static power than slower ones. In temperature-limited Bitcoining that will either reduce the benefit of the -3 speed or mean you spend more on electricity and cooling.
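For anyone who wants to poke at the Tilo numbers above, here is a rough sketch of the same arithmetic in Python. The Tilo maximums are the ones quoted from the datasheet; everything else (the function names, treating a typical -2 as the midpoint and a typical -3 as sitting on its limit, and assuming speed scales inversely with LUT delay alone) is just the guesswork from this post written down, not Xilinx data.

```python
# Tilo maximums in ns, as quoted above from DS162.
TILO_MAX_NS = {"-3": 0.21, "-3N": 0.26, "-2": 0.26}


def typical_tilo_ns(grade):
    """Guess at a typical part's Tilo, using this post's assumptions only."""
    if grade == "-3":
        # Assumption: most -3 dies sit right on the -3 limit.
        return TILO_MAX_NS["-3"]
    if grade == "-2":
        # Assumption: a typical -2 sits midway between the -2 and -3 limits.
        return (TILO_MAX_NS["-2"] + TILO_MAX_NS["-3"]) / 2
    raise ValueError("no typical-value guess made for grade " + grade)


def relative_speed(grade, reference):
    """Speed of `grade` relative to `reference`, assuming speed ~ 1 / Tilo."""
    return typical_tilo_ns(reference) / typical_tilo_ns(grade)


def speed_per_cost(price_premium):
    """-3 speed per unit cost relative to -2, for a fractional price premium."""
    return relative_speed("-3", "-2") / (1.0 + price_premium)


if __name__ == "__main__":
    print("typical -2 Tilo (ns):", round(typical_tilo_ns("-2"), 3))
    print("-2 speed relative to -3:", round(relative_speed("-2", "-3"), 3))
    for premium in (0.25, 0.50):
        print("premium", premium, "-> speed per cost vs -2:",
              round(speed_per_cost(premium), 3))
```

With these inputs the -2 comes out at a little under 0.9 of the -3 speed, and the -3 gives less speed per unit of chip cost at either end of the 25-50% premium range. Real designs also have routing delay between registers, so treat this as a ballpark only.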