Just to put some color around this -- take DDR3 1333 as an example:
Using this calculator --> http://referencedesigner.com/tutorials/si/si_06.php
Assume a 6 mil prepreg (H) under the top layer, 1 oz copper (T) at 1.4 mils thick, a 5 mil trace width (W), and a dielectric constant (er) of 4.2.
You'll get a propagation delay of about 138 ps/inch.
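If you'd rather sanity-check that number without the web page, here's a rough sketch in Python. I'm assuming the common closed-form approximation for surface microstrip delay, 85 * sqrt(0.475*er + 0.67) ps/inch; I don't know for certain that the linked calculator uses it, but it lands on the same ~138 figure. Note that this approximation depends only on er, not on H, W, or T. The function name is just something I made up for illustration.

```python
import math

def microstrip_delay_ps_per_inch(er: float) -> float:
    """Approximate surface-microstrip propagation delay in ps/inch.

    Assumption: the widely quoted closed-form approximation
    85 * sqrt(0.475 * er + 0.67). It ignores trace geometry (H, W, T).
    """
    return 85.0 * math.sqrt(0.475 * er + 0.67)

# Dielectric constant from the example above.
delay = microstrip_delay_ps_per_inch(4.2)
print(f"{delay:.1f} ps/inch")
```

A field solver will give a slightly different answer, but for this kind of back-of-the-envelope budget the approximation is close enough.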
The clock runs at 667 MHz, but since this is DDR and data is captured on both edges, the effective rate is 1333 MT/s, which works out to 0.75 ns (750 ps) between capture edges. This means that if your traces are <= 750/138, or ~5.4 inches, long, all of your signals should arrive within a single 750 ps window, so length matching becomes essentially irrelevant. On most embedded DDR3 designs, if you're careful with your routing, you should be under 3 inches from CPU to DDR3, so you have plenty of headroom.
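Here's the same arithmetic as a small Python sketch, in case you want to plug in a different speed grade or stackup. The 138 ps/inch delay and the 3 inch route length are the numbers from this example; the function and variable names are made up for illustration.

```python
def bit_period_ps(transfer_rate_mtps: float) -> float:
    """Time between capture edges in ps for a given transfer rate in MT/s."""
    return 1e6 / transfer_rate_mtps

DELAY_PS_PER_INCH = 138.0   # microstrip delay from the example above
ROUTE_LENGTH_IN = 3.0       # typical CPU-to-DDR3 distance from the example

ui = bit_period_ps(1333)                    # roughly 750 ps for DDR3-1333
max_length_in = ui / DELAY_PS_PER_INCH      # longest trace that fits in one bit period
flight_time_ps = ROUTE_LENGTH_IN * DELAY_PS_PER_INCH

print(f"bit period:        {ui:.0f} ps")
print(f"max trace length:  {max_length_in:.1f} inches")
print(f"3 inch flight:     {flight_time_ps:.0f} ps ({ui - flight_time_ps:.0f} ps to spare)")
```

Swapping in 1600 or 1866 for the transfer rate shows how quickly the window shrinks at higher speed grades.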
Even termination resistors are somewhat optional at these lengths, since any reflections tend to get soaked up pretty easily.