A tangent vector is what John Baez has been explaining. One way to represent a tangent vector is by giving a (smooth) parametrized curve through the point. Two such curves yield the same tangent vector if they are tangent to each other, and if increments in their respective parameters move them along at the same rate at the point in question.
In coordinates, one has a curve (x(s),y(s),z(s),t(s)), and its tangent vector has coordinates (x'(0),y'(0),z'(0),t'(0)) at the point (x(0),y(0),z(0),t(0)). Of course, we don't want to be stuck with just one coordinate system, but if two curves yield the same tangent vector in one coordinate system, they do in all.
To tell our Roman a tangent vector, we put a dagger in his hand, aim it, and say, "That direction, but think of it as 5 stadia in magnitude."
A cotangent vector can be thought of as a gradient. I sometimes remind my students that gradients and tangent vectors tend to be in different units: a gradient is in units *per* distance.
To tell our Roman a cotangent vector, spray a cloud of perfume near him, in such a way that if he moves in certain directions the smell gets stronger. :-) Assuming, I suppose, that one has an agreed-upon scale for intensity of smell.
One can present a cotangent vector by giving a function of which it is the gradient at the point in question. In coordinates, the gradient of f has coordinates (∂f/∂x, ∂f/∂y, ∂f/∂z, ∂f/∂t). Again, we need not get stuck on any one coordinate system; if two functions have the same gradient in one coordinate system, they do in all.
Tangent and cotangent vectors are often treated in elementary math courses as being essentially the same type of thing ("vector"). Part of the reason is that if one has a *metric* (q.v.) one can identify the two. You can associate to the cotangent vector the tangent vector which points in the direction of fastest increase of the function, and whose length is the rate of increase in that direction. This only makes sense, however, because we can compare lengths in different directions. Taking one step to the south, say, increases the smell more than one step in any other direction. Without such a measure of distance as "steps", though, there's no direct comparison between the rates at which the function increases in different directions.
Also, tangent and cotangent vectors transform differently when you change coordinates. I already mentioned a difference in how they transform under a change of units. If you multiply the coordinates of all the points by 10, then the coordinates of a tangent vector also get multiplied by 10, but the coordinates of a cotangent vector are reduced by a factor of 10: the amount by which the function increases per "unit" change in a coordinate is less, not greater.
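To see the factor-of-10 bookkeeping concretely, here is a minimal Python sketch. The particular component values and the helper names (rescale_tangent, rescale_cotangent, pairing) are made up for illustration. The point it tries to show is that the two kinds of components scale in opposite ways, so that the rate of change of the function along the curve comes out the same in either set of units.

```python
from fractions import Fraction

SCALE = 10  # assumed change of units: every new coordinate is 10 times the old one


def rescale_tangent(v, scale=SCALE):
    """Tangent components are derivatives of the coordinates along a curve,
    so they are multiplied by the scale factor."""
    return [scale * c for c in v]


def rescale_cotangent(w, scale=SCALE):
    """Cotangent components are rates of change per unit of coordinate,
    so they are divided by the scale factor."""
    return [Fraction(c) / scale for c in w]


def pairing(w, v):
    """Rate of change of the function along the curve: sum of w_i * v^i."""
    return sum(a * b for a, b in zip(w, v))


# Hypothetical components in the old units.
v = [2, 0, 1]    # a tangent vector
w = [3, 5, -4]   # a cotangent vector (a gradient)

v_new = rescale_tangent(v)
w_new = rescale_cotangent(w)

# The pairing does not care which units were used.
print(pairing(w, v), pairing(w_new, v_new))
```

Exact fractions are used only so that the comparison is not blurred by floating-point rounding.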
If you like, here is a more mathematical example. Let u=x+y and v=y be new coordinates for the x-y plane. The parametrized line x=t, y=0 defines a tangent vector at the origin; call it a. The parametrized line x=0, y=t also defines a tangent vector at the origin; call it b. In the original (x,y) coordinates these are (1,0) and (0,1). In the (u,v) coordinates these lines become u=t, v=0 and u=t, v=t. Thus in the new coordinates a and b are (1,0) and (1,1).
Now consider the cotangent vectors. The gradient of the function x is often denoted by dx. The gradient of y is dy. They are of course represented by coordinates (1,0) and (0,1). Now consider how they are represented in (u,v) coordinates. We have to take partial derivatives with respect to u and v. We carefully avoid the pitfall of supposing that taking the partial derivative with respect to v and with respect to y are the same, because v=y. In the one case, one is leaving u the same, in the other x. Since x=u-v and y=v, dx=du-dv and dy=dv. Hence in the new coordinates, dx=(1,-1) and dy=(0,1).
Thus the matrix for transforming tangent vectors is
(1 1)
(0 1)
but for cotangent vectors it is the inverse,
(1 -1)
(0 1).
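For anyone who wants to check the arithmetic, here is a small Python sketch of the u=x+y, v=y example. The function names are my own, and I am assuming one particular convention: tangent components are treated as a column multiplied on the left by the matrix, and cotangent components as a row multiplied on the right by the inverse. (If you insist on writing cotangent components as columns too, you would use the transpose of the inverse.)

```python
# Change of coordinates u = x + y, v = y, and its inverse x = u - v, y = v.
J = [[1, 1],
     [0, 1]]
J_INV = [[1, -1],
         [0, 1]]


def push_tangent(vec):
    """New tangent components: the column vector `vec` multiplied on the left by J."""
    return [sum(J[i][j] * vec[j] for j in range(2)) for i in range(2)]


def convert_cotangent(cov):
    """New cotangent components: the row vector `cov` multiplied on the right by J_INV."""
    return [sum(cov[i] * J_INV[i][j] for i in range(2)) for j in range(2)]


def pairing(cov, vec):
    """Value of the cotangent vector on the tangent vector."""
    return sum(c * t for c, t in zip(cov, vec))


a, b = [1, 0], [0, 1]      # tangent vectors in (x,y) coordinates
dx, dy = [1, 0], [0, 1]    # cotangent vectors in (x,y) coordinates

print(push_tangent(a), push_tangent(b))             # components of a, b in (u,v)
print(convert_cotangent(dx), convert_cotangent(dy))  # components of dx, dy in (u,v)

# dx(b) is 0 in the old coordinates, and it is still 0 in the new ones.
print(pairing(dx, b), pairing(convert_cotangent(dx), push_tangent(b)))
```

The last line is the reason the two transformation rules have to be inverse to each other: the number dx(b) is a fact about the function and the curve, not about the coordinates.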
The reason for being extra careful in relativity theory is that the metric is part of what one is trying to figure out, and even with a known metric, the correspondence between tangent and cotangent vectors has to be calculated from the metric. They call it "raising and lowering indices", because one uses upper indices for the components of tangent vectors and lower indices for the components of cotangent vectors.
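Here is a minimal Python sketch of what that calculation looks like in the little (u,v) example from earlier. I am assuming the plane carries the ordinary Euclidean metric dx^2 + dy^2, which the example itself did not specify. Since dx=du-dv and dy=dv, that metric becomes du^2 - 2 du dv + 2 dv^2 in the new coordinates. The helper names are hypothetical.

```python
from fractions import Fraction

# Assumed metric: Euclidean dx^2 + dy^2, written out in (u,v) coordinates as
# du^2 - 2 du dv + 2 dv^2, i.e. the symmetric matrix below.
g = [[1, -1],
     [-1, 2]]


def inverse_2x2(m):
    """Exact inverse of a 2x2 matrix (raises if the matrix is singular)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("metric must be invertible")
    return [[Fraction(m[1][1], det), Fraction(-m[0][1], det)],
            [Fraction(-m[1][0], det), Fraction(m[0][0], det)]]


def lower_index(metric, v):
    """Tangent components v^j to cotangent components w_i = g_ij v^j."""
    return [sum(metric[i][j] * v[j] for j in range(2)) for i in range(2)]


def raise_index(metric, w):
    """Cotangent components w_j to tangent components v^i = g^ij w_j."""
    g_inv = inverse_2x2(metric)
    return [sum(g_inv[i][j] * w[j] for j in range(2)) for i in range(2)]


# Components in (u,v) coordinates, taken from the example above.
a, b = [1, 0], [1, 1]      # tangent vectors
dx, dy = [1, -1], [0, 1]   # cotangent vectors

print(lower_index(g, a), lower_index(g, b))   # should reproduce dx and dy
print(raise_index(g, dx), raise_index(g, dy))  # should reproduce a and b
```

In the original (x,y) coordinates the metric is the identity matrix, so a and dx have the same components and it is easy to forget that they are different things. In (u,v) coordinates the components differ, and only the metric tells you which tangent vector goes with which cotangent vector.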