Actually, one 4096x4096 texture holds as many pixels as 16 1024x1024 textures, or 64 512x512 textures. The pixel count of a texture grows with the square of its side length, not linearly.
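If you want to check that arithmetic yourself, here is a quick sketch in Python. The function name `equivalent_count` is just something I made up for illustration; it only counts pixels and says nothing about file formats or compression.

```python
def equivalent_count(big_side: int, small_side: int) -> int:
    """Number of small square textures holding the same pixel count as one big one."""
    # Pixel count is side * side, so the ratio is the square of the side ratio.
    return (big_side * big_side) // (small_side * small_side)

print(equivalent_count(4096, 1024))  # 16
print(equivalent_count(4096, 512))   # 64
```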
To answer your main question, however, there is no hard and fast rule. The size of the texture that you should use is the smallest size that will support the amount of detail that you want to offer at the distance that any given part of the model will actually be viewed from.
In other words, a model that is 100 yards high in world terms can get away with a texture that’s only 64x64 if the model is far enough away from the camera.
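To put rough numbers on that, here is a minimal sketch that estimates how many screen pixels tall an object appears and rounds up to the next power-of-two texture size. It rests on several assumptions of mine: a simple perspective camera, a 40 degree vertical field of view, a 1080-pixel-tall render, a texture that spans the full height of the model, and a target of about one texture pixel per screen pixel. The function name and defaults are hypothetical, so treat the result as a starting point rather than a rule.

```python
import math

def suggested_texture_side(object_height, distance, fov_deg=40.0,
                           screen_height_px=1080, min_side=16, max_side=4096):
    """Smallest power-of-two side giving about one texel per screen pixel."""
    # Height of the camera's view at this distance, in the same world units.
    view_height = 2.0 * distance * math.tan(math.radians(fov_deg) / 2.0)
    # How many screen pixels tall the object appears.
    pixels_tall = object_height / view_height * screen_height_px
    needed = max(1, math.ceil(pixels_tall))
    # Round up to the next power of two, then clamp to the allowed range.
    side = 1 << (needed - 1).bit_length()
    return max(min_side, min(max_side, side))

# A model 100 yards high, seen from 3000 yards and from 100 yards.
print(suggested_texture_side(100, 3000))
print(suggested_texture_side(100, 100))
```

With these assumed camera settings, the far view comes out at 64 and the close view at 2048, which matches the idea that distance, not world size, drives the choice.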
In terms of rendering time and system resources, one 4096x4096 texture costs roughly as much as sixteen 1024x1024 textures. But that’s not usually how you’d break the model up. A workable practice, for example, would be to use a 2048x2048 texture for the head, 1024x1024 for the hands, and 1024x1024 for the body. This would work if your animation is going to have close-ups of the head and hands, but only medium shots of the body. Also, ask yourself if any parts of the model aren’t going to be visible to the camera at all. If so, they don’t need texture. In fact, they don’t even need polygons!
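For a sense of scale, here is a rough memory comparison between that split set (2048x2048 head, 1024x1024 hands, 1024x1024 body) and a single 4096x4096 texture. I am assuming uncompressed 8-bit RGBA at 4 bytes per pixel with no mipmaps; your renderer or file format may well differ, so the absolute numbers are only indicative.

```python
BYTES_PER_PIXEL = 4  # assumption: uncompressed 8-bit RGBA, no mipmaps
MIB = 1024 * 1024

def texture_bytes(side: int) -> int:
    """Memory for one square texture under the assumption above."""
    return side * side * BYTES_PER_PIXEL

# The example split from the text: side length per body part.
split = {"head": 2048, "hands": 1024, "body": 1024}
split_total = sum(texture_bytes(side) for side in split.values())
single_total = texture_bytes(4096)

print(f"split set:   {split_total / MIB:.0f} MiB")   # 24 MiB
print(f"single 4096: {single_total / MIB:.0f} MiB")  # 64 MiB
```

Under those assumptions the split set needs well under half the memory of the single large texture, while still putting the most pixels where the close-ups are.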
Here’s another factor to bear in mind, since you brought up photorealism. A 4096x4096 head or body texture is generally considered to be photoreal. For the head, it offers enough pixels per polygon to include details such as skin pores and really fine wrinkles. But such details are only necessary if you’re zooming in close enough to see those pores. If you aren’t, you’d want to go with something like 2048x2048, or even lower.