This will require some trial and error and original work.
To start with, I often take a background image without people or the objects I wish to detect. This background image can be subtracted from each camera image to help clear out the background, leaving mostly the things that have changed. A simple example of this can be found in this blog post: http://blog.wolfram.com/2010/11/10/how-to-make-a-webcam-intruder-alarm-with-mathematica/
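To make the subtraction idea concrete, here is a minimal sketch in Python. It is not taken from the blog post above. It assumes grayscale frames stored as plain nested lists of pixel intensities (0 to 255) so that it runs without any libraries. The function names and the threshold of 30 are made up for illustration, and you would need to tune the threshold for your own camera and lighting.

```python
def subtract_background(frame, background, threshold=30):
    """Return a binary mask: 1 where a pixel differs from the
    background by more than `threshold`, 0 elsewhere.

    `frame` and `background` are assumed to be same-sized nested
    lists of grayscale intensities (an assumption for this sketch).
    """
    return [
        [1 if abs(f - b) > threshold else 0 for f, b in zip(frow, brow)]
        for frow, brow in zip(frame, background)
    ]


def changed_fraction(mask):
    """Fraction of pixels flagged as foreground in the mask."""
    total = sum(len(row) for row in mask)
    return sum(map(sum, mask)) / total if total else 0.0


# Tiny made-up example: an empty scene, then a bright object appears.
background = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
frame = [
    [10, 12, 10, 10],    # small change: sensor noise, below threshold
    [10, 200, 210, 10],  # large change: the object
    [10, 10, 10, 10],
]

mask = subtract_background(frame, background)
for row in mask:
    print(row)
print(changed_fraction(mask))
```

With real camera frames you would probably hold the images in an array library rather than lists, but the logic is the same: take the absolute difference, threshold it, and then look at the blobs left in the mask.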
Actually writing software to detect that something is the top of a head might be harder, and I'm not sure what your example video or images look like. Posting a simple example will help people understand your problem better.