Why Do Vision Language Models Struggle To Recognize Human Emotions? — ThinkLLM