Failure of contextual invariance in gender inference with large language models — ThinkLLM