LLMs can generate syntactically well-formed LTL formulas but frequently produce semantically incorrect ones, suggesting they grasp formal syntax better than formal semantics. That gap matters for real security and policy tools.
This paper tests whether large language models can translate English sentences into Linear Temporal Logic (LTL), a formal language used to specify system requirements and security policies. The researchers find that LLMs handle LTL's grammar well but struggle with its meaning, and that framing the task as code completion significantly improves results.
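The syntax/semantics gap is easy to see concretely. Below is a minimal sketch (not from the paper) of an LTL evaluator over finite traces, using the common finite-trace reading of the operators; the formulas, trace, and proposition names are illustrative assumptions. Both formulas are syntactically valid, but only one actually captures "every request is eventually followed by a response":

```python
# Minimal finite-trace LTL evaluator (illustrative sketch; standard LTL is
# defined over infinite traces, this uses the finite-trace reading).
# Formulas are nested tuples: ("ap", name), ("not", f), ("and", f, g),
# ("implies", f, g), ("F", f), ("G", f).

def holds(formula, trace, i=0):
    """Check whether `formula` holds on `trace` (a list of sets of
    atomic propositions) starting at position i."""
    op = formula[0]
    if op == "ap":
        return formula[1] in trace[i]
    if op == "not":
        return not holds(formula[1], trace, i)
    if op == "and":
        return holds(formula[1], trace, i) and holds(formula[2], trace, i)
    if op == "implies":
        return (not holds(formula[1], trace, i)) or holds(formula[2], trace, i)
    if op == "F":  # eventually: subformula holds at some j >= i
        return any(holds(formula[1], trace, j) for j in range(i, len(trace)))
    if op == "G":  # always: subformula holds at every j >= i
        return all(holds(formula[1], trace, j) for j in range(i, len(trace)))
    raise ValueError(f"unknown operator: {op}")

# "Every request is eventually followed by a response": G(request -> F response)
correct = ("G", ("implies", ("ap", "request"), ("F", ("ap", "response"))))
# A syntactically valid but semantically wrong rendering a model might emit:
# F(request -> response) is satisfied vacuously at any request-free step.
wrong = ("F", ("implies", ("ap", "request"), ("ap", "response")))

# A trace where the second request never gets a response.
trace = [{"request"}, {"response"}, {"request"}, set()]
print(holds(correct, trace))  # False: the correct formula catches the violation
print(holds(wrong, trace))    # True: the wrong formula misses it
```

This is exactly the failure mode the paper describes: the wrong formula parses fine and uses real LTL operators, yet accepts a trace that violates the intended property.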