Stream-CQSA: Avoiding Out-of-Memory in Attention Computation via Flexible Workload Scheduling — ThinkLLM