FisherSketch makes source selection practical for LLM families by measuring task similarity through compact Fisher alignment signatures (16 KB per task) instead of expensive full Fisher matrices. The signatures also reveal whether tasks differ in their activations, their errors, or the interaction between the two.
The paper addresses the problem of selecting training data sources for language models that share a vocabulary but are trained on different tasks (for example, SMILES strings versus protein sequences).
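To put the 16 KB signature size in perspective, the sketch below compares it with the storage a dense Fisher matrix would need, and shows one plausible way compact per-task signatures could be used to rank candidate sources. It is an illustration, not the paper's implementation. The assumptions are: float32 storage, 1 KB = 1024 bytes, a signature represented as a flat vector, and cosine similarity as the comparison. The names `full_fisher_bytes`, `signature_floats`, `cosine_similarity`, and `rank_sources` are hypothetical.

```python
import math

BYTES_PER_FLOAT32 = 4  # assumption: signatures and Fisher entries stored as float32


def full_fisher_bytes(num_params: int) -> int:
    """Bytes needed for a dense num_params x num_params Fisher matrix."""
    return num_params * num_params * BYTES_PER_FLOAT32


def signature_floats(signature_bytes: int = 16 * 1024) -> int:
    """Number of float32 values that fit in one signature (assumes 1 KB = 1024 bytes)."""
    return signature_bytes // BYTES_PER_FLOAT32


def cosine_similarity(a, b) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sources(target_sig, source_sigs):
    """Return source names ordered from most to least similar to the target.

    Hypothetical usage: source_sigs maps a source-task name to its signature.
    Cosine similarity is an assumed stand-in for the paper's actual measure.
    """
    return sorted(
        source_sigs,
        key=lambda name: cosine_similarity(target_sig, source_sigs[name]),
        reverse=True,
    )


if __name__ == "__main__":
    print(signature_floats())               # float32 values in a 16 KB signature
    print(full_fisher_bytes(1_000_000))     # bytes for a dense Fisher at 1M parameters
    print(rank_sources([1.0, 0.0], {"a": [0.0, 1.0], "b": [1.0, 0.0]}))
```

Under these assumptions a 16 KB signature holds 4096 values, while a dense Fisher matrix grows quadratically with the parameter count, which is why comparing tasks through small signatures scales to many candidate sources.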