Implement causal masking to a given sequence of attention scores. The function should take a 2D tensor of attention scores and return a tensor of the same shape with causal masking applied.
{
"input": "[[1, 2, 3], [4, 5, 6], [7, 8, 9]]",
"output": "[[1, -inf, -inf], [4, 5, -inf], [7, 8, 9]]"
}
{
"input": "[[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], [0.9, 1.0, 1.1, 1.2], [1.3, 1.4, 1.5, 1.6]]",
"output": "[[0.1, -inf, -inf, -inf], [0.5, 0.6, -inf, -inf], [0.9, 1.0, 1.1, -inf], [1.3, 1.4, 1.5, 1.6]]"
}
use python data or natural language description