Implement Causal Masking for Autoregressive Decoding

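Apply causal masking to a given matrix of attention scores. The function should take a 2D tensor of attention scores and return a tensor of the same shape in which every entry above the main diagonal (column index greater than row index) is replaced with -inf, so that each position can attend only to itself and to earlier positions.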

Constraints

  • The input tensor will always be a square matrix (i.e., the number of rows equals the number of columns)

Examples

Example 1

{
  "input": "[[1, 2, 3], [4, 5, 6], [7, 8, 9]]",
  "output": "[[1, -inf, -inf], [4, 5, -inf], [7, 8, 9]]"
}

Example 2

{
  "input": "[[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], [0.9, 1.0, 1.1, 1.2], [1.3, 1.4, 1.5, 1.6]]",
  "output": "[[0.1, -inf, -inf, -inf], [0.5, 0.6, -inf, -inf], [0.9, 1.0, 1.1, -inf], [1.3, 1.4, 1.5, 1.6]]"
}
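
The masking shown in the examples can be sketched in plain Python as follows. This is a minimal illustration, not a reference solution: the function name `causal_mask` is hypothetical, and nested lists stand in for the 2D tensor so the sketch runs without any tensor library.

```python
import math


def causal_mask(scores):
    """Return a copy of a square score matrix with future positions masked.

    Assumption: row i is the query position and column j is the key
    position, so every entry with j > i is replaced with -inf.
    """
    n = len(scores)
    return [
        [scores[i][j] if j <= i else -math.inf for j in range(n)]
        for i in range(n)
    ]


masked = causal_mask([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(masked)  # [[1, -inf, -inf], [4, 5, -inf], [7, 8, 9]]
```

With a tensor library, the same effect is usually obtained from an upper-triangular mask, for example one built with `numpy.triu(..., k=1)`, rather than from explicit loops.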
