- namespace: Rindow\NeuralNetworks\Layer
- classname: Attention
Dot-product attention layer.
Inputs are a query tensor of shape [batch_size, Tq, dim], a value tensor of shape [batch_size, Tv, dim] and a key tensor of shape [batch_size, Tv, dim]. The calculation follows these steps:
- Calculate scores with shape [batch_size, Tq, Tv] as a query-key dot product: scores = matmul(query, key, transpose_b=True).
- Use scores to calculate a distribution with shape [batch_size, Tq, Tv]: distribution = softmax(scores).
- Use distribution to create a linear combination of value with shape [batch_size, Tq, dim]: return matmul(distribution, value).
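The three steps above can be sketched in plain PHP for a single batch element. This is only an illustration of the arithmetic, not the layer's implementation: nested PHP arrays stand in for NDArray tensors, and the function names `softmax_row` and `dot_product_attention` are hypothetical.

```php
<?php
// Illustrative sketch only: plain PHP arrays stand in for NDArray tensors.
// The function names here are hypothetical and are not part of Rindow.

/** Numerically stable softmax over a 1-D list of floats. */
function softmax_row(array $row): array
{
    $max = max($row);
    $exps = array_map(fn($x) => exp($x - $max), $row);
    $sum = array_sum($exps);
    return array_map(fn($e) => $e / $sum, $exps);
}

/**
 * Dot-product attention for one batch element.
 * $query: [Tq][dim], $value: [Tv][dim], $key: [Tv][dim] (defaults to $value).
 * Returns [$outputs of shape [Tq][dim], $distribution of shape [Tq][Tv]].
 */
function dot_product_attention(array $query, array $value, ?array $key = null): array
{
    $key = $key ?? $value;
    $dim = count($value[0]);
    $outputs = [];
    $distribution = [];
    foreach ($query as $q) {
        // Step 1: scores = dot product of this query row with every key row.
        $scores = [];
        foreach ($key as $k) {
            $s = 0.0;
            foreach ($q as $i => $qi) {
                $s += $qi * $k[$i];
            }
            $scores[] = $s;
        }
        // Step 2: turn the scores into a distribution over the Tv positions.
        $weights = softmax_row($scores);
        // Step 3: linear combination of the value rows using those weights.
        $out = array_fill(0, $dim, 0.0);
        foreach ($weights as $t => $w) {
            foreach ($value[$t] as $i => $vi) {
                $out[$i] += $w * $vi;
            }
        }
        $outputs[] = $out;
        $distribution[] = $weights;
    }
    return [$outputs, $distribution];
}

// Usage: Tq = 3, Tv = 2, dim = 5, all elements set to 1.0.
$query = array_fill(0, 3, array_fill(0, 5, 1.0));
$value = array_fill(0, 2, array_fill(0, 5, 1.0));
[$outputs, $scores] = dot_product_attention($query, $value);
echo count($outputs), 'x', count($outputs[0]), "\n"; // prints 3x5
echo count($scores), 'x', count($scores[0]), "\n";   // prints 3x2
```

With all-ones inputs every score is equal, so each query position weights the two value positions evenly (0.5 each) and every output element is 1.0.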
Methods
constructor
$builder->Attention(
?array $input_shapes=null,
?bool $use_scale=null,
?string $name=null,
)
You can create an Attention layer instance with the Layer Builder.
Options
- input_shapes: Tells the first layer the shapes of the input data. The batch dimension is not included.
- use_scale: If true, creates a scalar variable that scales the attention scores.
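As a rough illustration of what scaling the scores does, the sketch below assumes the scalar multiplies the scores before the softmax; the source does not spell out where the scaling is applied. The function `scaled_softmax` and the fixed `$scale` value are hypothetical stand-ins for the learned variable.

```php
<?php
// Hypothetical illustration: $scale stands in for the learned scalar variable.
// Assumption: the scalar multiplies the scores before the softmax.
function scaled_softmax(array $scores, float $scale = 1.0): array
{
    $scaled = array_map(fn($s) => $s * $scale, $scores);
    $max = max($scaled);                                  // subtract max for stability
    $exps = array_map(fn($s) => exp($s - $max), $scaled);
    $sum = array_sum($exps);
    return array_map(fn($e) => $e / $sum, $exps);
}

$plain  = scaled_softmax([1.0, 3.0]);       // scale of 1.0 leaves the scores unchanged
$softer = scaled_softmax([1.0, 3.0], 0.5);  // a smaller scale flattens the distribution
```

A scale below 1 moves the distribution toward uniform, while a scale above 1 sharpens it toward the highest-scoring position.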
forward
public function forward(
array $inputs,
Variable|bool|null $training=null,
Variable|bool|null $returnAttentionScores=null,
?array $mask=null,
) : Variable|array
Arguments
- inputs: A list of 3D NDArrays in the form [query, value] or [query, value, key]. See "Input shape" for the shape of each tensor.
- training: Set to true during training.
- returnAttentionScores: bool. If true, the attention scores (after masking and softmax) are returned as an additional output.
- mask: A list [query_mask, value_mask]. query_mask is a boolean mask tensor of shape (batch_size, Tq); if given, the output is zero at positions where the mask is false. value_mask is a boolean mask tensor of shape (batch_size, Tv); if given, values at positions where the mask is false do not contribute to the result.
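The effect of the two masks can be sketched in plain PHP for a single batch element. This is not how the layer implements masking internally; it only reproduces the behavior described above. Both helper names, `masked_softmax` and `apply_query_mask`, are hypothetical.

```php
<?php
// Hypothetical helpers, not Rindow's implementation.

/** Softmax over $scores that gives zero weight where $valueMask is false. */
function masked_softmax(array $scores, array $valueMask): array
{
    // Collect the unmasked scores so the max is taken over valid positions only.
    $kept = [];
    foreach ($scores as $t => $s) {
        if ($valueMask[$t]) {
            $kept[] = $s;
        }
    }
    if ($kept === []) {
        // Every position is masked: return all-zero weights.
        return array_fill(0, count($scores), 0.0);
    }
    $max = max($kept);
    $exps = [];
    foreach ($scores as $t => $s) {
        $exps[] = $valueMask[$t] ? exp($s - $max) : 0.0;
    }
    $sum = array_sum($exps);
    return array_map(fn($e) => $e / $sum, $exps);
}

/** Zero the output rows at query positions where $queryMask is false. */
function apply_query_mask(array $outputs, array $queryMask): array
{
    foreach ($outputs as $t => $row) {
        if (!$queryMask[$t]) {
            $outputs[$t] = array_fill(0, count($row), 0.0);
        }
    }
    return $outputs;
}

// value_mask: the third value position is masked out despite its high score.
$weights = masked_softmax([2.0, 2.0, 9.0], [true, true, false]);
// $weights is [0.5, 0.5, 0.0]

// query_mask: the second query position is masked, so its output row is zeroed.
$outputs = apply_query_mask([[1.0, 2.0], [3.0, 4.0]], [true, false]);
// $outputs is [[1.0, 2.0], [0.0, 0.0]]
```

Implementations commonly get the same result by adding a large negative number to masked scores before the softmax; the sketch sets the weights to zero directly to keep the logic visible.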
Input shape
Input is a list in the form [query, value] or [query, value, key]. If the key is omitted, the value tensor is also used as the key. The query tensor shape is [batch_size, Tq, dim]. The value tensor shape is [batch_size, Tv, dim]. The key tensor shape is [batch_size, Tv, dim].
Output shape
If returnAttentionScores is true, the result is a list [outputs, scores]; otherwise only outputs is returned. The outputs shape is [batch_size, Tq, dim]. The scores shape is [batch_size, Tq, Tv].
$attention = $builder->layers()->Attention();
....
$query = $mo->ones([4,3,5]);
$value = $mo->ones([4,2,5]);
....
[$outputs,$scores] = $attention->forward([$query,$value],true,
    returnAttentionScores:true);
# $outputs->shape() : [4,3,5]
# $scores->shape() : [4,3,2]
Example of usage
class Foo extends AbstractModel
{
    public function __construct($backend,$builder)
    {
        ...
        $this->attention = $builder->layers()->Attention();
        ....
    }
    protected function call(.....) : NDArray
    {
        ...
        // Inputs are passed as [query, value]; the value is also used as the key.
        $outputs = $this->attention->forward([$query, $value],$training);
        ...
    }
}