- namespace: Rindow\NeuralNetworks\Layer
- classname: MultiHeadAttention
Multi-head attention layer.
Inputs are a query tensor of shape [batch_size, Tq, dim], a value tensor of shape [batch_size, Tv, dim], and a key tensor of shape [batch_size, Tv, dim].
The output tensor shape is [batch_size, Tq, dim].
Please refer to the following paper for details of the operation.
- Attention is All You Need (Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L. & Polosukhin, I. 2017).
- https://arxiv.org/abs/1706.03762
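Each attention head computes the scaled dot-product attention defined in the paper:

Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V

where Q, K and V are the projected query, key and value tensors and d_k is key_dim. The layer runs num_heads such attentions in parallel, then concatenates the results and projects them back to dim.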
Methods
constructor
$builder->MultiHeadAttention(
?int $num_heads,
?int $key_dim,
?int $value_dim=null,
?float $dropout=null,
?bool $use_bias=null,
?array $input_shapes=null,
int|array|null $attention_axes=null,
string|object|null $kernel_initializer=null,
string|object|null $bias_initializer=null,
)
You can create a MultiHeadAttention layer instance with the Layer Builder.
Arguments
- num_heads: Number of heads.
- key_dim: Attention key dimension size.
Options
- value_dim: Attention value dimension size. If omitted, key_dim is used.
- dropout: Fraction of units to drop. If omitted, no dropout is applied.
- use_bias: Whether the input and output projection layers use bias vectors.
- input_shapes: Array list of shapes. Specifies the shape of the input data when this is the first layer. The batch dimension is not included in input_shapes.
- attention_axes: Axes over which the attention is applied. If omitted, attention is applied over the sequence axis.
- kernel_initializer: Name of the kernel initializer. An initializer object can also be given.
- bias_initializer: Name of the bias initializer. An initializer object can also be given.
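For example, a layer with dropout and explicit input shapes can be created as follows. This is a minimal sketch; the option values are illustrative, and PHP named arguments are assumed to match the constructor options above.

$attention = $builder->layers()->MultiHeadAttention(
    8,                          // num_heads
    256,                        // key_dim
    dropout:0.1,                // 10% dropout while training
    input_shapes:[[3,5],[2,5]], // [query shape, value shape] without the batch dimension
);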
forward
public function forward(
array $inputs,
Variable|bool|null $training=null,
Variable|bool|null $returnAttentionScores=null,
?array $mask=null,
?NDArray $attention_mask=null,
Variable|bool|null $useCausalMask=null,
) : Variable|array
Arguments
- inputs: List of input tensors in the form of [query,value] or [query,value,key]. Each is a 3D NDArray with shape [batch_size, timesteps, dim].
- training: Set to true when training.
- returnAttentionScores: bool. If true, the attention scores (after masking and softmax) are returned as an additional output.
- mask: List of the following tensors. query_mask: a boolean mask tensor of shape [batch_size, Tq]; if given, the output will be zero at the positions where the mask is false. value_mask: a boolean mask tensor of shape [batch_size, Tv]; if given, values at positions where the mask is false do not contribute to the result. See the sketch after this list.
- attention_mask: Reserved. Always null or omitted.
- useCausalMask: bool. Set to true to apply a causal mask so that each position can attend only to itself and earlier positions.
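The mask option takes the query mask and the value mask as a list. A minimal sketch is shown below; building boolean NDArrays with $mo->array() and the NDArray::bool dtype is an assumption.

use Interop\Polite\Math\Matrix\NDArray;
....
$query = $mo->ones([1,3,5]);
$value = $mo->ones([1,2,5]);
// Mask out the last query position and the last value position.
// The NDArray::bool dtype here is an assumption.
$queryMask = $mo->array([[true,true,false]],NDArray::bool); // [batch_size=1, Tq=3]
$valueMask = $mo->array([[true,false]],NDArray::bool);      // [batch_size=1, Tv=2]
// Masks are passed as the fourth argument of forward().
$outputs = $attention->forward([$query,$value],true,null,[$queryMask,$valueMask]);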
Input shape
Input is a list in the form of [query,value] or [query,value,key]. If the key is omitted, the value tensor is also used as the key. The query tensor shape is [batch_size, Tq, dim]. The value tensor shape is [batch_size, Tv, dim]. The key tensor shape is [batch_size, Tv, dim].
Output shape
If returnAttentionScores is true, a list of [outputs,scores] is returned; otherwise only the outputs tensor. The outputs shape is [batch_size, Tq, dim]. The scores shape is [batch_size, Heads, Tq, Tv].
$attention = $builder->layers()->MultiHeadAttention(8,256); // heads=8, key_dim=256
....
$query = $mo->ones([4,3,5]);
$value = $mo->ones([4,2,5]);
....
[$outputs,$scores] = $attention->forward([$query,$value],
    true,true); // training=true, returnAttentionScores=true
# $outputs->shape() : [4,3,5]
# $scores->shape() : [4,8,3,2]
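Causal self-attention can be written as the following sketch, passing the same tensor as query and value and enabling useCausalMask. The named argument is assumed to match the forward() signature above.

$selfAttention = $builder->layers()->MultiHeadAttention(8,256); // num_heads=8, key_dim=256
$sequence = $mo->ones([4,3,5]);
// Each position attends only to itself and earlier positions.
$outputs = $selfAttention->forward([$sequence,$sequence],true,useCausalMask:true);
# $outputs->shape() : [4,3,5]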
Example of usage
class Foo extends AbstractModel
{
public function __construct($backend,$builder)
{
...
$this->attention = $builder->layers()->MultiHeadAttention(8,256);
....
}
protected function call(.....) : NDArray
{
...
$outputs = $this->attention->forward([$query, $value],$training);
...
}
}