Documentation for Evaluator
¶
luminator.evaluation.evaluator.EvaluatorBase
¶
luminator.evaluation.evaluator.AttributionEvaluator
¶
Bases: EvaluatorBase
The AttributionEvaluator is used to evaluate attributions by computing different metrics.
predict_fn
instance-attribute
¶
`predict_fn: Callable[Concatenate[Tuple[Tensor, ...], P], Generator[SequenceClassifierOutput, None, None]]`
non_zero_weights
¶
Computes the non-zero-weights metric, i.e. how many attribution values exceed the given threshold.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| explanations | List[SequenceExplanation] | A list of SequenceExplanation | required |
| threshold | float | All values greater than threshold increase the metric. | 1e-09 |

Returns:

| Type | Description |
| --- | --- |
| List[Tensor] | One score for each explanation |
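The counting rule above can be sketched on plain Python floats. This standalone version is an illustration only: the actual method takes SequenceExplanation objects and returns one tensor per explanation.

```python
from typing import List

def non_zero_weights(attributions: List[float], threshold: float = 1e-09) -> int:
    """Count the attribution values that exceed the threshold.

    Sketch of the non-zero-weights metric: every attribution greater
    than `threshold` increases the score by one.
    """
    return sum(1 for a in attributions if a > threshold)

# One score per explanation, mirroring the List[Tensor] return shape.
scores = [non_zero_weights(e) for e in ([0.4, 0.0, 0.1], [0.0, 0.0])]
```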
faithfulness
¶
Replaces the highest-attributed token with an UNK token and predicts the example. The score is the difference between the predictions for the base and the permuted example. A higher score indicates a larger difference, showing that removing the most important token has a large impact on the prediction.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| explanations | List[SequenceExplanation] | A list of SequenceExplanation | required |

Returns:

| Type | Description |
| --- | --- |
| List[Tensor] | One score for each explanation |
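The procedure can be sketched as follows; `predict` and `toy_predict` are hypothetical stand-ins for the evaluator's `predict_fn`, and the logic operates on plain token lists rather than SequenceExplanation objects.

```python
from typing import Callable, List

def faithfulness(tokens: List[str], attributions: List[float],
                 predict: Callable[[List[str]], float],
                 unk: str = "[UNK]") -> float:
    """Sketch: drop the highest-attributed token and measure the change.

    The score is the base prediction minus the prediction for the
    permuted example, so a higher score means removing the most
    important token had a larger impact.
    """
    top = max(range(len(attributions)), key=attributions.__getitem__)
    permuted = list(tokens)
    permuted[top] = unk
    return predict(tokens) - predict(permuted)

def toy_predict(tokens: List[str]) -> float:
    # Toy model whose prediction hinges on the word "great".
    return 0.9 if "great" in tokens else 0.2

# Removing "great" drops the prediction from 0.9 to 0.2.
score = faithfulness(["a", "great", "film"], [0.1, 0.8, 0.05], toy_predict)
```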
truthfulness
¶
For each token in the example, a new example is created in which the token is replaced by UNK. All permuted examples are then predicted. For each permuted token, the score is increased by 1 if removing the token leads to a decrease in prediction probability (for positive attributions) or an increase in prediction probability (for negative attributions). The score is averaged over all tokens. The higher the score, the more truthfully the attributed tokens have been computed.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| explanations | List[SequenceExplanation] | A list of SequenceExplanation | required |

Returns:

| Type | Description |
| --- | --- |
| List[Tensor] | One score for each explanation |
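A minimal sketch of this scoring rule, again with a hypothetical `toy_predict` in place of the evaluator's `predict_fn`:

```python
from typing import Callable, List

def truthfulness(tokens: List[str], attributions: List[float],
                 predict: Callable[[List[str]], float],
                 unk: str = "[UNK]") -> float:
    """Sketch: score 1 for each token whose attribution sign correctly
    predicts the direction of the probability change when that token is
    replaced by UNK; average over all tokens."""
    base = predict(tokens)
    score = 0.0
    for i, attr in enumerate(attributions):
        permuted = list(tokens)
        permuted[i] = unk
        delta = base - predict(permuted)  # > 0: probability decreased
        if (attr > 0 and delta > 0) or (attr < 0 and delta < 0):
            score += 1.0
    return score / len(tokens)

def toy_predict(tokens: List[str]) -> float:
    return 0.9 if "great" in tokens else 0.2

# Only "great" both carries a positive attribution and lowers the
# prediction when removed, so one of three tokens scores.
score = truthfulness(["a", "great", "film"], [0.0, 0.8, -0.1], toy_predict)
```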
faithful_truthfulness
¶
For each token in the example, a new example is created in which the token is replaced by UNK. All permuted examples are then predicted. For each permuted token, the difference in prediction probability is added to the score. The score is averaged over all tokens. The higher the score, the more influence each token had and the more precisely each attribution predicted its influence on the model output.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| explanations | List[SequenceExplanation] | A list of SequenceExplanation | required |

Returns:

| Type | Description |
| --- | --- |
| List[Tensor] | One score for each explanation |
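Compared with truthfulness, the raw probability change is accumulated instead of just its sign. A sketch (with a hypothetical `toy_predict`):

```python
from typing import Callable, List

def faithful_truthfulness(tokens: List[str],
                          predict: Callable[[List[str]], float],
                          unk: str = "[UNK]") -> float:
    """Sketch: sum the change in prediction probability caused by
    replacing each token with UNK, then average over all tokens."""
    base = predict(tokens)
    total = 0.0
    for i in range(len(tokens)):
        permuted = list(tokens)
        permuted[i] = unk
        total += base - predict(permuted)
    return total / len(tokens)

def toy_predict(tokens: List[str]) -> float:
    return 0.9 if "great" in tokens else 0.2

# Only removing "great" changes the prediction (by 0.7), so the
# average over the three tokens is 0.7 / 3.
score = faithful_truthfulness(["a", "great", "film"], toy_predict)
```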
ranked_faithful_truthfulness
¶
For each token in the example, a new example is created in which the token is replaced by UNK. All permuted examples are then predicted. The attributed tokens are sorted by their attribution value, so that the highest attribution receives the highest rank and vice versa. For each permuted token, the difference in prediction probability divided by the rank is added to the score. The higher the score, the more influence each token had and the more precisely each attribution predicted its influence on the model output.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| explanations | List[SequenceExplanation] | A list of SequenceExplanation | required |

Returns:

| Type | Description |
| --- | --- |
| List[Tensor] | One score for each explanation |
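A sketch of the ranked variant. One assumption to flag: rank 1 is taken to be the highest attribution, so the top token's probability change contributes fully while lower-ranked tokens are discounted; `toy_predict` is again hypothetical.

```python
from typing import Callable, List

def ranked_faithful_truthfulness(tokens: List[str],
                                 attributions: List[float],
                                 predict: Callable[[List[str]], float],
                                 unk: str = "[UNK]") -> float:
    """Sketch: each token's probability change is divided by its
    attribution rank before being added. Assumption: rank 1 is the
    highest attribution."""
    base = predict(tokens)
    order = sorted(range(len(tokens)),
                   key=lambda i: attributions[i], reverse=True)
    rank = {token_idx: r + 1 for r, token_idx in enumerate(order)}
    total = 0.0
    for i in range(len(tokens)):
        permuted = list(tokens)
        permuted[i] = unk
        total += (base - predict(permuted)) / rank[i]
    return total

def toy_predict(tokens: List[str]) -> float:
    return 0.9 if "great" in tokens else 0.2

# "great" has the top attribution (rank 1), so its 0.7 change is
# added undivided; the other tokens change nothing.
score = ranked_faithful_truthfulness(
    ["a", "great", "film"], [0.1, 0.8, 0.05], toy_predict)
```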
robustness
¶
robustness(explanation: SequenceExplanation, tweaked_explanations: List[SequenceExplanation]) -> Tensor
Robustness measures the degree of change between the interpretations for the initial and modified instances.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| explanation | SequenceExplanation | A SequenceExplanation | required |
| tweaked_explanations | List[SequenceExplanation] | A list of tweaked SequenceExplanation | required |

Returns:

| Type | Description |
| --- | --- |
| Tensor | One robustness score for the given explanation |
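One plausible instantiation of this measure, assuming the score is the worst-case mean absolute difference between the original attribution vector and any of the tweaked ones; the library's exact distance function is not specified here, so treat this purely as a sketch.

```python
from typing import List

def robustness(attributions: List[float],
               tweaked: List[List[float]]) -> float:
    """Sketch: degree of change between the original attributions and
    those of perturbed instances, taken here (as an assumption) to be
    the largest mean absolute difference. Lower values indicate a more
    robust explanation method."""
    def mean_abs_diff(a: List[float], b: List[float]) -> float:
        return sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return max(mean_abs_diff(attributions, t) for t in tweaked)

# Identical tweak contributes 0; the second differs by 0.4 in one of
# two positions, giving a mean absolute difference of 0.2.
score = robustness([0.5, 0.1], [[0.5, 0.1], [0.1, 0.1]])
```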
rationale_f1
¶
rationale_f1(explanation: SequenceExplanation, rationales: List[int], top_k: Optional[int] = None) -> float
Maps all token attributions to their corresponding words and classifies whether they are rationales or not, then measures the F1 score against the given rationales.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| explanation | SequenceExplanation | A SequenceExplanation | required |
| rationales | List[int] | A list of rationales (0 or 1) for each word in the explanation | required |
| top_k | Optional[int] | Specifies how the attributions are used to classify rationales. If top_k is 0 or None, each token whose attribution >= mean + std of all attributions is classified as positive; otherwise the top_k highest attributions are classified as positive. | None |

Returns:

| Type | Description |
| --- | --- |
| float | One score for the given explanation |
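The top_k selection rule and the F1 computation can be sketched as below. This is standalone and hypothetical: the sample standard deviation is an assumption (the library may use the population variant), and the function operates on plain floats rather than a SequenceExplanation.

```python
from statistics import mean, stdev
from typing import List, Optional

def classify_rationales(attributions: List[float],
                        top_k: Optional[int] = None) -> List[int]:
    """Apply the selection rule: without top_k, mark words whose
    attribution >= mean + std; otherwise mark the top_k highest."""
    if not top_k:
        cutoff = mean(attributions) + stdev(attributions)
        return [1 if a >= cutoff else 0 for a in attributions]
    top = sorted(range(len(attributions)),
                 key=lambda i: attributions[i], reverse=True)[:top_k]
    return [1 if i in top else 0 for i in range(len(attributions))]

def rationale_f1(attributions: List[float], rationales: List[int],
                 top_k: Optional[int] = None) -> float:
    """F1 between the predicted rationale mask and the gold mask."""
    pred = classify_rationales(attributions, top_k)
    tp = sum(1 for p, r in zip(pred, rationales) if p and r)
    fp = sum(1 for p, r in zip(pred, rationales) if p and not r)
    fn = sum(1 for p, r in zip(pred, rationales) if not p and r)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# With top_k=1 only the highest attribution is predicted positive,
# which matches the gold rationale mask exactly.
score = rationale_f1([0.9, 0.1, 0.05, 0.0], [1, 0, 0, 0], top_k=1)
```

The same `classify_rationales` selection underlies rationale_accuracy, rationale_recall, and rationale_precision below; only the final score differs.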
rationale_accuracy
¶
rationale_accuracy(explanation: SequenceExplanation, rationales: List[int], top_k: Optional[int] = None) -> float
Maps all token attributions to their corresponding words and classifies whether they are rationales or not, then measures the accuracy score against the given rationales.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| explanation | SequenceExplanation | A SequenceExplanation | required |
| rationales | List[int] | A list of rationales (0 or 1) for each word in the explanation | required |
| top_k | Optional[int] | Specifies how the attributions are used to classify rationales. If top_k is 0 or None, each token whose attribution >= mean + std of all attributions is classified as positive; otherwise the top_k highest attributions are classified as positive. | None |

Returns:

| Type | Description |
| --- | --- |
| float | One score for the given explanation |
rationale_recall
¶
rationale_recall(explanation: SequenceExplanation, rationales: List[int], top_k: Optional[int] = None) -> float
Maps all token attributions to their corresponding words and classifies whether they are rationales or not, then measures the recall score against the given rationales.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| explanation | SequenceExplanation | A SequenceExplanation | required |
| rationales | List[int] | A list of rationales (0 or 1) for each word in the explanation | required |
| top_k | Optional[int] | Specifies how the attributions are used to classify rationales. If top_k is 0 or None, each token whose attribution >= mean + std of all attributions is classified as positive; otherwise the top_k highest attributions are classified as positive. | None |

Returns:

| Type | Description |
| --- | --- |
| float | One score for the given explanation |
rationale_precision
¶
rationale_precision(explanation: SequenceExplanation, rationales: List[int], top_k: Optional[int] = None) -> float
Maps all token attributions to their corresponding words and classifies whether they are rationales or not, then measures the precision score against the given rationales.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| explanation | SequenceExplanation | A SequenceExplanation | required |
| rationales | List[int] | A list of rationales (0 or 1) for each word in the explanation | required |
| top_k | Optional[int] | Specifies how the attributions are used to classify rationales. If top_k is 0 or None, each token whose attribution >= mean + std of all attributions is classified as positive; otherwise the top_k highest attributions are classified as positive. | None |

Returns:

| Type | Description |
| --- | --- |
| float | One score for the given explanation |