margin_based_tools
parallel_sentence_mining.margin_based.margin_based_tools ¶
This module contains the class MarginBased, which implements margin-based scoring for parallel sentence mining.
MarginBased ¶
Class that implements the margin-based scoring for parallel sentence mining.
Methods:
- select_margin – Select the margin function.
- margin_based_score – Compute the margin-based score.
- margin_based_score_candidates – Compute the margin-based scores for a batch of sentence pairs.
- select_best_candidates – Select the best sentence pairs.
- get_sentence_pairs – Get the sentence pairs.
Source code in hadal/parallel_sentence_mining/margin_based/margin_based_tools.py
__init__ ¶
select_margin ¶
Select the margin function.
Source: https://arxiv.org/pdf/1811.01136.pdf, Section 3.1 (Margin-based scoring)
Parameters:
- margin (str, default: 'ratio') – The margin function to use. Valid options are 'ratio' and 'distance'.
Raises:
- NotImplementedError – If the given margin is not implemented.
Returns:
- margin_func (Callable) – The margin function.
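As a minimal sketch of what the two options could look like, assuming the definitions from Section 3.1 of the cited paper (ratio: a / b, distance: a - b). The name select_margin_sketch is hypothetical and is not the library's implementation.

```python
from typing import Callable


def select_margin_sketch(margin: str = "ratio") -> Callable[[float, float], float]:
    """Hypothetical stand-in for select_margin (assumption, not hadal's code)."""
    if margin == "ratio":
        # Ratio margin: similarity divided by the neighbourhood average.
        return lambda a, b: a / b
    if margin == "distance":
        # Distance margin: similarity minus the neighbourhood average.
        return lambda a, b: a - b
    raise NotImplementedError(f"margin={margin!r} is not implemented")


ratio = select_margin_sketch("ratio")
distance = select_margin_sketch("distance")
print(ratio(0.75, 0.5), distance(0.75, 0.5))
```

In both cases a is the similarity of the candidate pair and b is the average similarity of each sentence to its nearest neighbours, so a higher value means the pair stands out more from its neighbourhood.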
margin_based_score ¶
margin_based_score(
source_embeddings: numpy.ndarray,
target_embeddings: numpy.ndarray,
fwd_mean: numpy.ndarray,
bwd_mean: numpy.ndarray,
margin_func: Callable,
) -> numpy.ndarray
Compute the margin-based score.
Source: https://arxiv.org/pdf/1811.01136.pdf, Section 3.1 (Margin-based scoring)
Parameters:
- source_embeddings (numpy.ndarray) – Source embeddings.
- target_embeddings (numpy.ndarray) – Target embeddings.
- fwd_mean (numpy.ndarray) – The forward mean.
- bwd_mean (numpy.ndarray) – The backward mean.
- margin_func (Callable) – The margin function.
Returns:
- score (numpy.ndarray) – Margin-based score.
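The score from Section 3.1 of the cited paper can be sketched as follows for a single pair. This assumes L2-normalised embeddings (so the dot product equals cosine similarity) and that fwd_mean and bwd_mean are the average similarities of the source and target sentence to their k nearest neighbours. The function name is hypothetical.

```python
import numpy as np


def margin_based_score_sketch(source_embedding, target_embedding,
                              fwd_mean, bwd_mean, margin_func):
    """Hypothetical single-pair score (assumption, not hadal's code)."""
    # Cosine similarity, assuming both vectors are L2-normalised.
    similarity = np.dot(source_embedding, target_embedding)
    # Average of the two neighbourhood means, as in the paper's denominator.
    neighbourhood = (fwd_mean + bwd_mean) / 2
    return margin_func(similarity, neighbourhood)


x = np.array([1.0, 0.0])
y = np.array([0.6, 0.8])
pair_score = margin_based_score_sketch(x, y, 0.5, 0.3, lambda a, b: a / b)
print(pair_score)
```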
margin_based_score_candidates ¶
margin_based_score_candidates(
source_embeddings: numpy.ndarray,
target_embeddings: numpy.ndarray,
candidate_inds: numpy.ndarray,
fwd_mean: numpy.ndarray,
bwd_mean: numpy.ndarray,
margin: Callable,
) -> numpy.ndarray
Compute the margin-based scores for a batch of sentence pairs.
Parameters:
- source_embeddings (numpy.ndarray) – Source embeddings.
- target_embeddings (numpy.ndarray) – Target embeddings.
- candidate_inds (numpy.ndarray) – The indices of the candidate target embeddings for each source embedding.
- fwd_mean (numpy.ndarray) – The forward mean.
- bwd_mean (numpy.ndarray) – The backward mean.
- margin (Callable) – The margin function.
Returns:
- scores (numpy.ndarray) – The margin-based scores for the candidate pairs.
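A rough sketch of the batch computation, assuming candidate_inds has shape (n_source, k), fwd_mean holds one value per source sentence, and bwd_mean holds one value per target sentence. The function name and the toy data are hypothetical.

```python
import numpy as np


def score_candidates_sketch(source_embeddings, target_embeddings,
                            candidate_inds, fwd_mean, bwd_mean, margin):
    """Hypothetical batch scorer (assumption, not hadal's code)."""
    scores = np.zeros(candidate_inds.shape)
    for i in range(candidate_inds.shape[0]):
        for j in range(candidate_inds.shape[1]):
            k = candidate_inds[i, j]  # index of the j-th candidate target
            similarity = source_embeddings[i] @ target_embeddings[k]
            scores[i, j] = margin(similarity, (fwd_mean[i] + bwd_mean[k]) / 2)
    return scores


source = np.eye(2)
target = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
candidate_inds = np.array([[0, 2], [1, 2]])
fwd_mean = np.array([0.8, 0.9])
bwd_mean = np.array([0.5, 0.5, 0.7])
candidate_scores = score_candidates_sketch(
    source, target, candidate_inds, fwd_mean, bwd_mean, lambda a, b: a - b
)
print(candidate_scores)
```

The result has the same shape as candidate_inds, so entry (i, j) is the score of source sentence i against its j-th candidate.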
select_best_candidates ¶
select_best_candidates(
source_embeddings: numpy.ndarray,
x2y_ind: numpy.ndarray,
fwd_scores: numpy.ndarray,
target_embeddings: numpy.ndarray,
y2x_ind: numpy.ndarray,
bwd_scores: numpy.ndarray,
strategy: str = "max_score",
) -> tuple[numpy.ndarray, numpy.ndarray]
Select the best sentence pairs.
Source: https://arxiv.org/pdf/1811.01136.pdf, Section 3.2 (Candidate generation and filtering); only the max. score strategy is covered.
Parameters:
- source_embeddings (numpy.ndarray) – Source embeddings.
- x2y_ind (numpy.ndarray) – Indices of the target sentences corresponding to each source sentence.
- fwd_scores (numpy.ndarray) – Scores of the forward alignment between source and target sentences.
- target_embeddings (numpy.ndarray) – Target embeddings.
- y2x_ind (numpy.ndarray) – Indices of the source sentences corresponding to each target sentence.
- bwd_scores (numpy.ndarray) – Scores of the backward alignment between target and source sentences.
- strategy (str, default: 'max_score') – The strategy to use for selecting the best candidates.
Raises:
- NotImplementedError – If the given strategy is not implemented.
Returns:
- indices (numpy.ndarray) – An array of indices representing the sentence pairs.
- scores (numpy.ndarray) – An array of scores representing the similarity between the sentence pairs.
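One way the max. score strategy could be sketched: take the best forward candidate for every source sentence and the best backward candidate for every target sentence, then concatenate both sets. The (source, target) column layout of indices and the function name are assumptions made for illustration.

```python
import numpy as np


def select_best_candidates_sketch(source_embeddings, x2y_ind, fwd_scores,
                                  target_embeddings, y2x_ind, bwd_scores,
                                  strategy="max_score"):
    """Hypothetical max-score selection (assumption, not hadal's code)."""
    if strategy != "max_score":
        raise NotImplementedError(f"strategy={strategy!r} is not implemented")
    n_src = source_embeddings.shape[0]
    n_trg = target_embeddings.shape[0]
    # Best target for each source, and best source for each target.
    fwd_best = x2y_ind[np.arange(n_src), fwd_scores.argmax(axis=1)]
    bwd_best = y2x_ind[np.arange(n_trg), bwd_scores.argmax(axis=1)]
    # Each row is a (source index, target index) pair.
    indices = np.stack(
        [np.concatenate([np.arange(n_src), bwd_best]),
         np.concatenate([fwd_best, np.arange(n_trg)])],
        axis=1,
    )
    scores = np.concatenate([fwd_scores.max(axis=1), bwd_scores.max(axis=1)])
    return indices, scores


embeddings = np.zeros((2, 4))  # only the row counts are used in this sketch
x2y_ind = np.array([[0, 1], [1, 0]])
y2x_ind = np.array([[0, 1], [1, 0]])
fwd_scores = np.array([[1.4, 0.9], [1.3, 0.8]])
bwd_scores = np.array([[1.5, 0.7], [1.2, 0.6]])
best_indices, best_scores = select_best_candidates_sketch(
    embeddings, x2y_ind, fwd_scores, embeddings, y2x_ind, bwd_scores
)
print(best_indices, best_scores)
```

Because forward and backward selections are concatenated, the same pair can appear twice with different scores; duplicates would need to be resolved in a later step.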
get_sentence_pairs ¶
get_sentence_pairs(
indices: numpy.ndarray,
scores: numpy.ndarray,
source_sentences: list[str],
target_sentences: list[str],
) -> list[tuple[numpy.float64, str, str]]
Get the sentence pairs.
Parameters:
- indices (numpy.ndarray) – An array of indices representing the sentence pairs.
- scores (numpy.ndarray) – An array of scores representing the similarity between the sentence pairs.
- source_sentences (list[str]) – Source sentences.
- target_sentences (list[str]) – Target sentences.
Returns:
- bitext_list (list[tuple[numpy.float64, str, str]]) – A list of tuples, each holding a score, a source sentence, and a target sentence.
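A possible sketch of how the pairs could be assembled, assuming indices holds (source, target) rows, pairs are visited from highest to lowest score, and each sentence is used at most once. The deduplication rule and the function name are assumptions, not confirmed behaviour of the library.

```python
import numpy as np


def get_sentence_pairs_sketch(indices, scores, source_sentences, target_sentences):
    """Hypothetical pair extraction (assumption, not hadal's code)."""
    seen_src, seen_trg = set(), set()
    bitext_list = []
    for i in np.argsort(-scores):  # highest score first
        src_ind, trg_ind = int(indices[i][0]), int(indices[i][1])
        if src_ind in seen_src or trg_ind in seen_trg:
            continue  # keep only the best-scoring pair per sentence
        seen_src.add(src_ind)
        seen_trg.add(trg_ind)
        bitext_list.append(
            (scores[i], source_sentences[src_ind], target_sentences[trg_ind])
        )
    return bitext_list


pair_indices = np.array([[0, 0], [1, 1], [0, 0], [1, 1]])
pair_scores = np.array([1.4, 1.3, 1.5, 1.2])
bitext = get_sentence_pairs_sketch(
    pair_indices, pair_scores, ["Hello", "Thank you"], ["Hallo", "Danke"]
)
for score, src, trg in bitext:
    print(score, src, trg)
```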