huggingface_automodel ¶
This module contains the class HuggingfaceAutoModel
that can be used to encode text using a Huggingface AutoModel.
HuggingfaceAutoModel ¶
Class to encode text using a Huggingface AutoModel.
Methods:
- encode – Encode text using a Huggingface AutoModel.
Source code in hadal/huggingface_automodel.py
__init__ ¶
__init__(
    model_name_or_path: str | pathlib.Path,
    device: str | None = None,
    *,
    enable_logging: bool = True,
    log_level: int | None = logging.INFO
) -> None
Initialize HuggingfaceAutoModel object.
Parameters:
- model_name_or_path (str | pathlib.Path) – Name or path to the pre-trained model.
- device (str | None, default: None) – Device for the model.
- enable_logging (bool, default: True) – Whether logging is enabled.
- log_level (int | None, default: logging.INFO) – Logging level.
encode ¶
encode(
    sentences: str | list[str],
    batch_size: int = 32,
    output_value: str = "pooler_output",
    convert_to: str | None = None,
    *,
    normalize_embeddings: bool = False,
    device: str | None = None
) -> list[torch.Tensor] | torch.Tensor | numpy.ndarray
Encode text using a Huggingface AutoModel.
Parameters:
- sentences (str | list[str]) – The sentences to encode.
- batch_size (int, default: 32) – The batch size.
- output_value (str, default: 'pooler_output') – Model output type. Can be pooler_output or last_hidden_state.
- convert_to (str | None, default: None) – Convert the embeddings to torch or numpy format. If torch, it will return a torch.Tensor. If numpy, it will return a numpy.ndarray. If None, it will return a list[torch.Tensor].
- normalize_embeddings (bool, default: False) – Normalize the embeddings.
- device (str | None, default: None) – Device for the model.
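To illustrate what normalize_embeddings commonly means, the sketch below scales one embedding vector to unit length in plain Python. That the library uses the L2 norm is an assumption (the parameter description only says the embeddings are normalized), and l2_normalize is a hypothetical helper, not part of hadal.

```python
import math


def l2_normalize(vector: list[float], eps: float = 1e-12) -> list[float]:
    """Scale a vector to unit L2 norm (assumed meaning of normalize_embeddings)."""
    norm = math.sqrt(sum(x * x for x in vector))
    # Guard against division by zero for an all-zero vector.
    return [x / max(norm, eps) for x in vector]


unit = l2_normalize([3.0, 4.0])
print(unit)  # [0.6, 0.8]
```

Unit-length vectors make the dot product of two embeddings equal to their cosine similarity, which is the usual reason to enable this kind of option.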
Raises:
- NotImplementedError – If the output_value is not implemented.
Returns:
- all_embeddings (list[torch.Tensor] | torch.Tensor | numpy.ndarray) – The embeddings of the sentences.
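A minimal usage sketch of the signatures documented above. The import path hadal.huggingface_automodel is inferred from the source file location and is an assumption, as are the wrapper name embed_sentences and the chosen argument values (device="cpu", batch_size=8). Running it requires hadal to be installed and a model that AutoModel can load.

```python
def embed_sentences(model_name_or_path: str, sentences: list[str]):
    """Encode sentences and request a numpy.ndarray via convert_to="numpy"."""
    # Imported lazily so this sketch can be loaded without hadal installed.
    # The module path is an assumption based on hadal/huggingface_automodel.py.
    from hadal.huggingface_automodel import HuggingfaceAutoModel

    model = HuggingfaceAutoModel(model_name_or_path, device="cpu")
    return model.encode(
        sentences,
        batch_size=8,
        output_value="pooler_output",
        convert_to="numpy",
        normalize_embeddings=True,
    )
```

The function would be called with a model name or local path of your choice and a list of strings, for example embed_sentences(model_path, ["first sentence", "second sentence"]). Passing an output_value other than pooler_output or last_hidden_state is documented to raise NotImplementedError.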
_text_length ¶
Calculate the length of the given sentences.
Parameters:
- text (list[str] | list | str) – The sentences.
Raises:
- TypeError – Input cannot be a dict.
- TypeError – Input cannot be a tuple.
Returns:
- length (int) – The length of the text.
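The documented behaviour can be sketched as follows. Only the two TypeError cases and the int return value come from the reference above. How the length is counted (characters for a string, summed item lengths for a list) is an assumption, and text_length is a hypothetical stand-in rather than the library's implementation.

```python
def text_length(text) -> int:
    """Return a length for a string or a list of strings (illustrative only)."""
    if isinstance(text, dict):
        raise TypeError("Input cannot be a dict.")
    if isinstance(text, tuple):
        raise TypeError("Input cannot be a tuple.")
    if isinstance(text, str):
        return len(text)  # assumed: number of characters
    # Assumed: total length over all items; an empty list gives 0.
    return sum(len(item) for item in text)


print(text_length("abc"))          # 3
print(text_length(["ab", "cde"]))  # 5
```

A length function of this kind is typically used to sort sentences before batching so that each batch needs little padding. Whether encode does so is not stated in this reference.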
batch_to_device ¶
Move a batch of tensors to the specified device.
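The reference does not show this method's signature, so the sketch below is only one plausible shape for it. It assumes the batch is a dict mapping names to tensors (as Hugging Face tokenizers return) and relies on nothing more than each value offering a .to(device) method, which torch.Tensor does. The parameter names are hypothetical.

```python
from typing import Any


def batch_to_device(batch: dict[str, Any], device: str) -> dict[str, Any]:
    """Return a new batch with every tensor-like value moved to `device`."""
    moved = {}
    for key, value in batch.items():
        # torch.Tensor exposes .to(device); other values are passed through.
        moved[key] = value.to(device) if hasattr(value, "to") else value
    return moved
```

Returning a new dict instead of mutating the input keeps the original batch usable, for example for logging on the CPU.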