| File | Last commit | Date |
| --- | --- | --- |
| merges | Enable multiple LoRa adapters (#2010) | 2024-06-25 14:46:27 -04:00 |
| __init__.py | feat(server): Add native support for PEFT Lora models (#762) | 2023-08-03 17:22:45 +02:00 |
| adapter.py | fix: refactor adapter weight loading and mapping (#2193) | 2024-07-24 15:32:14 -04:00 |
| chunks.py | server: use chunked inputs | 2024-06-07 08:09:04 +02:00 |
| convert.py | Force weights_only (before fully breaking pickle files anyway). (#1710) | 2024-04-05 19:23:57 +02:00 |
| dist.py | feat(fp8): use fbgemm kernels and load fp8 weights directly (#2248) | 2024-07-20 19:02:04 +02:00 |
| hub.py | Enable multiple LoRa adapters (#2010) | 2024-06-25 14:46:27 -04:00 |
| import_utils.py | refine get xpu free memory/enable Qwen2/gemma2/gemma/phi in intel platform (#2132) | 2024-07-01 14:32:54 +02:00 |
| log.py | feat(fp8): use fbgemm kernels and load fp8 weights directly (#2248) | 2024-07-20 19:02:04 +02:00 |
| logits_process.py | Fixing frequency penalty (#1811) | 2024-04-30 12:13:23 +02:00 |
| peft.py | Enable multiple LoRa adapters (#2010) | 2024-06-25 14:46:27 -04:00 |
| quantization.py | Add support for repacking AWQ weights for GPTQ-Marlin (#2278) | 2024-07-23 13:08:20 +02:00 |
| segments.py | Enable multiple LoRa adapters (#2010) | 2024-06-25 14:46:27 -04:00 |
| sgmv.py | Enable multiple LoRa adapters (#2010) | 2024-06-25 14:46:27 -04:00 |
| speculate.py | chore: formatting | 2023-12-11 14:49:52 +01:00 |
| tokens.py | Use the generation config. (#1808) | 2024-04-25 19:41:50 +02:00 |
| watermark.py | Fixing watermark. (#851) | 2023-08-16 07:17:26 +02:00 |
| weights.py | fix(server): fix fp8 weight loading (#2268) | 2024-07-22 15:51:32 +00:00 |