preemo_text-generation-inference
mirror of
https://github.com/Preemo-Inc/text-generation-inference.git
972e9a7f7c
.gitignore
5 lines
49 B
Plaintext
v0.1.0
2022-10-18 07:19:03 -06:00
.idea
Starting some routing tests. (#233)
2023-04-25 06:13:14 -06:00
target
router/tokenizer.json
feat(server): Rework model loading (#344)

# What does this PR do?

Reworked the loading logic. The idea is to use cleaner loading code:

- Remove the need for `no_init_weights`
- Remove all the weird `bnb_linear`, `load_weights` and `post_load_weights`.

New code layout:

- New class `Weights` in charge of loading the weights from multiple files into the appropriate tensors (potentially sharded)
- TP layers are now "shells": they contain the code to know what kind of sharding we need, plus the eventual `all_reduce`. They do not inherit from linear, but contain some kind of Linear instead
- The contained linear can be either FastLinear, BnbLinear or GPTq Linear next.
- All modeling code is explicitly made for sharding; the process group is just no-ops for non-sharded code (removes a lot of test cases)

![Screenshot from 2023-05-19 23-19-59](https://github.com/huggingface/text-generation-inference/assets/204321/9a802654-74a3-488c-87a8-073743a6143f)

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-41-161.taildb5d.ts.net>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-41-161.ec2.internal>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
2023-06-08 06:51:52 -06:00
*__pycache__*
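
To illustrate how the four patterns in this file behave, here is a rough Python sketch. It is only an approximation of gitignore matching, not Git's implementation: the helper `is_ignored` and the example paths are hypothetical, and the sketch assumes the `.gitignore` sits at the repository root. It covers only the two rule shapes used here: a pattern without a slash matches a file or directory of that name at any depth, and a pattern with a slash in the middle is anchored to the directory containing the `.gitignore`. Negation, `**`, and trailing-slash rules are deliberately left out.

```python
from fnmatch import fnmatchcase

# The four patterns listed in this .gitignore.
PATTERNS = [".idea", "target", "router/tokenizer.json", "*__pycache__*"]


def is_ignored(path, patterns=PATTERNS):
    """Approximate gitignore matching for simple patterns (hypothetical helper).

    `path` is relative to the directory holding the .gitignore and uses
    forward slashes.
    """
    parts = path.strip("/").split("/")
    for pattern in patterns:
        pat_parts = pattern.strip("/").split("/")
        if len(pat_parts) == 1:
            # No slash: match any single path component at any depth, so
            # everything below a matching directory counts as ignored too.
            if any(fnmatchcase(part, pat_parts[0]) for part in parts):
                return True
        else:
            # Slash in the middle: anchored, so compare leading components.
            head = parts[: len(pat_parts)]
            if len(head) == len(pat_parts) and all(
                fnmatchcase(p, q) for p, q in zip(head, pat_parts)
            ):
                return True
    return False


if __name__ == "__main__":
    for example in [
        ".idea/workspace.xml",
        "target/debug/app",
        "router/tokenizer.json",
        "server/router/tokenizer.json",
        "server/__pycache__/cli.pyc",
        "src/main.rs",
    ]:
        print(example, is_ignored(example))
```

For authoritative answers on a real checkout, `git check-ignore -v <path>` reports which pattern, if any, matches a given path.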