Nicolas Patry d18ed5cfc5 Mllama flash version (#2585)	2024-10-02 11:22:13 +02:00

* Working loading state.
* Preprocessing.
* Working state? (Broke idefics1 temporarily).
* Cleaner condition.
* Fix idefics.
* Updating config, removing TODO
* Mllama
* Upgrade transformers 4.45
* Flashing mllama.
* Starting to get there.
* Working state.
* Integration tests for mllama (cutting to 10 tokens because there seems to be instability after that, meaning the size of the batch matters).
* Updating model link.
* Earlier assert.
* Fix vlm?
* Remove log.
* Force ignore all images but last.
* Default dtype bfloat16.
* Update integration test after switch to bf16.
* Remove dead code.
* Removed dead code.
* Upgrade the flake to latest transformers/tokenizers
* Move to hf tgi-nix
* Upgrade to 0.5.0

custom_kernels	All integration tests back everywhere (too many failed CI). (#2428)	2024-08-16 21:19:46 +02:00
exllama_kernels	Update ROCM libs and improvements (#2579)	2024-09-30 10:54:32 +02:00
exllamav2_kernels	Update ROCM libs and improvements (#2579)	2024-09-30 10:54:32 +02:00
tests	Fix tokenization yi (#2507)	2024-09-11 22:41:56 +02:00
text_generation_server	Mllama flash version (#2585)	2024-10-02 11:22:13 +02:00
.gitignore	Impl simple mamba model (#1480)	2024-02-08 10:19:45 +01:00
Makefile	Lots of improvements (Still 2 allocators) (#2449)	2024-08-29 16:29:01 +02:00
Makefile-awq	chore: add pre-commit (#1569)	2024-02-16 11:58:58 +01:00
Makefile-eetq	Upgrade EETQ (Fixes the cuda graphs). (#1729)	2024-04-12 08:15:28 +02:00
Makefile-exllamav2	Upgrading exl2. (#2415)	2024-08-14 11:58:08 +02:00
Makefile-fbgemm	Add Directory Check to Prevent Redundant Cloning in Build Process (#2486)	2024-09-07 13:19:43 +02:00
Makefile-flash-att	Hotfixing `make install`. (#2008)	2024-06-04 23:34:03 +02:00
Makefile-flash-att-v2	Update ROCM libs and improvements (#2579)	2024-09-30 10:54:32 +02:00
Makefile-flashinfer	Prefix test - Different kind of load test to trigger prefix test bugs. (#2490)	2024-09-11 18:10:40 +02:00
Makefile-lorax-punica	Enable multiple LoRa adapters (#2010)	2024-06-25 14:46:27 -04:00
Makefile-selective-scan	chore: add pre-commit (#1569)	2024-02-16 11:58:58 +01:00
Makefile-vllm	Update ROCM libs and improvements (#2579)	2024-09-30 10:54:32 +02:00
README.md	chore: add pre-commit (#1569)	2024-02-16 11:58:58 +01:00
poetry.lock	Mllama flash version (#2585)	2024-10-02 11:22:13 +02:00
pyproject.toml	Mllama flash version (#2585)	2024-10-02 11:22:13 +02:00
requirements_cuda.txt	Mllama flash version (#2585)	2024-10-02 11:22:13 +02:00
requirements_intel.txt	Mllama flash version (#2585)	2024-10-02 11:22:13 +02:00
requirements_rocm.txt	Mllama flash version (#2585)	2024-10-02 11:22:13 +02:00

README.md

Text Generation Inference Python gRPC Server

A Python gRPC server for Text Generation Inference

Install

make install

Run

make run-dev