812 Commits

Author SHA1 Message Date
fxmarty d3c7f63416 Merge branch 'main' into amd-ci-fx 2024-06-10 15:10:04 +02:00
fxmarty de6f2cd08d disable marlin tests on rocm/xpu 2024-06-10 13:06:11 +00:00
Daniël de Kok 85dfc39222
Add Phi-3 medium support (#2039)
Add support for Phi-3-medium

The main difference between the medium and mini models is that medium
uses grouped query attention with a packed QKV matrix. This change adds
support for GQA with packed matrixes to `Weights.get_weights_col_packed`
and uses it for Phi-3. This also allows us to remove the custom
implementation of GQA from dbrx attention loading.
2024-06-10 09:22:29 +02:00
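
The packed-QKV handling described in the Phi-3 medium commit can be illustrated with a small sketch. This is not the code in `Weights.get_weights_col_packed`. It assumes the packed matrix stacks the query, key, and value blocks along the row dimension in that order, and that each block is split evenly across tensor-parallel ranks. The function name `packed_qkv_shard_ranges` and all of its parameters are hypothetical.

```python
from typing import List, Tuple


def packed_qkv_shard_ranges(
    num_heads: int,
    num_kv_heads: int,
    head_size: int,
    rank: int,
    world_size: int,
) -> List[Tuple[int, int]]:
    """Return the (start, stop) row ranges of this rank's Q, K and V slices.

    Assumption (not taken from the source): the packed matrix is laid out
    as [Q; K; V] along the row dimension. With grouped query attention the
    K and V blocks are smaller than the Q block, so the three blocks cannot
    be treated as equal thirds.
    """
    if num_heads % world_size != 0 or num_kv_heads % world_size != 0:
        raise ValueError("head counts must be divisible by world_size")

    q_rows = num_heads * head_size
    kv_rows = num_kv_heads * head_size

    ranges = []
    offset = 0
    for block_rows in (q_rows, kv_rows, kv_rows):
        shard_rows = block_rows // world_size
        start = offset + rank * shard_rows
        ranges.append((start, start + shard_rows))
        offset += block_rows
    return ranges


# Illustrative sizes only: 8 query heads, 2 key/value heads, head size 4.
q_range, k_range, v_range = packed_qkv_shard_ranges(8, 2, 4, rank=1, world_size=2)
```

With equal query and key/value head counts this reduces to splitting the matrix into three equal blocks, which is why a single helper could plausibly cover both the multi-head and the grouped-query case.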
fxmarty 9b3674d903
ROCm and sliding windows fixes (#2033)
* update vllm commit & fix models using sliding window

* update

* update commit

* fix bug where tunableop is bound to CUDA graphs even when CUDA graphs are disabled

* enable tunableop by default

* fix sliding window

* address review

* dead code

* precise comment

* is it flaky?
2024-06-10 15:09:50 +08:00
Nicolas Patry 41699e9bbf
. 2024-06-08 22:16:37 +02:00
Nicolas Patry eec6c3241b
. 2024-06-08 21:55:27 +02:00
Nicolas Patry 0ced5fac2d
Fix. 2024-06-08 08:58:05 +02:00
Nicolas Patry 452d442ef2
We need tailscale. 2024-06-08 08:46:55 +02:00
Nicolas Patry e62c51d140
Here we go again. 2024-06-08 08:41:40 +02:00
Nicolas Patry 8be9c197e5
Is this it ? 2024-06-08 07:54:00 +02:00
Nicolas Patry d9f704a1b3
Are we done ? 2024-06-08 07:53:21 +02:00
Nicolas Patry 909e6569d1
. 2024-06-08 07:40:08 +02:00
Nicolas Patry fa3e811672
No fromJSON. 2024-06-07 23:22:48 +02:00
Nicolas Patry 98d383062a
Extra spaces? 2024-06-07 23:15:58 +02:00
Nicolas Patry 66e59831f2
. 2024-06-07 23:00:27 +02:00
Nicolas Patry 741ab87fba
fromJSON 2024-06-07 22:58:28 +02:00
Nicolas Patry fc4404d9d2
. 2024-06-07 22:45:57 +02:00
Nicolas Patry 65b2efc585
. 2024-06-07 22:38:06 +02:00
Nicolas Patry eda299b84f
. 2024-06-07 20:18:57 +02:00
Nicolas Patry e79c83d7ba
Attempt #727. 2024-06-07 20:11:17 +02:00
Nicolas Patry c6fa9547a2
Test. 2024-06-07 19:58:56 +02:00
Nicolas Patry a045ead6eb
. 2024-06-07 19:52:14 +02:00
Nicolas Patry 5e769ce1e0
? 2024-06-07 19:46:34 +02:00
Nicolas Patry 87df3d5603
? 2024-06-07 17:12:17 +02:00
Nicolas Patry 19f6327bd2
esac. Great idea dev of the past. 2024-06-07 16:14:24 +02:00
Nicolas Patry 2a314fa0dd
Bash in bash. 2024-06-07 16:09:38 +02:00
Nicolas Patry b10ba9205c
... 2024-06-07 16:05:11 +02:00
Nicolas Patry 1f4248944c
Come on GH, dash, underscore, who cares at this point. 2024-06-07 16:03:05 +02:00
Nicolas Patry cc7c2fd90e
runs on. 2024-06-07 16:01:59 +02:00
Nicolas Patry 1e759f9da6
Wat? 2024-06-07 16:00:40 +02:00
Nicolas Patry 078fb55109
Abbé Faria? 2024-06-07 15:58:23 +02:00
Nicolas Patry 8205962950
Ahah, I see an exit. 2024-06-07 15:56:52 +02:00
Nicolas Patry 043de74dcd
**Feigns death** 2024-06-07 15:52:35 +02:00
Nicolas Patry 81ddb9d173
Please let me out ! 2024-06-07 15:49:31 +02:00
Nicolas Patry aea77a8ab3
Banana. 2024-06-07 15:44:51 +02:00
Nicolas Patry e6a4dbe7f5
I'm certainly not a monkey. 2024-06-07 15:43:58 +02:00
Nicolas Patry a759e2e7c5
Not hitting myself against the wall. 2024-06-07 15:39:37 +02:00
Nicolas Patry 8712a367dc
Flying blind feels nice. 2024-06-07 15:36:13 +02:00
Nicolas Patry 6f3117512c
Give us sanitation tools already. 2024-06-07 15:25:43 +02:00
Nicolas Patry 54e3340663 gh.. 2024-06-07 15:09:27 +02:00
Nicolas Patry 11c75f3a14 I hate this. 2024-06-07 15:07:51 +02:00
Nicolas Patry 3a8e9c221e Rename for everyone. 2024-06-07 15:03:01 +02:00
Nicolas Patry f29371e587 Naming. 2024-06-07 14:49:48 +02:00
Nicolas Patry 3ee92eb614 ? 2024-06-07 14:15:45 +02:00
Nicolas Patry 3684439a0e Trying new split of tasks. 2024-06-07 12:03:22 +02:00
Nicolas Patry 9101b2ae4f Fix. 2024-06-07 10:05:51 +02:00
Nicolas Patry c73355b99c
Merge branch 'main' into ci_amd2 2024-06-07 10:04:59 +02:00
Nicolas Patry c8128c794d Let's iterate a bit faster. 2024-06-07 09:50:43 +02:00
Nicolas Patry 97af55b7ef Inject slugs 2024-06-07 09:10:38 +02:00
Daniël de Kok bf3c813782 server: use chunked inputs
The router now sends the input as chunks in addition to a single
string. This change modifies the server to process the chunked input
rather than the string, which also allows us to remove the image
extraction code from the server.
2024-06-07 08:09:04 +02:00
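
To illustrate why chunked input can make image extraction unnecessary, here is a minimal sketch. It is not the server's actual code: the `TextChunk` and `ImageChunk` types, the `split_chunks` helper, and the `<image>` placeholder are all hypothetical names chosen for this example. The idea is that when the input arrives already separated into typed chunks, the consumer no longer has to parse image data back out of one string.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class TextChunk:
    """Hypothetical chunk carrying plain text."""
    text: str


@dataclass
class ImageChunk:
    """Hypothetical chunk carrying raw image bytes and their MIME type."""
    data: bytes
    mimetype: str


Chunk = Union[TextChunk, ImageChunk]


def split_chunks(
    chunks: List[Chunk], placeholder: str = "<image>"
) -> Tuple[str, List[ImageChunk]]:
    """Join text chunks into a prompt and collect image chunks in order.

    Each image is replaced in the prompt by `placeholder` (an assumption
    made for this sketch), so nothing has to be extracted from a string.
    """
    text_parts: List[str] = []
    images: List[ImageChunk] = []
    for chunk in chunks:
        if isinstance(chunk, TextChunk):
            text_parts.append(chunk.text)
        elif isinstance(chunk, ImageChunk):
            images.append(chunk)
            text_parts.append(placeholder)
        else:
            raise TypeError(f"unsupported chunk type: {type(chunk).__name__}")
    return "".join(text_parts), images


prompt, images = split_chunks(
    [
        TextChunk("Describe "),
        ImageChunk(b"\x89PNG", "image/png"),
        TextChunk(" briefly."),
    ]
)
```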