# Meta LLaMA

In its GPU hardware segment, Inferix focuses on GPUs optimized for graphics rendering, with the RTX3090 and RTX4090 as its flagship devices.

The TensorOpera® team has released public data on deploying pre-trained models such as LLaMA-2 13B and LLaMA-3 7B on the RTX4090. Notably, LLaMA-2 13B inference on a single RTX4090 with TensorOpera’s ScaleLLM achieves 1.88 times lower latency than the same model on a single A100 GPU with vLLM. LLaMA-3 7B can run with a token batch size of 256 on a single RTX4090 without additional memory optimization [\[23\]](/inferix-whitepaper/references.md#23).

In its introduction to ScaleLLM, the TensorOpera® team claims that, with this engine on the RTX4090, LLMs use three times less memory, run 1.8 times faster, and are 20 times more cost-effective than on A100 GPUs in traditional data centers.
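To give a sense of why memory reduction matters on a consumer card, the following is a rough back-of-the-envelope sketch, not a description of how ScaleLLM works. It estimates the memory needed for model weights alone at a few numeric precisions and compares it with the 24 GB of VRAM on an RTX4090. The helper name `weight_memory_gib`, the nominal parameter count of 13 billion, and the choice of precisions are assumptions made for illustration; the source does not state which precision ScaleLLM uses, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
GIB = 1024 ** 3  # bytes per GiB


def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights only, in GiB (hypothetical helper)."""
    return num_params * bytes_per_param / GIB


RTX4090_VRAM_GIB = 24          # VRAM of a single RTX4090
LLAMA2_13B_PARAMS = 13e9       # nominal parameter count (assumption)

# Illustrative precisions only; not taken from the ScaleLLM announcement.
for label, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    need = weight_memory_gib(LLAMA2_13B_PARAMS, bytes_per_param)
    verdict = "fits" if need < RTX4090_VRAM_GIB else "does not fit"
    print(f"{label}: {need:.1f} GiB of weights, {verdict} in {RTX4090_VRAM_GIB} GiB")
```

Under these assumptions, 16-bit weights for a 13B-parameter model slightly exceed 24 GiB before any working memory is counted, which is why a serving engine has to reduce memory use in some way to run such a model on a single consumer GPU.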

Research and experimental benchmarks have shown that we can *train larger LLMs on a larger number of distributed GPUs than in data centers with federated learning*, using Gradient Low-rank Projection (GaLore) [\[23\]](/inferix-whitepaper/references.md#23), [\[24\]](/inferix-whitepaper/references.md#24).
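The core idea behind GaLore can be sketched in a few lines: the gradient of a weight matrix is projected onto a low-rank subspace, the optimizer state is kept in that smaller space, and the update is projected back before it is applied. The code below is a minimal NumPy illustration under stated assumptions, not the implementation used by TensorOpera or Inferix. The function name `galore_step`, the `state` dictionary, and all hyperparameter values are hypothetical, and the scaling factor used in the published method is omitted for brevity.

```python
import numpy as np


def galore_step(W, G, state, rank=4, lr=1e-2, refresh_every=200,
                beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style update with the gradient projected to a low-rank subspace.

    W: (m, n) weights, G: (m, n) gradient, state: dict carried between calls.
    """
    t = state.get("t", 0)
    if t % refresh_every == 0:
        # Refresh the projector from the top-`rank` left singular vectors of G.
        U, _, _ = np.linalg.svd(G, full_matrices=False)
        state["P"] = U[:, :rank]                 # shape (m, rank)
    P = state["P"]

    R = P.T @ G                                  # projected gradient, (rank, n)

    # Adam moments live in the small (rank, n) space, not the full (m, n) space.
    m = state.get("m", np.zeros_like(R))
    v = state.get("v", np.zeros_like(R))
    t += 1
    m = beta1 * m + (1 - beta1) * R
    v = beta2 * v + (1 - beta2) * R ** 2
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    N = m_hat / (np.sqrt(v_hat) + eps)           # normalized low-rank update

    state.update(m=m, v=v, t=t)
    return W - lr * (P @ N)                      # project back to (m, n)


# Toy usage: shrink a random matrix toward zero (loss = 0.5 * ||W||^2, so G = W).
rng = np.random.default_rng(0)
W_demo = rng.standard_normal((64, 32))
demo_state = {}
for _ in range(10):
    W_demo = galore_step(W_demo, W_demo, demo_state, rank=4)
```

In this toy setup the two Adam moment buffers have shape `(4, 32)` instead of `(64, 32)`, which is where the memory saving comes from; the weights themselves stay full-rank.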

