# nexaai/nexa-sdk
**NexaSDK lets you build the smartest, fastest on-device AI with minimal energy use.** It is a high-performance local inference framework that runs the latest multimodal AI models on NPU, GPU, and CPU across Android, Windows, and Linux devices with just a few lines of code.
## Recognized Milestones

Qualcomm featured us 3 times in official blogs:

- Innovating Multimodal AI on Qualcomm Hexagon NPU
- First-ever Day-0 model support on Qualcomm Hexagon NPU for compute and mobile platforms, Auto and IoT
- A simple way to bring on-device AI to smartphones with Snapdragon
## Quick Start

| Platform     | Links              |
| ------------ | ------------------ |
| CLI          | Quick Start · Docs |
| Python       | Quick Start · Docs |
| Android      | Quick Start · Docs |
| Linux Docker | Quick Start · Docs |
## CLI

Download:

| Windows              | Linux |
| -------------------- | ----- |
| arm64 (Qualcomm NPU) | arm64 |
| x64                  | x64   |
NPU Access Token (required for NPU models):
Note: Our previous token validation service has been deprecated. For any NPU usage, simply set the access token below; no additional registration or validation is needed.
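If you drive NexaSDK tooling from Python scripts, the token can also be exported programmatically through the process environment. A minimal sketch; `set_nexa_token` is an illustrative helper, not part of the SDK, and the token value is a placeholder:

```python
import os

def set_nexa_token(token: str) -> None:
    """Export NEXA_TOKEN for this process and any child processes it spawns."""
    os.environ["NEXA_TOKEN"] = token

# Placeholder value; substitute the real access token shown below.
set_nexa_token("key/<your-access-token>")
```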
For Windows (PowerShell):

```powershell
$env:NEXA_TOKEN="key/eyJhY2NvdW50Ijp7ImlkIjoiNDI1Y2JiNWQtNjk1NC00NDYxLWJiOWMtYzhlZjBiY2JlYzA2In0sInByb2R1Y3QiOnsiaWQiOiJkYjI4ZTNmYy1mMjU4LTQ4ZTctYmNkYi0wZmE4YjRkYTJhNWYifSwicG9saWN5Ijp7ImlkIjoiMmYyOWQyMjctNDVkZS00MzQ3LTg0YTItMjUwNTYwMmEzYzMyIiwiZHVyYXRpb24iOjMxMTA0MDAwMH0sInVzZXIiOnsiaWQiOiI3MGE2YzA4NS1jYjc3LTQ3YmEtOWUxNC1lNjFjYTA2ZThmZjUiLCJlbWFpbCI6ImFsYW40QG5leGE0YWkuY29tIn0sImxpY2Vuc2UiOnsiaWQiOiI4OTlhZGQ2NS1lOTI2LTQ2M2ItODllNi0xMjc0NzM3ZjA1MzYiLCJjcmVhdGVkIjoiMjAyNS0wOS0wNlQwMDo1MzozNi4yMDNaIiwiZXhwaXJ5IjoiMjAzNS0xMi0zMVQyMzo1OTo1OS4wMDBaIn19.BXoUHIEzFMuuZbBT7RvsKO9nTi5950C6kHO64blF7XBnfKvZ6ClA8a55tmszI1ZWdngzpNFTzMM5PV5euuzMCA=="
```

For Linux / Android adb shell:

```shell
export NEXA_TOKEN="key/eyJhY2NvdW50Ijp7ImlkIjoiNDI1Y2JiNWQtNjk1NC00NDYxLWJiOWMtYzhlZjBiY2JlYzA2In0sInByb2R1Y3QiOnsiaWQiOiJkYjI4ZTNmYy1mMjU4LTQ4ZTctYmNkYi0wZmE4YjRkYTJhNWYifSwicG9saWN5Ijp7ImlkIjoiMmYyOWQyMjctNDVkZS00MzQ3LTg0YTItMjUwNTYwMmEzYzMyIiwiZHVyYXRpb24iOjMxMTA0MDAwMH0sInVzZXIiOnsiaWQiOiI3MGE2YzA4NS1jYjc3LTQ3YmEtOWUxNC1lNjFjYTA2ZThmZjUiLCJlbWFpbCI6ImFsYW40QG5leGE4YWkuY29tIn0sImxpY2Vuc2UiOnsiaWQiOiI4OTlhZGQ2NS1lOTI2LTQ2M2ItODllNi0xMjc0NzM3ZjA1MzYiLCJjcmVhdGVkIjoiMjAyNS0wOS0wNlQwMDo1MzozNi4yMDNaIiwiZXhwaXJ5IjoiMjAzNS0xMi0zMVQyMzo1OTo1OS4wMDBaIn19.BXoUHIEzFMuuZbBT7RvsKO9nTi5950C6kHO64blF7XBnfKvZ6ClA8a55tmszI1ZWdngzpNFTzMM5PV5euuzMCA=="
```

Run your first model:
```shell
# Chat with Qwen3
nexa infer ggml-org/Qwen3-1.7B-GGUF

# Multimodal: drag images into the CLI
nexa infer NexaAI/Qwen3-VL-4B-Instruct-GGUF

# NPU (Windows arm64 with Snapdragon X Elite)
nexa infer NexaAI/OmniNeural-4B
```

- Models: LLM, Multimodal, ASR, OCR, Rerank, Object Detection, Image Generation, Embedding
- Formats: GGUF, NEXA
- CLI Reference Docs
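The CLI can also be scripted from other programs. A minimal Python sketch that builds the `nexa infer` command line and checks whether the CLI is installed; these helper names are illustrative, not SDK APIs:

```python
import shutil

def nexa_infer_cmd(model: str) -> list:
    """Build the argv for `nexa infer <model>` (pass to subprocess.run)."""
    return ["nexa", "infer", model]

def cli_available() -> bool:
    """Return True when the nexa CLI is on PATH."""
    return shutil.which("nexa") is not None
```

Usage: `subprocess.run(nexa_infer_cmd("ggml-org/Qwen3-1.7B-GGUF"))`, guarded by `cli_available()`.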
## Python SDK

Install:

```shell
pip install nexaai
```

```python
from nexaai import LLM, GenerationConfig, ModelConfig, LlmChatMessage

# Load a model and stream a chat completion
llm = LLM.from_(model="NexaAI/Qwen3-0.6B-GGUF", config=ModelConfig())

conversation = [
    LlmChatMessage(role="user", content="Hello, tell me a joke")
]

prompt = llm.apply_chat_template(conversation)
for token in llm.generate_stream(prompt, GenerationConfig(max_tokens=100)):
    print(token, end="", flush=True)
```

- Models: LLM, Multimodal, ASR, OCR, Rerank, Object Detection, Image Generation, Embedding
- Formats: GGUF, NEXA
- Python SDK Docs
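Since `generate_stream` yields tokens incrementally, it is often useful to both echo tokens as they arrive and keep the assembled response. A self-contained sketch; `collect_stream` is an illustrative helper, and the fake iterator stands in for a real model stream:

```python
from typing import Iterator, List

def collect_stream(stream: Iterator[str], echo: bool = True) -> str:
    """Echo streamed tokens as they arrive and return the full response."""
    parts: List[str] = []
    for token in stream:
        if echo:
            print(token, end="", flush=True)
        parts.append(token)
    return "".join(parts)

# Stand-in for llm.generate_stream(prompt, GenerationConfig(max_tokens=100))
reply = collect_stream(iter(["Hello", ", ", "world!"]), echo=False)
```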
## Android SDK

Add to your app/AndroidManifest.xml:

```xml
<application android:extractNativeLibs="true">
```

Add to your build.gradle.kts:

```kotlin
dependencies {
    implementation("ai.nexa:core:0.0.19")
}
```

```kotlin
// Initialize SDK
NexaSdk.getInstance().init(this)

// Load and run model
VlmWrapper.builder()
    .vlmCreateInput(VlmCreateInput(
        model_name = "omni-neural",
        model_path = "/data/data/your.app/files/models/OmniNeural-4B/files-1-1.nexa",
        plugin_id = "npu",
        config = ModelConfig()
    ))
    .build()
    .onSuccess { vlm ->
        vlm.generateStreamFlow("Hello!", GenerationConfig()).collect { print(it) }
    }
```

- Requirements: Android minSdk 27, Qualcomm Snapdragon 8 Gen 4 chip
- Models: LLM, Multimodal, ASR, OCR, Rerank, Embedding
- NPU Models: Supported Models
- Android SDK Docs
## Linux Docker

```shell
docker pull nexa4ai/nexasdk:latest

export NEXA_TOKEN="your_token_here"

docker run --rm -it --privileged \
  -e NEXA_TOKEN \
  nexa4ai/nexasdk:latest infer NexaAI/Granite-4.0-h-350M-NPU
```

- Requirements: Qualcomm Dragonwing IQ9, ARM64 systems
- Models: LLM, VLM, ASR, CV, Rerank, Embedding
- NPU Models: Supported Models
- Linux Docker Docs
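The `docker run` invocation above can be composed programmatically. A minimal sketch; `docker_infer_cmd` is an illustrative helper, and it inlines the token value with `-e NEXA_TOKEN=<value>` rather than inheriting it from the environment as the shell example does:

```python
def docker_infer_cmd(model: str, token: str) -> list:
    """Build the argv mirroring the `docker run ... infer <model>` command above."""
    return [
        "docker", "run", "--rm", "-it", "--privileged",
        "-e", "NEXA_TOKEN=" + token,
        "nexa4ai/nexasdk:latest", "infer", model,
    ]

cmd = docker_infer_cmd("NexaAI/Granite-4.0-h-350M-NPU", "your_token_here")
```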
## Features & Comparisons

<div align="center">

| Features | NexaSDK | Ollama | llama.cpp | LM Studio |
| ---------------------------------------- | ---------------------------------------------------- | ------ | --------- | --------- |
| NPU support | ✅ NPU-first | ❌ | ❌ | ❌ |
| Android SDK support | ✅ NPU/GPU/CPU support | ⚠️ | ⚠️ | ❌ |
| Linux support (Docker image) | ✅ | ❌ | ❌ | ❌ |
| Day-0 model support | ✅ | ❌ | ⚠️ | ❌ |
| Full multimodality support | ✅ Image, Audio, Text, Embedding, Rerank, ASR, TTS | ⚠️ | ⚠️ | ⚠️ |
| Cross-platform support | ✅ Desktop, Mobile (Android), Automotive, IoT (Linux) | ⚠️ | ⚠️ | ⚠️ |
| One line of code to run | ✅ | ❌ | ⚠️ | ❌ |
| OpenAI-compatible API + Function calling | ✅ | ❌ | ❌ | ❌ |

<p align="center" style="margin-top:14px"> <i> <b>Legend:</b> <span title="Full support">✅ Supported</span> | <span title="Partial or limited support">⚠️ Partial or limited support</span> | <span title="Not Supported">❌ No</span> </i> </p> </div>
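The table above mentions an OpenAI-compatible API. A hedged sketch of the request body such an endpoint expects; the endpoint path and port in the comment are assumptions about a locally served model, not confirmed NexaSDK defaults:

```python
import json

def chat_completion_body(model: str, prompt: str, stream: bool = False) -> str:
    """Serialize an OpenAI-style chat completion request body."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    })

# Would be POSTed to e.g. http://localhost:8080/v1/chat/completions (assumed).
body = chat_completion_body("NexaAI/Qwen3-0.6B-GGUF", "Hello!")
```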
## Acknowledgements
We would like to thank the following projects:
## License
NexaSDK uses a dual licensing model:
### CPU/GPU Components

Licensed under Apache License 2.0.

### NPU Components

- Personal Use: Free license key available from Nexa AI Model Hub. Each key activates 1 device for NPU usage.
- Commercial Use: Contact hello@nexa.ai for licensing.
## Contact & Community Support
Want more model support, backend support, device support or other features? We'd love to hear from you!
Feel free to submit an issue on our GitHub repository with your requests, suggestions, or feedback. Your input helps us prioritize what to build next.
## Package Metadata

- Repository: nexaai/nexa-sdk
- Default branch: main
- README: README.md