About
PendriveGPT is a portable, self-contained artificial intelligence environment. It enables the execution of Large Language Models (LLMs) directly from a high-speed USB flash drive.
The primary objective is to provide advanced AI capabilities with complete data privacy, no dependency on external networks, and full hardware portability.
How does PendriveGPT work?
The system operates through three primary components:
- Hardware: A high-speed USB interface enables rapid transfer of model weights into system RAM.
- Inference Engine: Utilizes a compiled C/C++ engine (llama.cpp) to execute tensor calculations directly on the host CPU. No dedicated GPU is required.
- Model Weights: Employs quantized GGUF format files. This compression technique reduces the memory footprint while largely preserving model accuracy.
Memory and Conversation History
Does PendriveGPT remember my previous conversations?
No. PendriveGPT does not retain historical conversation data after session termination.
How does PendriveGPT memory work?
Memory operates exclusively within the volatile RAM of the active browser tab. The system stores dialogue turns in a temporary data array, and this array is transmitted to the local server with each request to maintain context. If you plug PendriveGPT into another device, it starts with a clean slate.
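Conceptually, this context mechanism can be sketched as follows. This is a minimal illustration only: the actual payload shape, endpoint path, and port used by PendriveGPT are assumptions here, modeled on a typical llama.cpp-style local server.

```python
# Minimal sketch of in-memory conversation context (illustrative only;
# the real PendriveGPT payload shape and endpoint may differ).

history = []  # lives only in RAM; lost when the tab or session closes

def add_turn(role, content):
    """Append one dialogue turn to the volatile history array."""
    history.append({"role": role, "content": content})

def build_request(user_message):
    """Package the full history plus the new message for the local server."""
    add_turn("user", user_message)
    return {"messages": list(history)}  # entire context is resent each request

def reset():
    """Purge the context immediately; nothing persists on disk."""
    history.clear()

payload = build_request("Hello!")
# The payload would then be POSTed to the loopback server, e.g.
# http://localhost:8080/v1/chat/completions (port and path assumed).
```

Because the entire history array is resent with every request, the server itself can stay stateless, which is what allows a reset (or unplugging the drive) to erase all context instantly.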
Why is that?
This architecture guarantees privacy by design. Executing the reset command purges the data immediately, and because no local database files are ever written, past interactions cannot be recovered or accessed.
System Attributes: Safe, Offline, Portable, Free
- 100% Safe: No telemetry or data collection. All information is processed exclusively on the host hardware; external data transmission is disabled.
- Offline: The server process binds to the local loopback address (`localhost`). Internet connectivity is not required for inference generation.
- Portable: The engine binaries, HTML user interface, and model weights reside entirely on the USB storage device. Host operating system installation is unnecessary.
- Free: The architecture utilizes open-source software and open-weights models. Operation incurs zero subscription fees or API costs.
AI Model and Licensing
What LLM model runs PendriveGPT?
The default configuration utilizes the Meta Llama 3.1 8B Instruct model. Model weights are quantized to 4-bit format (GGUF) for optimal execution on consumer-grade hardware.
What does the license allow?
The Meta Llama 3.1 Community License governs usage. This license permits research and commercial application. Commercial use is restricted only if monthly active users exceed 700 million.
System Requirements
Hardware specifications for optimal inference generation:
- Processor (CPU): Multi-core processor with AVX2 instruction set support.
- Memory (RAM): 16 GB system memory recommended.
- Interface: USB 3.2 Gen 2 (10 Gbps) USB-C or Thunderbolt 3/4 port.
- Operating System: Windows 10/11 (64-bit) or macOS (11.0 or newer).
FAQ (Frequently Asked Questions)
How does an offline AI run on a USB drive?
It uses quantized GGUF neural-network weights and a local inference engine that loads the model into the host machine's RAM and runs entirely on its CPU, with no internet connectivity required.
Is data private with an air-gapped LLM?
Yes. Zero telemetry is guaranteed. Prompts and documents remain strictly within the local computing environment.
Can the neural network model be updated?
Yes, but this is an advanced procedure with a risk of breaking the setup, and it is not recommended for standard users. Upgrading requires replacing the .gguf file in the /models directory and manually editing the launcher script parameters.
What operating systems are compatible with PendriveGPT?
Compatible with Windows (10/11), macOS (Apple Silicon M-series and Intel), and Debian-based Linux distributions. No driver installation required.
Do I need a dedicated GPU to run this offline AI?
No dedicated graphics processing unit (GPU) is required. The system is optimized for CPU inference, utilizing the host machine's standard processor and RAM.
How much RAM is required on the host computer?
A minimum of 8 GB of RAM is required for stable execution of the quantized language models. 16 GB is optimal for increased token generation speed.
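A back-of-envelope calculation shows why 8 GB is the floor for an 8B-parameter model at 4-bit quantization. The numbers below are a rounded illustration; real GGUF files include metadata and mixed-precision layers, so actual file sizes vary by quantization type.

```python
# Rough memory estimate for a 4-bit quantized 8B-parameter model.
# Illustrative arithmetic only; real GGUF sizes vary by quant type.

params = 8_000_000_000       # 8 billion parameters
bytes_per_param = 0.5        # 4 bits = half a byte
weights_gib = params * bytes_per_param / 1024**3

print(round(weights_gib, 2))  # ~3.73 GiB for the weights alone
```

On top of the roughly 4 GiB of weights, the KV cache for the conversation context, the engine itself, and the operating system all need RAM, which is why 8 GB is the practical minimum and 16 GB gives comfortable headroom.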
Does the USB drive store my chat history?
No. Inference occurs in the volatile RAM of the host machine. Disconnection of the drive permanently deletes the session context. Zero persistent storage of prompts.
Does it work on mobile phones or tablets?
The architecture is designed for desktop operating systems. Incompatible with iOS and Android due to local binary execution restrictions on mobile platforms.
Is it possible to install an AI on a pendrive?
Yes. The installation requires a portable inference engine and a quantized language model file stored on a high-speed USB flash drive.
How to install an AI on a pendrive?
Download a compiled binary of an inference engine. Download a compatible quantized model. Place both in the USB drive directory. Create an execution script to launch the engine with the model file as a parameter. PendriveGPT automates this integration.
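The manual steps above can be sketched as a tiny launcher. This is illustrative only: `llama-server` is the HTTP server binary shipped with llama.cpp, but the drive letter, directory layout, and model file name shown here are assumptions that depend on your own build and download.

```python
# Hypothetical launcher sketch for an inference engine on a USB drive.
# Paths and file names are examples, not PendriveGPT's actual layout.
import subprocess
from pathlib import Path

DRIVE = Path("E:/")  # example mount point of the USB drive (assumed)

engine = DRIVE / "bin" / "llama-server"          # compiled inference engine
model = DRIVE / "models" / "model-Q4_K_M.gguf"   # quantized model file

# Launch the engine with the model as a parameter, bound to loopback only.
cmd = [str(engine), "-m", str(model), "--host", "127.0.0.1", "--port", "8080"]

if __name__ == "__main__":
    subprocess.run(cmd)  # blocks until the server process exits
```

Binding to `127.0.0.1` keeps the server reachable only from the host machine, matching the offline, loopback-only design described above.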
How does a portable AI work?
A portable AI executes neural network calculations using the host computer's CPU and RAM. The USB drive acts exclusively as the storage medium for the engine binaries and model weights.
Why does the system fan speed increase during use?
Inference requires intensive matrix computation. Sustained high CPU utilization generates heat, prompting the system's fans to spin up.