This skill covers using LM Studio as the embedding provider for GrepAI, offering a user-friendly GUI for managing local models.
## When to Use This Skill

- Want local embeddings with a graphical interface
- Already using LM Studio for other AI tasks
- Prefer visual model management over CLI
- Need to easily switch between models
## What is LM Studio?

LM Studio is a desktop application for running local LLMs with:

- 🖥️ Graphical user interface
- 📦 Easy model downloading
- 🔌 OpenAI-compatible API
- 🔒 100% private, local processing
## Prerequisites

- Download LM Studio from lmstudio.ai
- Install and launch the application
- Download an embedding model
## Installation

### Step 1: Download LM Studio

Visit lmstudio.ai and download the build for your platform:

- macOS (Intel or Apple Silicon)
- Windows
- Linux
### Step 2: Launch and Download a Model

1. Open LM Studio
2. Go to the Search tab
3. Search for an embedding model:
   - nomic-embed-text-v1.5
   - bge-small-en-v1.5
   - bge-large-en-v1.5
4. Click Download
### Step 3: Start the Local Server

1. Go to the Local Server tab
2. Select your embedding model
3. Click Start Server
4. Note the endpoint (default: `http://localhost:1234`)
## Configuration

### Basic Configuration

```yaml
# .grepai/config.yaml
embedder:
  provider: lmstudio
  model: nomic-embed-text-v1.5
  endpoint: http://localhost:1234
```
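With the server running and this file in place, indexing should work end to end; a minimal smoke test using the `grepai watch` command this guide later uses for re-indexing:

```bash
# Requires the LM Studio server to be running (Local Server tab → Start Server);
# grepai reads the endpoint from .grepai/config.yaml
grepai watch
```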
### With Custom Port

```yaml
embedder:
  provider: lmstudio
  model: nomic-embed-text-v1.5
  endpoint: http://localhost:8080
```
### With Explicit Dimensions

```yaml
embedder:
  provider: lmstudio
  model: nomic-embed-text-v1.5
  endpoint: http://localhost:1234
  dimensions: 768
```
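The `dimensions` value must match what the model actually emits. If unsure, you can measure it against the running server; a quick check, assuming LM Studio's OpenAI-compatible response shape and that `jq` is installed:

```bash
# Count the length of the returned embedding vector
curl -s http://localhost:1234/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text-v1.5", "input": "test"}' \
  | jq '.data[0].embedding | length'
# nomic-embed-text-v1.5 should print 768
```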
## Available Models

### nomic-embed-text-v1.5 (Recommended)

| Property | Value |
|---|---|
| Dimensions | 768 |
| Size | ~260 MB |
| Quality | Excellent |
| Speed | Fast |

```yaml
embedder:
  provider: lmstudio
  model: nomic-embed-text-v1.5
```
### bge-small-en-v1.5

| Property | Value |
|---|---|
| Dimensions | 384 |
| Size | ~130 MB |
| Quality | Good |
| Speed | Very fast |

Best for: Smaller codebases, faster indexing.

```yaml
embedder:
  provider: lmstudio
  model: bge-small-en-v1.5
  dimensions: 384
```
### bge-large-en-v1.5

| Property | Value |
|---|---|
| Dimensions | 1024 |
| Size | ~1.3 GB |
| Quality | Very high |
| Speed | Slower |

Best for: Maximum accuracy.

```yaml
embedder:
  provider: lmstudio
  model: bge-large-en-v1.5
  dimensions: 1024
```
## Model Comparison

| Model | Dimensions | Size | Speed | Quality |
|---|---|---|---|---|
| bge-small-en-v1.5 | 384 | 130 MB | ⚡⚡⚡ | ⭐⭐⭐ |
| nomic-embed-text-v1.5 | 768 | 260 MB | ⚡⚡ | ⭐⭐⭐⭐ |
| bge-large-en-v1.5 | 1024 | 1.3 GB | ⚡ | ⭐⭐⭐⭐⭐ |
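If you prefer numbers over stars, you can time a single embedding call per model yourself. A rough sketch, assuming all three models are downloaded and the running server can serve each one by name:

```bash
# Crude per-model latency check using curl's built-in timing
for model in bge-small-en-v1.5 nomic-embed-text-v1.5 bge-large-en-v1.5; do
  printf '%-24s ' "$model"
  curl -s -o /dev/null -w '%{time_total}s\n' \
    http://localhost:1234/v1/embeddings \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"$model\", \"input\": \"function authenticate(user)\"}"
done
```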
## LM Studio Server Setup

### Starting the Server

1. Open LM Studio
2. Navigate to the Local Server tab (left sidebar)
3. Select an embedding model from the dropdown
4. Configure settings:
   - Port: 1234 (default)
   - Enable Embedding Endpoint
5. Click Start Server
### Server Status

Look for the green indicator showing the server is running.

### Verifying the Server

```bash
# Check server is responding
curl http://localhost:1234/v1/models

# Test embedding
curl http://localhost:1234/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nomic-embed-text-v1.5",
    "input": "function authenticate(user)"
  }'
```

A healthy response is OpenAI-style JSON containing a `data[0].embedding` array of floats.
## LM Studio Settings

### Recommended Settings

In LM Studio's Local Server tab:

| Setting | Value |
|---|---|
| Port | 1234 |
| Enable CORS | Yes |
| Context Length | Auto |
| GPU Layers | Max (for speed) |
### GPU Acceleration

LM Studio automatically uses:

- macOS: Metal (Apple Silicon)
- Windows/Linux: CUDA (NVIDIA)

Adjust GPU layers in settings for memory/speed balance.
## Running LM Studio Headless

For server environments, LM Studio ships a command-line tool, `lms`:

```bash
# Start the server without the GUI (check the LM Studio docs for exact syntax)
lms server start --port 1234
lms load nomic-embed-text-v1.5
```
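Once it's up, you can confirm which models the headless server exposes; a sketch assuming the OpenAI-style `/v1/models` response and `jq`:

```bash
# List the model IDs the server currently advertises
curl -s http://localhost:1234/v1/models | jq -r '.data[].id'
```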
## Common Issues

❌ Problem: Connection refused
✅ Solution: Ensure the LM Studio server is running:

1. Open LM Studio
2. Go to the Local Server tab
3. Click Start Server

❌ Problem: Model not found
✅ Solution:

1. Download the model in LM Studio's Search tab
2. Select it in the Local Server dropdown

❌ Problem: Slow embedding generation
✅ Solutions:

- Enable GPU acceleration in LM Studio settings
- Use a smaller model (bge-small-en-v1.5)
- Close other GPU-intensive applications

❌ Problem: Port already in use
✅ Solution: Change the port in LM Studio settings and update the config to match:

```yaml
embedder:
  endpoint: http://localhost:8080  # Different port
```
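To see what is occupying the default port before changing it (macOS/Linux):

```bash
# Show the process currently bound to port 1234
lsof -i :1234
```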
❌ Problem: LM Studio closes and the server stops
✅ Solution: Keep LM Studio running in the background, or consider using Ollama, which runs as a system service.
## LM Studio vs Ollama

| Feature | LM Studio | Ollama |
|---|---|---|
| GUI | ✅ Yes | ❌ CLI only |
| System service | ❌ App must run | ✅ Background service |
| Model management | ✅ Visual | ✅ CLI |
| Ease of use | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Server reliability | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |

Recommendation: Use LM Studio if you prefer a GUI, Ollama for an always-on background service.
## Migrating from LM Studio to Ollama

If you need a more reliable background service:

1. Install Ollama:

   ```bash
   brew install ollama
   ollama serve &
   ollama pull nomic-embed-text
   ```

2. Update the config:

   ```yaml
   embedder:
     provider: ollama
     model: nomic-embed-text
     endpoint: http://localhost:11434
   ```

3. Re-index (embeddings from a different provider aren't compatible with the old index):

   ```bash
   rm .grepai/index.gob
   grepai watch
   ```
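Before re-indexing, you can confirm Ollama is answering; a minimal check against Ollama's native embeddings endpoint (field names may vary between Ollama versions):

```bash
# Expect a JSON body containing an "embedding" array
curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "test"}'
```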
## Best Practices

- **Keep LM Studio running**: the server stops when the app closes
- **Use the recommended model**: `nomic-embed-text-v1.5` for the best balance
- **Enable GPU**: faster embeddings with hardware acceleration
- **Check the server before indexing**: ensure the green status indicator (a one-line check follows this list)
- **Consider Ollama for production**: more reliable as a background service
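A minimal pre-index check you could drop into a script, assuming the default endpoint from the config above:

```bash
# Abort early if the LM Studio server isn't reachable
curl -sf http://localhost:1234/v1/models >/dev/null \
  || { echo "LM Studio server is not running" >&2; exit 1; }
```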
## Output Format

Successful LM Studio configuration:

```
✅ LM Studio Embedding Provider Configured

Provider:   LM Studio
Model:      nomic-embed-text-v1.5
Endpoint:   http://localhost:1234
Dimensions: 768 (auto-detected)
Status:     Connected

Note: Keep LM Studio running for embeddings to work.
```