Overview
This article provides a comprehensive guide to setting up a distributed DeepSeek AI cluster on three Ubuntu machines. The goal is to enable these machines to interact with each other, distribute queries efficiently, and optimize computational performance.
Architecture Design
1. Core Setup Overview
Each machine will:
- Run an instance of DeepSeek.
- Communicate with the other machines; in this guide, clients reach the controller over WebSockets and the controller reaches workers over REST calls (a message broker is another common option).
- Share insights collaboratively and refine responses.
2. Roles of Each Machine
- Machine 1 (Controller Node): Manages coordination, distributes queries, and aggregates responses.
- Machines 2 & 3 (Worker Nodes): Process DeepSeek model inference and share results.
3. Communication Framework
- The controller uses WebSockets for real-time, bidirectional communication with clients, and forwards queries to the workers over HTTP.
- Queries will be load-balanced across worker nodes to optimize performance.
Step-by-Step Deployment
1. Install Required Software
Run these commands on all three machines:
sudo apt update
sudo apt install python3 python3-pip
pip3 install flask flask-socketio eventlet requests
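Optionally, isolate these Python packages in a virtual environment on each machine; this is standard practice rather than a requirement of this setup:
python3 -m venv ~/deepseek-cluster
source ~/deepseek-cluster/bin/activate
pip3 install flask flask-socketio eventlet requests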
2. Install DeepSeek
Clone the DeepSeek repository and install dependencies:
git clone https://github.com/deepseek-ai/DeepSeek.git
cd DeepSeek
pip3 install -r requirements.txt
3. Set Up Controller Node (Machine 1)
This machine will receive user queries and distribute them to worker nodes.
controller.py:
from flask import Flask
from flask_socketio import SocketIO, emit
import requests

app = Flask(__name__)
socketio = SocketIO(app)

# Addresses of the worker nodes
WORKERS = ["http://192.168.1.11:5000", "http://192.168.1.12:5000"]

@app.route('/')
def index():
    return "Controller Node Running."

@socketio.on('query')
def handle_query(data):
    question = data['question']
    print(f"Received question: {question}")

    # Distribute the query to every worker
    responses = []
    for worker in WORKERS:
        try:
            response = requests.post(worker + "/query", json={"question": question}, timeout=60).json()
            responses.append(response.get("answer", ""))
        except Exception as e:
            responses.append(f"Error contacting {worker}: {e}")

    # Aggregate the worker answers and return them to the client
    final_response = "\n".join(responses)
    emit('response', {"final_response": final_response})

if __name__ == '__main__':
    socketio.run(app, host='0.0.0.0', port=5000)
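To test the controller, a client can connect over Socket.IO, emit a query event, and listen for the response event. The following is a minimal sketch, assuming the python-socketio client package (pip3 install "python-socketio[client]") and a controller address of 192.168.1.10, which is an assumed value chosen to match the worker addresses above.
test_client.py:
import socketio

sio = socketio.Client()

@sio.on('response')
def on_response(data):
    # Print the aggregated answer returned by the controller
    print(data['final_response'])
    sio.disconnect()

sio.connect('http://192.168.1.10:5000')  # controller address (assumed)
sio.emit('query', {'question': 'Explain distributed inference in one sentence.'})
sio.wait()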
4. Set Up Worker Nodes (Machines 2 & 3)
Each worker processes queries from the controller.
worker.py:
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/query', methods=['POST'])
def query():
    question = request.json.get('question', '')
    # Run DeepSeek model inference here
    response = f"DeepSeek response for: {question}"  # Mock response
    return jsonify({"answer": response})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
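To swap the mock response for real inference, one option is to serve a distilled DeepSeek-R1 model locally with Ollama (see the Learn More section) and call its HTTP API from the worker. This is a minimal sketch, assuming Ollama is running on each worker and a model such as deepseek-r1:7b has already been pulled:
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def run_inference(question):
    # Non-streaming generation request against the locally served model
    payload = {"model": "deepseek-r1:7b", "prompt": question, "stream": False}
    resp = requests.post(OLLAMA_URL, json=payload, timeout=300)
    resp.raise_for_status()
    return resp.json()["response"]
With this helper in place, the mock line in worker.py becomes response = run_inference(question).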
5. Implement Load Balancing
Modify controller.py to distribute queries across worker nodes in round-robin order:
worker_index = 0

@socketio.on('query')
def handle_query(data):
    global worker_index
    question = data['question']

    # Round-robin: pick the next worker, then advance the index
    worker = WORKERS[worker_index]
    worker_index = (worker_index + 1) % len(WORKERS)

    try:
        response = requests.post(worker + "/query", json={"question": question}, timeout=60).json()
        emit('response', {"final_response": response.get("answer", "")})
    except Exception as e:
        emit('response', {"final_response": f"Error contacting {worker}: {e}"})
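Round-robin is the simplest policy and assumes the workers have similar capacity. If the machines are unevenly equipped, for example one has a GPU and the other does not, a weighted rotation or least-loaded selection would spread the work more fairly.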
6. Add Consensus Mechanism
Have each worker report a confidence score with its answer so the controller can select the best response.
Modify worker.py:
import random

@app.route('/query', methods=['POST'])
def query():
    question = request.json.get('question', '')
    response = f"DeepSeek response for: {question}"  # Mock response
    confidence = random.uniform(0.8, 1.0)  # Placeholder confidence score
    return jsonify({"answer": response, "confidence": confidence})
Modify controller.py:
@socketio.on('query')
def handle_query(data):
    question = data['question']
    responses = []
    for worker in WORKERS:
        try:
            response = requests.post(worker + "/query", json={"question": question}, timeout=60).json()
            responses.append(response)
        except Exception as e:
            responses.append({"answer": f"Error contacting {worker}: {e}", "confidence": 0.0})

    # Return the answer with the highest confidence score
    best_response = max(responses, key=lambda r: r["confidence"])
    emit('response', {"final_response": best_response["answer"]})
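The random confidence above is only a stand-in. In a real deployment, a worker could report a model-derived signal instead, such as the mean token log-probability of its generation, so that the controller's selection reflects actual answer quality.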
Optimizations for Performance
- Use GPUs: Add NVIDIA GPUs to worker nodes for faster inference.
- Enable Multi-Turn Dialogues: Store conversation history for context-aware interactions.
- Monitor System Performance: Install Prometheus and Grafana to visualize resource usage (a metrics sketch follows this list).
- Hybrid Cloud Integration: Connect the cluster to cloud-based GPU resources for scalability.
- Distributed File System: Implement NFS or GlusterFS for shared storage among machines.
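As a minimal sketch of the monitoring idea, the controller can expose its own counters for Prometheus to scrape via the prometheus_client package (an extra dependency, pip3 install prometheus-client); the metric names here are illustrative, not prescribed by this setup:
from prometheus_client import Counter, Histogram, start_http_server

QUERIES_TOTAL = Counter('controller_queries_total', 'Queries received by the controller')
QUERY_LATENCY = Histogram('controller_query_seconds', 'End-to-end query latency in seconds')

# Serve metrics at http://<controller>:8001/metrics for Prometheus to scrape
start_http_server(8001)

@QUERY_LATENCY.time()
def answer_query(question):
    QUERIES_TOTAL.inc()
    # ... distribute the query to workers as in controller.py ...
    return "aggregated answer"
Grafana can then chart these series alongside node-level CPU, memory, and GPU metrics.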
Learn More
DeepSeek-R1: An Open-Source Advanced Reasoning Model
https://huggingface.co/deepseek-ai/DeepSeek-R1
DeepSeek-R1 is an advanced reasoning model developed by DeepSeek, a Chinese AI company. It achieves performance comparable to OpenAI’s o1 model across tasks such as mathematics, coding, and reasoning. The model was trained using large-scale reinforcement learning (RL) without supervised fine-tuning (SFT), allowing it to naturally develop powerful reasoning behaviors. To support the research community, DeepSeek has open-sourced DeepSeek-R1 and its distilled versions based on Llama and Qwen architectures. These distilled models offer various parameter sizes, providing flexibility for different computational resources. The open-source nature of DeepSeek-R1 encourages further research and development in AI reasoning capabilities.
Ollama: Get up and running with large language models.
https://ollama.com/library/deepseek-r1
DeepSeek’s first-generation reasoning models, achieving performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
Models
DeepSeek-R1
ollama run deepseek-r1:671b
Distilled models
The DeepSeek team has demonstrated that the reasoning patterns of larger models can be distilled into smaller models, yielding better performance than the reasoning patterns discovered through RL on small models alone.
The smaller models were created by fine-tuning several dense models widely used in the research community on reasoning data generated by DeepSeek-R1. Evaluation results show that these distilled dense models perform exceptionally well on benchmarks.
Conclusion
This setup enables three Ubuntu machines to work collaboratively, optimizing workload distribution and model inference efficiency. Future enhancements could include fault-tolerance mechanisms, such as worker health checks and automatic failover, and tighter integration with cloud-based AI services.