Rocket.Chat AI App Setup Guide


    The Rocket.Chat AI app lets you integrate open-source models such as Llama3 with your Rocket.Chat instance, combined with your own knowledge base via Retrieval-Augmented Generation (RAG). The LLM then generates responses grounded in the documents in your knowledge base, which improves accuracy. The app also provides features such as thread and omnichannel conversation summarization.

    This document guides you through the LLM deployment and app configuration steps.

    The project is in beta, and we are working on improving the setup process. If you face any issues or have any feedback, please reach out to us on the Rocket.Chat AI channel.

    Here is a high-level overview of the app workflow:

    Prerequisites

    • Docker

    • An instance with a GPU

    • NVIDIA GPU(s) with CUDA support

    • CPU: x86_64 architecture

    • OS: Any Linux distribution supported by the NVIDIA Container Toolkit with glibc >= 2.35 (check with ldd --version).

    • CUDA drivers: Follow the installation guide here. Note that only CUDA versions 12.5, 12.2, and 12.1 are supported. If you have a different version, upgrade or downgrade to a supported one, or reach out to the Rocket.Chat team about supporting your version.

    • NVIDIA Container Toolkit

    • Docker with GPU support: To test if the GPU is accessible in the docker container, follow the steps listed in the Compose GPU Support.

    • Rocket.Chat License (Starter, Pro, or Enterprise). The Starter license is free for small teams. For more information, refer to our plans.

    Recommended hardware specifications

    The app has been tested on an AWS EC2 instance with the following configuration:

    • Instance type: g5.2xlarge

    • vCPUs: 8

    • Memory: 32 GB

    • GPU: NVIDIA A10G

    • VRAM: 24 GB

    • Storage: 450 GB

    The minimum requirements are as follows:

    • vCPUs: 4

    • Memory: 12 GB

    • GPU VRAM:

      • For the Llama3-8B model: 8 GB

      • For the Llama3-70B model: 40 GB

    • Storage:

      • For the Llama3-8B model: 100 GB

      • For the Llama3-70B model: 500 GB

    Installation steps

    Access the sample files from the Rocket.Chat AI app setup repository.

    1. Clone the repository if you are using HTTPS:

    git clone https://github.com/RocketChat/Rocket.Chat.AI.Preview.git

    For a zip download, unzip using the following command:

    unzip Rocket.Chat.AI.Preview-main.zip -d Rocket.Chat.AI.Preview

    2. Change the directory:

    cd Rocket.Chat.AI.Preview

    RAG pipeline steps (Rubra.AI)

    Modifying the configuration files is optional. The pipeline will work without modifications, but certain features will be disabled. For more information, see the About the config files section.

    1. Start Rubra using the following command:

    docker-compose -f docker-compose.yaml --profile rubra up -d

    If you're using Docker Compose V2 (the docker compose plugin), you may need to use the following command instead:

    docker compose -f docker-compose.yaml --profile rubra up -d

    2. To verify that every service is running, run the following command:

    docker ps --format "{{.Names}}"

    It should return the following services:

    ui
    api-server
    task-executor
    vector-db-api
    milvus
    milvus-minio
    mongodb
    litellm
    milvus-etcd
    text-embedding-api
    redis

    If any of these services are missing, reach out to the Rocket.Chat team for support.
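    A small helper can diff the expected service list against what docker ps reports. This is a minimal sketch; the expected names are taken from the list above, and the sample input is a hypothetical partial deployment:

```shell
# Compare running container names (the output of `docker ps --format
# '{{.Names}}'`) against the expected Rubra service list, printing any
# service that is not running.
check_services() {
  running="$1"
  for svc in ui api-server task-executor vector-db-api milvus milvus-minio \
             mongodb litellm milvus-etcd text-embedding-api redis; do
    echo "$running" | grep -qx "$svc" || echo "missing: $svc"
  done
}

# Hypothetical partial deployment with only three services up; on a real
# host you would run: check_services "$(docker ps --format '{{.Names}}')"
check_services "ui
api-server
redis"
```

    Each "missing:" line points at a service to investigate with docker logs.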

    If everything is running, you can now access the Rubra UI at http://localhost:8501. Now, move on to the next step to start the LLM service.

    localhost is the default hostname. If you are using a different hostname, replace localhost with your hostname.

    Deploy LLM

    We support two methods to run the LLM: Docker and Helm. For the Docker method, follow the steps below. For scaling and production use cases, we recommend our optimized Docker and Helm deployments; for access, reach out to us at [email protected].

    1. Run the following command to check the CUDA version on your machine:

    nvidia-smi

    The output should look like this:

    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 555.42.02              Driver Version: 555.42.02      CUDA Version: 12.5     |
    |-----------------------------------------+------------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  NVIDIA A10G                    Off |   00000000:00:1E.0 Off |                    0 |
    |  0%   31C    P0             58W /  300W |       1MiB /  23028MiB |      5%      Default |
    |                                         |                        |                  N/A |
    +-----------------------------------------+------------------------+----------------------+
    
    +-----------------------------------------------------------------------------------------+
    | Processes:                                                                              |
    |  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
    |        ID   ID                                                               Usage      |
    |=========================================================================================|
    |  No running processes found                                                             |
    +-----------------------------------------------------------------------------------------+

    2. Additionally, to confirm the version of the CUDA compiler driver, use the following command:

    nvcc --version

    The output should look like this:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2024 NVIDIA Corporation
    Built on Wed_Apr_17_19:19:55_PDT_2024
    Cuda compilation tools, release 12.5, V12.5.40
    Build cuda_12.5.r12.5/compiler.34177558_0

    In this example, the CUDA version is 12.5 and the CUDA compilation tools version is also 12.5. If the versions are different, it is recommended that you align your system's CUDA version with the supported versions by either upgrading or downgrading.

    You can use this version (by removing the dot .) to set the PLATFORM_TAG value in the .env file. For example, if the CUDA Version is 12.5, the PLATFORM_TAG value is cuda125.

    The supported CUDA versions are 12.5, 12.2 and 12.1. If your system's version does not match any of the supported versions, consider updating your CUDA installation. Alternatively, for assistance with unsupported versions, contact the Rocket.Chat team for guidance on compatibility.
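    The version-to-tag conversion described above can be scripted. This is a minimal sketch that hard-codes the version string for illustration; on a real host you would take it from the "CUDA Version" field of nvidia-smi:

```shell
# Derive the PLATFORM_TAG value for the .env file from a CUDA version string
# by dropping the dot, e.g. 12.5 -> cuda125.
CUDA_VERSION="12.5"   # illustrative; read this from nvidia-smi in practice
PLATFORM_TAG="cuda$(echo "$CUDA_VERSION" | tr -d '.')"
echo "$PLATFORM_TAG"  # cuda125
```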

    3. Define the environment variables in the .env file. You can copy the .env.llm.example file from the repository and rename it to .env. Then modify the following variables as required:

    # For the model weights
    MODEL_NAME=Llama-3-8B-Instruct-q4f16_1-MLC
    
    # For the MLC library
    PLATFORM_TAG=cuda125
    RELEASE=0.0.1

    4. Run the LLM container using the following command:

    docker-compose -f docker-compose.yaml --profile mlc-llm up -d

    If you're using Docker Compose V2 (the docker compose plugin), you may need to use the following command instead:

    docker compose -f docker-compose.yaml --profile mlc-llm up -d

    5. Once the Docker container is running, you can call the LLM API using the following command:

    curl -X POST \
      -H "Content-Type: application/json" \
      -d '{
            "model": "Llama-3-8B-Instruct-q4f16_1-MLC",
            "messages": [
                {"role": "user", "content": "Hello! Our project is MLC LLM. What is the name of our project?"}
            ]
      }' \
      http://localhost:1234/v1/chat/completions

    localhost and 1234 are the default hostname and port, respectively. If you are using a different hostname and port, replace localhost and 1234 with your hostname and port.

    If you get a response, the LLM service is running.
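    To pull just the model's reply out of the JSON response, you can pipe it through a short Python one-liner. The RESPONSE value below is a trimmed, hypothetical sample; in practice it would come from the curl call above:

```shell
# A trimmed sample of an OpenAI-style chat-completion response (the content
# string is made up); a real response comes from the curl call above.
RESPONSE='{"choices":[{"message":{"role":"assistant","content":"Your project is MLC LLM."}}]}'

# Print only the assistant's reply text.
echo "$RESPONSE" | python3 -c 'import sys, json; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
```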

    Great! You have successfully set up Rocket.Chat AI on your local machine. You can now integrate it with your Rocket.Chat instance.

    Integrate with Rocket.Chat

    Ensure that you replace the service names with the actual hostname and port if you are using a different hostname and port.

    1. Go to your Rocket.Chat instance.

    2. Install the Rocket.Chat AI app from the marketplace. You can find the app by searching for Rocket.Chat AI under Administration > Marketplace > Explore. It's a premium app, so you need to have a valid license (Starter, Pro, or Enterprise) to install the app.

    3. After installing the app, go to the Rocket.Chat AI app Settings page.

    4. Enter the Model URL with the LLM API URL. For example, http://llama3-8b:1234/v1 (don't include the /chat/completions part).

    5. For the Vector database URL setting, enter the milvus service URL with port 19530. For example, http://milvus:19530.

    6. For the Text embedding API URL setting, enter the text-embedding-api service URL with port 8020. For example, http://text-embedding-api:8020/embed_multiple.

    7. To set up the knowledge base, refer to the knowledge base setup video. For details about the other settings, see Configure the app.

    8. For the Vector database collection setting, you have two options:

      1. Call the endpoint http://api-server:8000/assistants and search for the assistant you want to integrate with. An example response looks like:

        {
          "object": "list",
          "data": [
            {
              "_id": {},
              "id": "asst_226796",
              "object": "assistant",
              "created_at": 1718975287,
              "name": "Demo Assistant",
              "description": "An assistant for RAG",
              "model": "custom",
              "instructions": "You are a helpful assistant",
              "tools": [
                {
                  "type": "retrieval"
                }
              ],
              "file_ids": ["file_0cff17", "file_9b02be"],
              "metadata": {}
            }
          ],
          "first_id": "asst_226796",
          "last_id": "asst_226796",
          "has_more": false
        }

        Copy the ID of the assistant you want to integrate with Rocket.Chat AI from the id field; in the example above, the value is asst_226796. Then enter this value in the Vector database collection field on the Rocket.Chat AI app settings page.

      2. Alternatively, enter http://api-server:8000?name=Demo Assistant directly in the Vector database collection field on the Rocket.Chat AI app settings page. If the assistant exists, the app automatically fetches it and replaces the value with asst_XYZ, where XYZ is the assistant ID. If the field value doesn't change, the assistant doesn't exist or there is an issue with the API.

        http://api-server:8000 is the default hostname and port. If you are using a different hostname and port, replace http://api-server:8000 with your hostname and port.
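    The lookup in option 1 can also be scripted. This sketch works on a saved copy of the /assistants response; the ASSISTANTS value is a trimmed version of the example above, and on a live deployment you would fetch it with curl -s http://api-server:8000/assistants:

```shell
# Pick the id of a named assistant out of a /assistants response.
# ASSISTANTS is a trimmed copy of the docs' example response.
ASSISTANTS='{"data":[{"id":"asst_226796","name":"Demo Assistant"}]}'

ASSISTANT_ID="$(echo "$ASSISTANTS" | python3 -c '
import sys, json
data = json.load(sys.stdin)["data"]
# Print the id of the first assistant whose name matches.
print(next(a["id"] for a in data if a["name"] == "Demo Assistant"))
')"
echo "$ASSISTANT_ID"  # asst_226796
```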

    Once you have integrated the Rocket.Chat AI with your Rocket.Chat instance, you can start using the AI features in your Rocket.Chat instance. For details, see Use the Rocket.Chat AI app.

    Bring your own Milvus vector database

    If you have your own Milvus vector database, you can use it with the Rocket.Chat AI. Follow the steps below to integrate your Milvus vector database with the Rocket.Chat AI:

    1. Go to your Rocket.Chat instance.

    2. Install the Rocket.Chat AI app from the marketplace. You can find the app by searching for Rocket.Chat AI under Administration > Marketplace > Explore. It's a premium app, so you need to have a valid license (Starter, Pro, or Enterprise) to install the app.

    3. After installing the app, go to the Rocket.Chat AI app Settings page.

    4. Enter the Model URL with the LLM API URL. For example, http://llama3-8b:1234/v1 (don't include the /chat/completions part).

    5. For the Vector database URL setting, enter the milvus service URL with port 19530. For example, http://milvus:19530.

    6. Enter your API key in the Vector database API key setting.

    7. In the Vector database text field setting, enter the text field where the text data is stored in the collection schema.

    8. Enter your embedding model (used when ingesting the data) in the Embedding model URL setting.

    Make sure your embedding model endpoint accepts a request payload and returns a response in the following format:

    // Input: a JSON array of the texts to embed
    [
        "text1", "text2", ...
    ]
    // Output: one embedding vector per input text
    {
        "embeddings": [
                [0.1, 0.2, 0.3, ...],
                [0.4, 0.5, 0.6, ...]
        ]
    }
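    You can sanity-check a response against this shape before wiring the endpoint into the app. The SAMPLE value below is a made-up response for two input texts:

```shell
# Validate that an embedding response has the expected shape: an object with
# an "embeddings" key holding one numeric vector per input text.
# SAMPLE is a made-up response for two inputs; a real one would come from a
# curl call to your embedding endpoint.
SAMPLE='{"embeddings":[[0.1,0.2,0.3],[0.4,0.5,0.6]]}'

COUNT="$(echo "$SAMPLE" | python3 -c '
import sys, json
emb = json.load(sys.stdin)["embeddings"]
assert all(isinstance(v, list) for v in emb), "each embedding must be a list"
print(len(emb))
')"
echo "embeddings returned: $COUNT"
```

    The vector count should equal the number of texts you sent.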

    About the config files

    • llm-config.yaml: This file contains the configuration for Rubra AI's LLM service. You can modify it according to your requirements.

    OPENAI_API_KEY: sk-....X0FUz2bhgyRW32qF1 # OpenAI API key - enables the use of OpenAI models in Rubra AI
    REDIS_HOST: redis # Redis host
    REDIS_PASSWORD: "" # Redis password
    REDIS_PORT: "6379" # Redis port
    
    model_list:
      - litellm_params:
          api_base: http://host.docker.internal:1234/v1 # LLM API base URL
          api_key: None # LLM API key
          custom_llm_provider: openai # Don't change this for custom models
          model: openai/custom # Model name - must be in the format openai/custom
        model_name: custom

    • milvus.yaml: This file contains the configuration for the Milvus service of Rubra AI. For more information, refer to the Milvus documentation.

    • Once the files are modified, you need to restart the services for the changes to take effect. Use the following command:

    docker-compose -f docker-compose.yaml --profile rubra restart

    Troubleshooting

    • If you get the following error:

     ✘ text-embedding-api Error context can...                           0.1s
     ✘ ui Error                 Head "https://ghcr.io/v2/ru...           0.1s
     ✘ task-executor Error      context canceled                         0.1s
     ✘ api-server Error         context canceled                         0.1s
     ✘ vector-db-api Error      context canceled                         0.1s
    Error response from daemon: Head "https://ghcr.io/v2/rubra-ai/rubra/ui/manifests/main": denied: denied

    Make sure you have logged in with the GitHub Container Registry (ghcr.io) using the following command:

    echo $CR_PAT | docker login ghcr.io -u YOUR_GITHUB_USERNAME --password-stdin

    Replace $CR_PAT with your personal access token (PAT) and YOUR_GITHUB_USERNAME with your GitHub username.

    • If you get the following error:

    TVMError: after determining tmp storage requirements for inclusive_scan: cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device

    This error occurs when the NVIDIA GPU architecture is older than sm_80. Refer to this website for GPUs with architecture sm_80 and above.

    Once your deployment and app configuration are successful, go to the Rocket.Chat AI App guide to learn how to install and use the app. You can also find details about other app configuration settings there.

