Background#
==Tencent Cloud Studio gives you 10,000 minutes of free GPU time each month: build large models in the cloud.==
Setting Up AI Space#
Open https://cloud.tencent.com/ and log in as prompted, then choose a template based on your needs. Here we take Ollama as an example: select the Ollama template as shown in the image, then create a basic space.
Entering the IDE Environment#
Check the locally installed large models through the terminal with the following command:

```bash
ollama list
```

The model installed by default is llama3:latest.
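The same check can also be done from Python. Below is a minimal sketch, assuming the ollama Python package is installed and the local Ollama server is running; note that the field names in the response have varied slightly across library versions.

```python
import ollama

# Roughly equivalent to running `ollama list` in the terminal:
# asks the local Ollama server which models it has stored.
models = ollama.list()

for m in models['models']:
    # Each entry carries the model tag, e.g. 'llama3:latest'
    # (older library versions used the key 'name' instead of 'model')
    print(m['model'])
```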
Installing Required Local Large Models#
Visit the Ollama official website and find the large model you need. Taking deepseek-r1:32b as an example, run the following in the IDE terminal and wait for the model to finish downloading:

```bash
ollama pull deepseek-r1:32b
```
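The pull can also be scripted. Here is a minimal sketch using the Python client's streaming pull, again assuming the ollama package; the exact fields on the progress events are an assumption and may differ between versions.

```python
import ollama

# stream=True yields progress events instead of blocking until done
for progress in ollama.pull('deepseek-r1:32b', stream=True):
    status = progress.get('status', '')    # e.g. 'pulling manifest'
    completed = progress.get('completed')  # bytes downloaded so far
    total = progress.get('total')          # total bytes for this layer
    if completed and total:
        print(f'{status}: {completed / total:.0%}')
    else:
        print(status)
```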
Creating a Python Program to Start Exploring the Large Model#
Take the following Python program as an example:
```python
from ollama import chat
from ollama import ChatResponse

# Send a single chat message to the locally running model
response: ChatResponse = chat(
    model='deepseek-r1:32b',
    messages=[
        {'role': 'user', 'content': 'Who are you?'},
    ],
)

# The response supports dict-style access to the reply text
print(response['message']['content'])
```
The terminal then prints the model's reply.
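For a long answer it is nicer to watch the tokens arrive instead of waiting for the whole reply. A minimal streaming variant of the same call, under the same assumptions as above:

```python
from ollama import chat

# stream=True returns an iterator of partial responses
stream = chat(
    model='deepseek-r1:32b',
    messages=[{'role': 'user', 'content': 'Who are you?'}],
    stream=True,
)

for chunk in stream:
    # Print each fragment as it is generated, without extra newlines
    print(chunk['message']['content'], end='', flush=True)
print()
```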
Finally#
Testing shows that a 32b model is still a struggle to run on 16 GB of VRAM, so let's try downloading a 14b model instead...
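Switching models only takes two small changes: run `ollama pull deepseek-r1:14b` in the terminal, then change `model='deepseek-r1:32b'` to `model='deepseek-r1:14b'` in the Python program above.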