Just One Word - Fiddle: Running a Large Model Locally - qwen2.5-coder:1.5b

Just One Word - Fiddle#

I suddenly wanted to see whether my seven- or eight-year-old laptop, with its 2 GB GTX 1050, could still get a taste of the AI large-model boom.

Let's fiddle with a local open-source large model and see how much life this old buddy still has in it.

What's the quickest and most convenient way to fiddle? Ollama is a great choice: an open-source tool for serving large language models that runs on Windows, macOS, and Linux and supports the mainstream open-source models. Better still, it can deploy a model with a single command, which is really convenient.

Download and Install Ollama#

The official Ollama website is here: Ollama

Just install the version that corresponds to your computer's operating system.

image

Choose a Suitable Local Large Model#

Click models on the Ollama homepage and pick a model based on your local hardware and each model's strengths.

image

Given the 1050's measly 2 GB of video memory, I can only pick a 1.5b model to "play around" with. Find the model you want and copy its command into the Windows command prompt (cmd). I chose ollama run qwen2.5-coder:1.5b, which automatically downloads and runs the model. When send a message (/? for help) appears, the model is installed and ready to use.

image

image
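Incidentally, since Ollama runs as a local service, the model can also be called from code instead of the interactive prompt. Here is a minimal sketch in Python, assuming Ollama is running on its default port (11434) and the model above has been pulled; the prompt text is just an example:

    # Minimal sketch: ask the local Ollama service a question over its REST API.
    # Assumes Ollama is running on the default port 11434 and
    # qwen2.5-coder:1.5b has already been pulled.
    import json
    import urllib.request

    payload = json.dumps({
        "model": "qwen2.5-coder:1.5b",
        "prompt": "In one sentence, what is a binary tree?",
        "stream": False,  # return the whole answer as a single JSON object
    }).encode("utf-8")

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])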

After starting, check the GPU usage.

About 1.2 GB of the 2 GB of video memory is in use, so the 1050 can indeed handle this 1.5b "large" model.

image
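If you'd rather check this from a script than from the task manager, nvidia-smi can be queried directly. A small sketch, assuming the NVIDIA driver is installed and nvidia-smi is on the PATH:

    # Sketch: print current GPU memory usage via nvidia-smi.
    # Assumes the NVIDIA driver is installed and nvidia-smi is on the PATH.
    import subprocess

    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout.strip())  # e.g. "1228 MiB, 2048 MiB" on a 2 GB card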

Test Qwen's Capabilities#

Here are some "informal" tests.

  1. The first question is naturally a self-introduction.

    1. image

    2. I am an AI assistant developed by Alibaba Cloud, designed to help users answer questions, provide information, and assist in completing various tasks. My purpose is to understand user intentions and needs through natural language processing technology and provide personalized responses based on this information. I possess language understanding and generation capabilities, enabling me to converse with users and reason and make decisions based on context. Additionally, I regularly update my knowledge base to ensure the accuracy and reliability of the answers I provide.

    3. The response is quite fast, and the answer to this first question is textbook. Let's move on to other questions.

  2. Testing some simple math problems.

    1. image

    2. Haha, it made no mistakes when comparing numbers; that's acceptable.

  3. Testing language proficiency.

    1. image

    2. Uh, I'm not very satisfied with this answer. It can't even recite "The Difficulty of the Shu Road" (though, to be fair, neither can I).

  4. Finding some questions online to try.

    1. Riding an electric bike home, speeding down an empty alley. Suddenly a man comes rushing from the opposite direction, also riding fast. Just as they are about to collide, the man shouts: "You left! I right!" What's the punchline of this joke?
      image

    2. A: Protect our grandpa! B: Why only protect grandpa? Doesn't Grandpa Han need protection?

      What is the punchline of this joke?
      image

    3. From the following options, find the one that is different:
      1. Aluminum 2. Tin 3. Steel 4. Iron 5. Copper
      image

    4. Which is heavier, one kilogram of cotton or one kilogram of iron?

      image

    5. What is the sum of numbers from 1 to 101?

      image

    6. After testing these questions, I feel like the sky is falling; with a model this "small," the answers really aren't satisfying (for reference, the expected answers are sanity-checked in the short script after this list). What's more, as the questions piled up, the generation speed dropped from over twenty tokens/s to below twenty.
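For reference, here is a quick check of the expected answers to the verifiable questions above:

    # Quick check of the expected answers to the questions above.
    # Sum of the integers from 1 to 101:
    print(sum(range(1, 102)))  # 5151, i.e. 101 * 102 / 2

    # One kilogram of cotton and one kilogram of iron weigh the same: 1 kg.
    # In the odd-one-out question, steel is the outlier: it is an alloy,
    # while aluminum, tin, iron, and copper are elemental metals.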

  5. Let's test the translation ability.

    1. image

    2. image

  6. Let's try the programming ability.

    1. Problem: Write a Python program to print the value of each node in a binary tree.

    2. class TreeNode:
          def __init__(self, value=0, left=None, right=None):
              self.value = value
              self.left = left
              self.right = right
      
      def print_tree_nodes(node):
          if node is None:
              return
      
          # Print the current node's value
          print(node.value)
      
          # Recursively print the left subtree
          print_tree_nodes(node.left)
      
          # Recursively print the right subtree
          print_tree_nodes(node.right)
      
      # Example usage
      root = TreeNode(1)
      root.left = TreeNode(2)
      root.right = TreeNode(3)
      root.left.left = TreeNode(4)
      root.left.right = TreeNode(5)
      
      print("In-order traversal of the tree:")
      print_tree_nodes(root)
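      # Expected output: 1, 2, 4, 5, 3 (pre-order), one value per line.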
      
    3. image

    4. I tried running the code, and hey, it runs (an in-order variant is sketched right after this list).

      1. image
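One note on the model's answer: printing the node before recursing, as above, is a pre-order traversal. A true in-order traversal only needs the print moved between the two recursive calls. A minimal sketch, reusing the TreeNode class and the root variable from the model's code above:

    # Sketch: in-order traversal (left, root, right) for comparison,
    # reusing TreeNode and the example tree from the model's answer above.
    def print_tree_inorder(node):
        if node is None:
            return
        print_tree_inorder(node.left)   # visit the left subtree first
        print(node.value)               # then the node itself
        print_tree_inorder(node.right)  # and the right subtree last

    # With the same example tree (1 at the root, 2 and 3 as children,
    # 4 and 5 under 2), this prints: 4 2 5 1 3
    print_tree_inorder(root)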

Finally#

Running a large model on a 1050 comes down to just one word: "play." The fun far outweighs the practicality.

For a better experience and more accurate answers, you still need better hardware and a bigger model.

Next, when I have time, I'll try a local large model on the Mac's M4 and see how it performs.
