Article: https://proton.me/blog/deepseek

Calls it “Deepsneak”, failing to make clear that the reason people love DeepSeek is that you can download it and run it securely on any of your own private devices or servers, unlike most of the competing SOTA AIs.

I can’t speak for Proton, but the last couple of weeks have shown some very clear biases coming out.

  • Dyf_Tfh@lemmy.sdf.org · 1 month ago

    Those are not DeepSeek R1. They are unrelated models, like Llama 3 from Meta or Qwen from Alibaba, “distilled” by DeepSeek.

    This is a common method for making a smaller model smarter using a larger one.

    Ollama should never have labelled them deepseek:8B/32B. Way too many people misunderstood that.
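
Distillation as mentioned above can be sketched in a few lines: the student model is trained to match the teacher’s output distribution, softened by a temperature, rather than hard labels. A toy, self-contained sketch in plain Python; the logits and temperature below are illustrative, not taken from any real DeepSeek or Qwen training run:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; higher temperature = softer targets."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the teacher's softened distribution and the
    student's. Real distillation backpropagates this through the student."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))

# Toy example: the teacher is confident about token 0; the student starts uniform.
teacher = [4.0, 1.0, 0.5]
student = [0.0, 0.0, 0.0]
print(round(distillation_loss(student, teacher), 3))  # 1.099 (= ln 3, uniform student)
```

Minimizing this loss pulls the student’s distribution toward the teacher’s, which is how a smaller model inherits behaviour from a larger one.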

    • ☆ Yσɠƚԋσʂ ☆@lemmy.ml · 1 month ago

      I’m running deepseek-r1:14b-qwen-distill-fp16 locally and it produces really good results I find. Like yeah it’s a reduced version of the online one, but it’s still far better than anything else I’ve tried running locally.
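
For anyone wanting to script against a locally running model like the one above: Ollama exposes an HTTP API on localhost. A minimal sketch, assuming Ollama’s default port (11434) and that the model tag has already been pulled with `ollama pull`:

```python
import json
import urllib.request

# Assumes Ollama's default local endpoint; the model tag is the one
# mentioned in the comment above and must already be pulled.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "deepseek-r1:14b-qwen-distill-fp16"

def build_request(prompt, model=MODEL):
    """Build a non-streaming generate request for Ollama's HTTP API."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def generate(prompt):
    """Send the prompt and return the model's text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.load(resp)["response"]

req = build_request("Why is the sky blue?")
print(req.full_url)
```

Calling `generate()` only works with the Ollama server actually running; `build_request` itself has no network dependency.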

      • stink@lemmygrad.ml · 1 month ago

        It’s so cute when Chinese is sprinkled in randomly hehe, my little bilingual robot in my PC

        • ☆ Yσɠƚԋσʂ ☆@lemmy.ml · 1 month ago

          The main difference is speed and memory usage. Qwen is a full-sized, high-parameter model while qwen-distill is a smaller model created using knowledge distillation to mimic qwen’s outputs. If you have the resources to run qwen fast then I’d just go with that.
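
The speed and memory trade-off above is mostly a function of parameter count and weight precision. A rough back-of-the-envelope sketch, counting weights only (real usage adds activations, KV cache, and runtime overhead):

```python
def model_memory_gb(n_params_billion, bits_per_weight):
    """Rough weight-memory estimate: parameter count times bits per weight.
    Ignores activations, KV cache, and runtime overhead."""
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1024**3

# fp16 stores 16 bits per weight; a 4-bit quantization stores 4.
print(f"14B fp16: {model_memory_gb(14, 16):.1f} GiB")  # about 26 GiB
print(f"14B q4:   {model_memory_gb(14, 4):.1f} GiB")   # about 6.5 GiB
print(f"32B fp16: {model_memory_gb(32, 16):.1f} GiB")  # about 60 GiB
```

This is why a distilled or quantized model can run on a consumer GPU while the full-sized original cannot.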

          • morrowind@lemmy.ml · 1 month ago

            I think you’re confusing the two. I’m talking about the regular Qwen before it was fine-tuned by DeepSeek, not the regular DeepSeek.

            • ☆ Yσɠƚԋσʂ ☆@lemmy.ml · 1 month ago

              I haven’t actually used that one, but doesn’t the same point apply here too? The whole point of DeepSeek’s distilled models is that distillation shrinks the runtime requirements.

              • morrowind@lemmy.ml · 1 month ago

                No, because I was already running regular (non-DeepSeek) Qwen 14B, admittedly a heavily quantized and uncensored version, so I was just curious whether it would be any better

                • ☆ Yσɠƚԋσʂ ☆@lemmy.ml · 1 month ago

                  The big difference with R1 at least is that it is trained using reinforcement learning, and it gives you an explanation of how it arrived at a solution. I find the thinking bit is really fascinating cause it really reads like a train of thought. Here’s an example of what it outputs when I ask it to write a react component with a text input and a button:

                  <think> Alright, the user asked for a React component that includes a text input and a button. I need to write this in a way that’s clear and functional.

                  First, I’ll start by setting up the basic structure of a React component. I’ll import React and useState since we’ll need state management for the input value.

                  Next, I’ll create a functional component called TextInputButton. Inside this component, I’ll use the useState hook to manage the text input’s value. The initial state can be an empty string.

                  Then, I’ll define a handler function, handleChange, that updates the state whenever the user types into the input. This function will take the event object and set the new value using setTextInputValue.

                  I also need a function to handle the button click. For now, I’ll use handleClick with an alert message as a placeholder. The actual functionality can be added later based on requirements.

                  In the return statement, I’ll structure the JSX. I’ll include an input element of type text, binding its value to the state variable and attaching the handleChange function to the onChange event. This ensures that the state always reflects the current input value.

                  Below the input, I’ll add a button element. On clicking this button, it will trigger handleClick. I’ll also give it a class for styling purposes, like ‘button’ which can be styled with CSS if needed.

                  Finally, I’ll export the component so it can be used elsewhere in the application. I’ll make sure to include comments explaining each part for clarity. </think>

                  Here’s a simple React component with a text input and a button:

                  import React from 'react';
                  import { useState } from 'react';
                  
                  function TextInputButton() {
                    const [text, setText] = useState('');
                  
                    const handleChange = (event) => {
                      setText(event.target.value);
                    };
                  
                    const handleClick = () => {
                      // Add your click handler logic here
                      alert('Button clicked!');
                    };
                  
                    return (
                      <div>
                        <input
                          type="text"
                          value={text}
                          onChange={handleChange}
                          placeholder="Enter text..."
                        />
                        <button onClick={handleClick}>Submit</button>
                      </div>
                    );
                  }
                  
                  export default TextInputButton;
                  

                  This component:

                  1. Uses useState to manage the input value
                  2. Has a handleChange function that updates the state when the user types
                  3. Has a handleClick function that currently just shows an alert (you can add your own logic here)
                  4. Renders an input field and a submit button

                  You can use this component in your application like:

                  <TextInputButton />
                  

                  The input value is controlled through the text state variable, and you can access it using text. When the button is clicked, the handleClick function will execute whatever logic you need.
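
The visible answer and the train-of-thought shown above are two parts of one text stream: R1-style models wrap the monologue in <think>…</think> tags, and chat front-ends hide it behind an expander. A small sketch of separating the two, assuming that tag format:

```python
import re

def split_reasoning(raw_output):
    """Split an R1-style completion into (reasoning, answer).
    Returns an empty reasoning string if no <think> block is present."""
    match = re.search(r"<think>(.*?)</think>", raw_output, flags=re.DOTALL)
    if match is None:
        return "", raw_output.strip()
    reasoning = match.group(1).strip()
    answer = raw_output[match.end():].strip()
    return reasoning, answer

sample = "<think> The user wants a greeting. </think> Hello!"
reasoning, answer = split_reasoning(sample)
print(reasoning)  # The user wants a greeting.
print(answer)     # Hello!
```

A front-end can then render `answer` directly and tuck `reasoning` behind a “show thinking” toggle.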

    • pcalau12i@lemmygrad.ml · 1 month ago

      The 1.5B/7B/8B/14B/32B/70B models are all officially DeepSeek R1 models; that is what DeepSeek themselves refer to them as. It is DeepSeek themselves who produced those models, released them to the public, and gave them their names. Their names are correct, and it is just factually false to say they are not DeepSeek R1 models. They are.

      The “R1” in the name means “reasoning version one”, because it does not just spit out an answer but reasons through it with an internal monologue. For example, here is a simple query I asked DeepSeek R1 14B:

      Me: can all the planets in the solar system fit between the earth and the moon?

      DeepSeek: Yes, all eight planets could theoretically be lined up along the line connecting Earth and the Moon without overlapping. The combined length of their diameters (approximately 379,011 km) is slightly less than the average Earth-Moon distance (about 384,400 km), allowing them to fit if placed consecutively with no required spacing.

      However, on top of its answer, I can expand an option to see the internal monologue it went through before generating the answer (it is too long to paste here).

      What makes these consumer-oriented models different is that, rather than being trained on raw data, they are trained on synthetic data from pre-existing models. That’s what the “Qwen” or “Llama” parts mean in the name. The 7B model is trained on synthetic data produced by Qwen, so it is effectively a compressed version of Qwen. However, neither Qwen nor Llama can “reason”; they do not have an internal monologue.

      This is why it is just incorrect to claim that something like DeepSeek R1 7B Qwen Distill has no relevance to DeepSeek R1 and is just a Qwen model. If it were purely a Qwen model, why can it do something that Qwen cannot do and only DeepSeek R1 can? It’s because, again, it is a DeepSeek R1 model: they add the R1 reasoning to it during the distillation process as part of its training. (I think they use the original R1 to produce the data for the internal monologue, which the distilled model learns to copy.)
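
As an aside, the planet arithmetic in the example answer above is easy to check. Using commonly cited approximate equatorial diameters for the seven planets other than Earth (these round-figure values total slightly more than the model’s 379,011 km, but the conclusion is the same):

```python
# Approximate equatorial diameters in km (excluding Earth itself).
diameters_km = {
    "Mercury": 4_879,
    "Venus": 12_104,
    "Mars": 6_779,
    "Jupiter": 139_820,
    "Saturn": 116_460,
    "Uranus": 50_724,
    "Neptune": 49_244,
}

AVG_EARTH_MOON_KM = 384_400  # average centre-to-centre distance

total = sum(diameters_km.values())
print(f"Sum of diameters: {total:,} km")       # 380,010 km
print(f"Fits on average?  {total < AVG_EARTH_MOON_KM}")  # True
```

The fit only works at the average Earth–Moon distance; at perigee (about 363,300 km) the planets would not fit.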

      • lily33@lemm.ee · 1 month ago

        What makes these consumer-oriented models different is that, rather than being trained on raw data, they are trained on synthetic data from pre-existing models. That’s what the “Qwen” or “Llama” parts mean in the name. The 7B model is trained on synthetic data produced by Qwen, so it is effectively a compressed version of Qwen. However, neither Qwen nor Llama can “reason”; they do not have an internal monologue.

        You got that backwards. They’re other models (Qwen or Llama) fine-tuned on synthetic data generated by DeepSeek-R1. Specifically, reasoning data, so that they can learn some of its reasoning ability.

        But the base model, and so the base capability there, is that of the corresponding Qwen or Llama model. Calling them “Deepseek-R1-something” doesn’t change what they fundamentally are; it’s just marketing.
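
On the direction of that fine-tuning data: a supervised fine-tuning record built from R1 outputs might look roughly like the following. This is an illustrative shape only; the field names and tag layout are assumptions, not DeepSeek’s actual training schema:

```python
import json

def make_distill_record(prompt, teacher_reasoning, teacher_answer):
    """Build one hypothetical SFT record: the *teacher* (R1) produced the
    completion, reasoning included; the *student* (a Qwen or Llama base
    model) is then fine-tuned to reproduce it."""
    completion = f"<think>{teacher_reasoning}</think>{teacher_answer}"
    return {"prompt": prompt, "completion": completion}

record = make_distill_record(
    "What is 2 + 2?",
    " Two plus two is four. ",
    " 4",
)
print(json.dumps(record))
```

Fine-tuning a Qwen or Llama base model on many such records teaches it to emit the <think> monologue before its answer, which matches the thread’s point: the base capability stays Qwen’s or Llama’s, while the reasoning style is learned from R1.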