OpenAI

LinkedIn post lead magnet · OpenAI

The LLM training recipe has changed.

DeepSeek-V3.2 was post-trained using 1.8k RL environments; MiniMax M2.1 used over 100k environments...
This reflects a shift: from learning on static data to learning through interaction.

𝗕𝘂𝘁 𝗵𝗼𝘄 𝗱𝗶𝗱 𝘄𝗲 𝗴𝗲𝘁 𝘁𝗵𝗲𝗿𝗲?

1️⃣ Classic LLM training recipe (InstructGPT)
- Pre-training on internet text → learn to create text completions
- Supervised Fine-Tuning on Q/A pairs → learn new tasks and to follow instructions
- Reinforcement Learning (PPO or DPO) → align with human preferences

It worked, until it hit a ceiling.
You might remember Ilya Sutskever's talk at NeurIPS 2024: "Pre-training as we know it will end"
Data is finite, and classic post-training (SFT, Preference Alignment) cannot work miracles.
What's next?

2️⃣ OpenAI o1 series hinted at a new direction
They showed that Reinforcement Learning can induce chain-of-thought reasoning, and that performance improves with more train-time or test-time compute.
No details on how to get there...

3️⃣ DeepSeek-R1 showed a concrete approach
Reasoning/CoT improves performance, but teaching it via SFT needs expensive curated data.
Instead, they used Reinforcement Learning with Verifiable Rewards (RLVR):
- the model generates reasoning + answer
- the answer is checked against ground truth
- the reward drives RL training

The idea is more general.
Any task with a verifiable outcome (a won game, a passing test...) can become a training signal.
The model is no longer limited by the quality of examples, as it is in SFT.
By trial and error, it can discover better reasoning strategies on its own.

DeepSeek also introduced GRPO: instead of PPO's expensive/unstable setup, generate a group of responses to the same prompt, score them, and use the group's average reward as the baseline.
Simpler, lighter, works well with RLVR.

4️⃣ The mapping from classic RL to LLMs
The Language Model is the Agent, its response is the Action.
The Environment is everything needed to check (and possibly train) the model on the task: data, harnesses, scoring rules.
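To make the RLVR loop and the GRPO baseline above more concrete, here is a minimal Python sketch. It is an illustration under assumptions, not DeepSeek's implementation: `ArithmeticEnv`, `group_relative_advantages`, and the "Answer: ..." output convention are hypothetical names chosen for this example, and the group of responses is hard-coded instead of being sampled from a model.

```python
import re
import statistics
from dataclasses import dataclass


@dataclass
class ArithmeticEnv:
    """Hypothetical single-turn environment: a prompt plus its ground truth."""
    prompt: str
    ground_truth: str

    def reward(self, response: str) -> float:
        """Verifiable reward: 1.0 if the final 'Answer: ...' matches, else 0.0."""
        match = re.search(r"Answer:\s*(\S+)\s*$", response)
        return 1.0 if match and match.group(1) == self.ground_truth else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Score each response relative to its own group.

    The group mean acts as the baseline and the group standard deviation
    rescales the result, so no separate value model is needed.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


env = ArithmeticEnv(prompt="What is 17 + 25?", ground_truth="42")

# Stand-ins for a group of responses sampled from the model (the Agent)
# for the same prompt; each response is one Action.
group = [
    "17 + 25 = 42. Answer: 42",
    "17 + 25 = 32. Answer: 32",
    "Adding tens, then units, gives 42. Answer: 42",
    "I am not sure. Answer: 41",
]

rewards = [env.reward(response) for response in group]
advantages = group_relative_advantages(rewards)

for response, adv in zip(group, advantages):
    print(f"{adv:+.2f}  {response}")
```

Correct responses end up with a positive advantage and incorrect ones with a negative advantage, which is the signal an RL update would use to make the better responses more likely. A real setup would add the policy-gradient step itself, which is omitted here.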
SFT relies on curated datasets.
RLVR requires environments: dynamic systems the model can interact with.
And as LLMs gain access to tools (from APIs to terminals), these environments become more complex and more critical.

As Karpathy puts it:
> environments give the LLM an opportunity to actually interact - take actions, see outcomes, etc.
> This means you can hope to do a lot better than statistical expert imitation

---
📖 For a deeper dive and resources, check the comments.

Lead magnet mechanism

📖 For a deeper dive and resources, check the comments.


Other lead magnets in OpenAI

1

OpenAI

LinkedIn post

Video

1️⃣ Like + repost this post (for strength 🤜) 2️⃣ Speak up in the comments 💬 3️⃣ I'll send you the link by DM 🎁

2

OpenAI

LinkedIn post

Video

IN SHORT... I built GARY, a new GPT that generates high-quality, SEO-optimized articles!

IN A NUTSHELL:
1️⃣ You tell GARY the topic of your blog post.
2️⃣ You share your brief or some source content for inspiration.
3️⃣ GARY generates a structured article, with H1 and H2 headings, ready to use!

THAT'S IT! 🙂
--------------------
A quick note: I added a "SECRET SAUCE" to the prompt to make it perform even better than other GPTs of the same kind... Let me know if it makes a difference! 😁

🎁 TO GET ACCESS:
1️⃣ LIKE + REPOST THIS POST ❤️
2️⃣ COMMENT "GARY"
3️⃣ ADD ME TO YOUR LINKEDIN CONTACTS (if you haven't already)
4️⃣ I'LL SEND YOU THE LINK TO TRY IT BY DM!

⚠️ DISCLAIMER: You need a paid OpenAI account to access it!
⚠️ DISCLAIMER 2: This GPT is not meant to compete with advanced SEO tools that handle the Google SERP, etc. 😉
--------------------
PS: I'm launching an offer to build you your own GPT, customized to YOUR specific NEEDS / SETTINGS. If you're interested, send me a DM 😉

1️⃣ LIKE + REPOST THIS POST ❤️ 2️⃣ COMMENT "GARY" 3️⃣ ADD ME TO YOUR LINKEDIN CONTACTS (if you haven't already) 4️⃣ I'LL SEND YOU THE LINK TO TRY IT BY DM!

