That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.
There's a whole lot of stuff in the new release.
DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:
DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.
I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).
Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?
(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a little more attention to how they copied those in.)
Licensing aside, these distilled models are fascinating beasts.
Running DeepSeek-R1-Distill-Llama-8B-GGUF
Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.
I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.
First I fetched and ran the model using Ollama itself:
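Something like this, where the Q8_0 quantization tag is my assumption (the repo offers several; pick one that fits your hardware):

```bash
# Fetch the distilled 8B GGUF from Hugging Face and start an interactive chat
# Q8_0 is an assumed quantization tag, not confirmed from the original command
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```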
This downloads around 8.5GB of model data and starts an interactive chat interface.
Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
I installed the plugin and ran it like this:
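A sketch of those two steps, reusing the Ollama model alias assumed above:

```bash
# Install the LLM plugin that exposes Ollama-managed models
llm install llm-ollama

# Run a prompt against the distilled model via Ollama
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```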
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:
Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.
First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're generally found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
[...]
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:
The Joke:
A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.
After a series of comedic mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and says, "Maybe not, but we do have a lot of krill in our tea!"
Explanation:
- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.
This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this (a 34GB download):
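Presumably along these lines; the Q3_K_M quantization tag is my guess for a build of the 70B model that comes in around 34GB:

```bash
# ~34GB download of the 70B distill; Q3_K_M is an assumed quantization tag
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```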
Can it draw a pelican?
I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It didn't do very well:
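Run through LLM, and assuming the 70B model alias from the update above, that prompt would look something like:

```bash
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M' \
  'Generate an SVG of a pelican riding a bicycle'
```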
It looked to me like it got the order of the elements wrong, so I followed up with:
the background ended up covering the rest of the image
It thought some more and gave me this:
As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
Other ways to try DeepSeek-R1
If you want to try the model out without installing anything you can do so using chat.deepseek.com. You'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then pick the "DeepThink" option below the prompt input box.
DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that through LLM by dropping this into your extra-openai-models.yaml configuration file:
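A sketch of that configuration; the field values here are my assumptions based on LLM's extra-openai-models.yaml format:

```yaml
# Register DeepSeek's OpenAI-compatible endpoint as an extra model for LLM
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"   # assumed endpoint URL
  api_key_name: deepseek                 # key name used with `llm keys set`
```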
Then run llm keys set deepseek and paste in your API key, then use llm -m deepseek-reasoner 'prompt' to run prompts.
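In full, with a placeholder prompt of my own:

```bash
# Store the API key under the name referenced in the yaml above
llm keys set deepseek
# Run a prompt against the hosted reasoner model
llm -m deepseek-reasoner 'a joke about a pelican and a walrus'
```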
This won't show you the reasoning tokens, unfortunately. Those are returned by the API (example here) but LLM doesn't yet have a way to display them.