That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the capacity to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.

To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.

Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I've tried just one of those: unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:

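The command itself is missing from this copy. Assuming Ollama's support for pulling GGUF models straight from Hugging Face via an `hf.co/` prefix, and the Unsloth repository named above, it would have looked something like:

```shell
# Pull and run the Unsloth GGUF build of the 8B distill
# (the repository path is inferred from the model name mentioned above)
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF
```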
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

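As a side note on that logging: the llm CLI can surface the log itself, and will tell you where the SQLite file lives, along these lines:

```shell
llm logs path    # print the location of the logs.db SQLite file
llm logs -n 3    # show the three most recent prompts and responses
```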
I installed the plugin and ran it like this:

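The actual commands aren't preserved in this copy; based on the tools named above they would have been along these lines (the model identifier passed to `-m` is whatever name Ollama registered for the fetched model, so treat it as an assumption):

```shell
# Install the Ollama plugin for LLM, then prompt the local model through it
llm install llm-ollama
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF' \
  'a joke about a pelican and a walrus who run a tea room together'
```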
Each response starts with a `<think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're typically found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.

[...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:

The Joke:

A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place dishes on the table, causing a few spills.

After a series of comical mishaps, the walrus looks up and says with a smile, "This isn't so fishy anymore." The pelican smirks and says, "Maybe not, but we do have a lot of krill in our tea!"

Explanation:

- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.

So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.

This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:

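The exact command is missing from this copy; a plausible shape, assuming an Unsloth GGUF repository for the 70B distill and a quantization tag small enough to yield a 34GB download (both assumptions):

```shell
# Quantization tag chosen to match the stated ~34GB size; verify against
# the actual repository before running
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```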
Can it draw a pelican?

I tried my classic "Generate an SVG of a pelican riding a bicycle" prompt too. It did not do very well:

It looked to me like it got the order of the elements wrong, so I followed up with:

the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.

Other ways to try out DeepSeek-R1

If you want to try the model out without installing anything at all you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then pick the "DeepThink" option below the prompt input box.

<br>[DeepSeek](https://soja-wozniak.com) use the model through their API, using an [OpenAI-imitating endpoint](https://seewithsteve.com). You can access that via LLM by dropping this into your [extra-openai-models](http://nishiki1968.jp). [yaml setup](http://pers.udec.cl) file:<br>
|
||||||
|
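The YAML block itself is missing from this copy. A sketch of the shape LLM's extra-openai-models.yaml expects, with DeepSeek's endpoint filled in (the api_base URL and the key name are assumptions to verify against DeepSeek's API documentation):

```yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```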
Then run llm keys set and paste in your API key, then use llm -m deepseek-reasoner 'prompt' to run prompts.

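Spelled out, assuming the stored key is named deepseek (the name has to match whatever key name the extra-openai-models.yaml entry references, so treat it as an assumption):

```shell
llm keys set deepseek    # paste the DeepSeek API key when prompted
llm -m deepseek-reasoner 'a joke about a walrus and a pelican'
```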
This won't show you the reasoning tokens, sadly. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.