Episode Transcript
[00:00:00] Speaker A: What is up, guys? Welcome back to AI Unchained. I am Guy Swann, and this is where we explore the world of open source AI and AI as a tool for sovereignty, not some giant corporate machine that we're all going to be plugged into, that's going to suck all of our data and value out of us and manipulate us into whatever world they want. So that's what we do here. If you've never heard of this show or haven't checked it out yet, you've come to the right place if you're looking for the best way to use these AI tools for your advantage and not for somebody else's. So we are digging into something I've had a lot of people ask me about, and that I've talked about a lot on the show: a tool called Pinokio, and that's P I N O K I O. If you want to go to the website and start checking it out, it's at pinokio.computer. I'll have the link and details in the description, so you can literally just scroll right down. And this one will lean heavily on the video side.
Even though I am publishing this on the audio feed, I will do my best to explain it such that it's useful in the podcast feed as well.
But you should definitely go to Rumble and/or YouTube, and I'll also post this on Nostr. I'll have it in my nostr.build account so that you can watch it natively on Nostr as well, if you would like.
But YouTube and Rumble, if you're searching for this later and you're listening to this episode, if you want to follow along, that's the place to do it.
Okay, so just like in a video I did recently talking about my most used tools, there are two pieces to this puzzle: Pinokio is my favorite tool, but it also works in tandem with Holesail, which is how I am able to utilize this in a really, really broad fashion, how I'm able to use this on all of my machines very, very easily. And Pinokio is by far my favorite AI tool. They refer to it as an AI browser, and I think that's a fantastic way to frame it.
And this is also entirely open source, and it is doing nothing but loading up and using a ton of open source AI tools, and there are so many out there. If you have not checked this one out, you absolutely must.
Far and away, I use it more than anything else. And especially, I use it to test AI tools: tools that become available that are better to use in some sort of a workflow or setup, like what we talk about with Devs Who Can't Code, where I'm basically building small apps and writing code with AI in order to utilize a bunch of these tools in ways that are a little bit harder to do within Pinokio.
But Pinokio is a great way to download these tools, start to test out how they work, and then even begin to use them via the API that a lot of them have built right into Pinokio. And then, if it turns out that one works really well in a workflow, or really well for one of your use cases, you go out and actually find the individual tool and install that on your computer so that you have it direct.
And then you can use it: you can call it in a command, you can build a tiny app that makes use of this thing in a permanent workflow that is always operational, so that you don't have to constantly open up Pinokio, boot the app, and utilize it in a more GUI fashion. You can use it directly, by having the one that you know is going to be perfect for your use case right on your machine. So Pinokio is a great way to explore and test a lot of these tools, and then you take the step of, okay, now I'm going to actually install this, by cloning the GitHub repo, getting the AppImage, whatever it is, in order to get it running on the computer so that I can utilize this thing directly whenever I'm running my workflow or doing a project. And we're doing this right now on my Mac. MacBooks, and specifically anything that has unified memory, so all of the M1, M2, M3 models, are a really, really great tool for this. It runs really, really well on those, especially if you have a lot of RAM.
The reason it works really well is just that a lot of these things sometimes require 8 gigabytes of video RAM, and not many video cards have that. But the unified memory of the Mac means that if you have 16 gigs of RAM on the machine, you can use 8 gigabytes of it for video processing. So it lets you take advantage of a lot more AI stuff that is difficult to run on other consumer-grade machines. However, and you'll notice this in the list of AI tools in Pinokio, a lot of them do specifically require Nvidia; they require a GPU.
So the best setup and the one that I use more than anything, I use my MacBook mostly for exploration.
I use my Linux machine, which has an Nvidia GPU, and whenever I get my setup downstairs working, it will probably have a second Nvidia GPU to take advantage of a lot of this stuff. It is very, very good to have a high-end GPU to specifically utilize for these tools, if you are using them a lot.
However, don't go out and buy one. Don't just go, I'm gonna go buy a GPU for two reasons. One, there may be a bear market soon and there might be a bit of a flood of available GPUs maybe in the coming year. It's possible it might not happen.
But also specifically because you want to make sure that you have a direct use case that's worth it to you to go spend $1,000 on a GPU for a machine.
But for anybody out there who has a dedicated desktop machine with a lot of compute, this setup is going to be particularly useful to you. Because Holesail and the Pear stack tools that we're going to be exploring, both generally on this show and specifically in this episode, are going to be that much more valuable if you have a really solid, always-on, basically your own "server," quote unquote, except without any of the headache of networking and setting up a web instance or anything like that. You just connect via the key and you're peer-to-peer to your machine back at home. It is a lovely setup, and I recommend it for anyone who has the capital to spend on that.
But it is not necessary. And I don't want to make you think that you have to go blow $3,000 or $5,000 on like a ridiculous custom machine or something with a really big GPU just to use this. You can use it right on your Mac.
Okay, so the install of this thing is really straightforward, especially on Mac.
On Mac and Windows I think there's a little bit more to it, you know, you probably have to run a command. No, wait, I think there's an AppImage for Linux, so it's even just a double click on Linux. But you'll see, we are literally just going to pinokio.computer, and the link is right there in the description. So if you just scroll down on whatever podcast app, or YouTube, or Rumble, whatever you're watching this on, it's going to take you to the main website, and you can actually do a lot of exploring there, just seeing all of the different tools that you can get into. But the download button is literally right there on the front, and it's going to take you to the docs with the instructions.
Nothing complicated, nothing totally out of the ordinary. The only thing is there's a little extra step that you have to do on Mac, because you have to patch it, which I don't quite know why, but it's not hard.
You download the DMG, we're going to just put this on the desktop for the fun of it, and you double click this guy. It's going to open up the drive, and you drag it to your Applications, which I've already done, so I'm not going to do that. And then you right click and hit Open on the patch command. Now, it may ask you for your password, it may say code from an unidentified developer, whatever, risky, blah, blah, and you just say yes, or punch in your password if it requires it, and then go to your Applications and double click to open that guy up. Where is Pinokio? There you are. All right, but we have already done that, so I don't need to do that right now. And I'm actually just going to go ahead and move this to trash so I don't forget it.
And we will eject this as well and minimize it. And I am just going to go straight to my Launchpad, Pinokio, and boom, it'll install and then boot this guy right up. And this is what you'll see: welcome to the Pinokio AI browser. So visit the Discover page, and this is where you have a huge list of available tools to explore.
Literally so many different things. And the real secret sauce of a lot of this I feel like is using these in conjunction, like using multiple in kind of a stack.
Because a lot of things are built around using AI-generated output in a different way. So let's say, for example, you want to have an animated character that is saying something. Well, you can generate an AI character with Flux or Stable Diffusion or something like that, then you move it to one of the tools like Hallo. Yeah, here's Hallo. And this is another thing you'll notice here, right in this bracket at the beginning: Nvidia only. You'll see that on a handful of these, and you'll see Mac only as well, like Chat with MLX.
But that's just so that you know whichever one of these gets installed, don't install it on the wrong device, because some of these may only be usable on a handful or on a specific setup.
Forge is another one that's specifically for Flux, but you'll notice there are some that will let you use Flux that are not Nvidia only. In ComfyUI, in fact, you can use the Flux model without needing the Nvidia-only route. So it all depends, and there may actually be multiple ways to utilize certain models. So don't think that just because one particular tool doesn't work on your machine, it means that you can't get access. A lot of these things are using similar models, or the same models in a different way.
And I will actually just go ahead and start this off by saying ComfyUI is far and away, I mean, it says it right here, the most powerful and modular diffusion model GUI, API and backend, with a graph and nodes interface.
You can do almost anything with ComfyUI. You can even do image tagging and stuff like that.
It's complex, and it takes a lot to get used to the interface and how to connect things together. You've got to really know the different pieces of the puzzle and how the workflow is pieced together.
But once you kind of get the hang of it, there is almost nothing you can't do. And you can go download custom workflows that somebody else has already built.
But this is far and away the big one: inside of ComfyUI, you can make a workflow that will do so many of these other things. A lot of these are just one-offs, built to do one very simple thing, whereas ComfyUI, when it comes to generative models, diffusion models, video generation, image generation, ControlNet, upscaling a video or an image, transferring a style from one image to another, doing voiceover and having it animate, you name it, ComfyUI has tons and tons of this available. ComfyUI is damn near an AI browser and workflow builder within this Pinokio AI browser and kind-of workflow builder. So this one is very, very powerful. It's a really, really valuable one to learn in the long term, I think, and I suspect it will stay a big part of the AI tool ecosystem.
So let's start installing and playing with one of these. Because ComfyUI is so complicated, it's something that I encourage you to dig into, but it's not going to be something that we can easily cover. A lot of these are very single-purpose: they've built a simple UI in Gradio that just makes it so you can utilize one of these tools and start playing around with it.
This one is one I have not done. I'm a huge user of Whisper. I do not use it inside of Pinokio, but I have run the Python version of this, realized that I was going to be able to use it in my workflow, and then installed the C++ version (whisper.cpp, I believe) on the Mac, which runs like 10 times faster. Literally, that's not even a joke, like 10 times faster than the Python version, with fewer resources.
And now, if you'll see down here, I have this little app called Scribe Drop. This is something that I just built with an LLM. I think I used ChatGPT, the first GPT-4 version, maybe even GPT-3; I don't know, it's been a while. But it literally just allows me to drag and drop something. Actually, I'll just show you. And I do have other videos with this, but this is the first opening scene of this episode that I just exported, because we're going to be using it, and I just dragged and dropped it onto my Scribe Drop. I have a video specifically about this. And here is the transcription, and this is using Whisper. It's a really, really phenomenal tool.
But I don't need any of this right now, so I'm going to delete it.
If we go back to Pinokio.
I have never used MLX video transcription. What I have is a little script that exports the audio with FFmpeg, feeds it into Whisper, and then saves it as a text file and a subtitle file right there in the exact same folder with the exact same name.
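For anyone curious, a script like that boils down to very little. Here is a rough Python sketch of the same idea; it only builds the two commands and guards the actual run at the bottom, and the `whisper-cli` binary name and `models/ggml-large-v3.bin` model path are assumptions about a typical whisper.cpp install, so adjust them to your setup.

```python
import subprocess
import sys
from pathlib import Path

def build_commands(video_path):
    """Build the ffmpeg extract command and the whisper.cpp transcribe command."""
    video = Path(video_path)
    wav = video.with_suffix(".wav")   # 16 kHz mono WAV, which whisper.cpp expects
    base = video.with_suffix("")      # whisper.cpp appends .txt / .srt itself
    ffmpeg_cmd = ["ffmpeg", "-y", "-i", str(video),
                  "-ar", "16000", "-ac", "1", str(wav)]
    whisper_cmd = ["whisper-cli",                     # assumed binary name
                   "-m", "models/ggml-large-v3.bin",  # assumed model path
                   "-otxt", "-osrt",  # emit both a text file and a subtitle file
                   "-of", str(base),  # same folder, same name as the video
                   str(wav)]
    return ffmpeg_cmd, whisper_cmd

if __name__ == "__main__" and len(sys.argv) > 1:
    ffmpeg_cmd, whisper_cmd = build_commands(sys.argv[1])
    subprocess.run(ffmpeg_cmd, check=True)
    subprocess.run(whisper_cmd, check=True)
```

Run it as `python transcribe.py talk.mp4` and you end up with `talk.txt` and `talk.srt` sitting next to the video.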
So in the spirit of exploring new AI tools with Pinokio, that's what we're gonna do.
And just so you know, I really wish these were clickable, the "cocktailpeanut, Verified Script Publisher" label there. By the way, cocktailpeanut is the person who has built pretty much all of this. And it's amazing.
And another really fantastic tool alongside this, especially if you generate a ton of AI images, is Breadboard, which is also built by cocktailpeanut.
And I just don't even understand; the amount that this guy is building is just unbelievable. But there's always a link here to the GitHub, so if you want to explore it, I'll just go here and we will paste it in, just so you can see.
So Apple MLX powered video transcription.
So if you want to learn more about the app itself, rather than the Pinokio package, which is this link right here, the Pinokio factory video transcription repo, this might even be useful in some way, but most of the time the package page doesn't have much for details. So if you're trying to learn more about the tool, the original GitHub link is the place you go. This is in the description. But right now we just want to download this MLX video transcription. Git clone? Sure.
So what these are: this is a packaging system for how to install AI tools, in what looks like Docker-style containers to me, onto the computer, so that you have your own little packages for all of these things, and it just works. Normally you go to an app and it says run this command and run this command and run this command, and these are the prerequisites that you need. Well, this is just packaging all of that so it runs automatically. Pinokio goes to the GitHub page and says, oh, these are all the things that you need to run, and these are all the things that you need to download, to make it work. Somebody writes those instructions up and sticks them into this thing, and then you can click a button and all of it happens. So you'll see all this fancy, seemingly complicated terminal log output.
You don't have to worry about it. Just wait; you'll just watch it. Here's two out of three: shell.run. It's just running a bunch of terminal commands, and you just wait, give it a couple of minutes, and then boom, you're gonna have an app, and you'll see there will be a Start button right here.
So boom, boom, boom.
Open Web UI.
I didn't do anything; I just clicked, and now this is booted. And here's something that's going to be very, very useful: notice that this is localhost:8501. This is the secret information you need for Holesail to deliver this to any other device, whether you're using the Holesail Go app, or you have another desktop machine, or you have a friend that you want to show this to so that they can use it on their own computer.
So, yeah, let's see here.
Drag and drop file. So this looks right. We're just on distil-large-v3.
I like to use the largest ones because they typically have the best results. So I'm just going to drag our first scene in, and there it is.
All right, prints red.
Let's see what happens.
Transcribing.
Transcribing the video.
Oh, this is distil-whisper large-v3. So this is the large Whisper model. So I'm using basically the exact same model that I was before.
Now, it does seem like this is taking a little bit longer than it did running natively on my machine. But here's the great thing: this is just Whisper, the exact same model that I run in my little transcribe app.
So you know how many services charge you for the premium tier and say, oh, well, we'll give you free transcription for this?
You can do this yourself, just by waiting.
What was that, 10 seconds? I don't know. And you can do all of this for free with open source code. The amount of this stuff that is available very, very quickly inside of one of these tools; I swear, with Pinokio and everybody who is building on this ecosystem, the speed between something coming out and them having it running so that you can use it has been increasing, and I think it's because they literally have a tool inside of Pinokio for packaging things for Pinokio. So it's really, really interesting the way that these things are stacking on top of each other. And with AI able to simplify and shortcut a lot of the code in getting these things to work, I just really think this is going to speed up a lot, and I think this is going to be a massive multiplier for the open source AI ecosystem. So that was.
[00:20:58] Speaker B: That was it.
[00:20:58] Speaker A: I just installed a transcription tool in Pinokio, tested it, used it right here. I got a transcript. I can click download and it's a zip file. Let's go to desktop.
I'm curious what's in the zip file? I wonder if it's got like multiple things here.
Transcription. Yeah. Okay, so the subtitle file, the text file, and the VTT file.
These will both allow you to play it alongside a video, or, you know, package it in something like an MKV file, and you can just watch it and it will have the subtitles show up on the video.
And that was it. Really easy. And this is literally how you can use so many of these tools. And then you stop it, to kill it so that you don't have to have it running all the time and using up your RAM; you just hit stop on the terminal, then go back home. You'll notice it opens up multiple windows, just so that you can work really well and you're not constantly having to go back or getting stuck inside of one window. So when you're installing something, it will have its own window to do that in.
And here's the other kicker: not only can you share apps, which, again, is what the localhost:8501 is for, like, yes, you can run this through Holesail and have it accessed remotely, but you can actually share this locally inside of your own home network without doing that.
And I will show you. Yet again, we're going to stop this and go back home.
But this lets you local share Pinokio. So you can actually have Pinokio running on your Linux machine or your desktop, and then access it with your phone, or your laptop, or your iPad, and log in and actually have this interface and play with these tools locally in your own home, and boot up your MLX video transcription and drag and drop your video in.
You know, let's say you recorded yourself giving a lecture today. You gave a lesson for someone, I don't know, you were teaching them about Bitcoin, and you can just drag and drop that into it and have the transcription right on your device. And importantly, you did not have to use that device's compute; you were using the one that's plugged in, your desktop machine. And I really just think this, in combination with the Pear stack, with Holesail, makes having multiple devices so much easier to utilize, so much more valuable: many different tools in place, being used in tandem, using the best tool for the job no matter which device you're working from. And this is such a cool thing. And it's funny that somehow this only became the norm after AI became a thing. I don't mean it from the Pear stack side of things; the Pear stack has just been in progress for a really long time. I mean specifically the norm of running something as a service and making it available on other devices, because I can do this without Holesail. It's just that if I leave my house, Holesail is the thing that lets me connect through it. And again, I don't want to go through installing Holesail again, but I'll have the link in the show notes for installing that specifically, because I do have a video about that.
[00:24:51] Speaker B: So that's basically the gist of it.
[00:24:54] Speaker A: And this gives you more than enough.
[00:24:57] Speaker B: Foundation to just start exploring. And there is tons of different stuff to explore. But I want to show you one more thing before we end this video just so you can kind of see the many different things that you can do with it. And as I mentioned, the API and how to use these tools in a workflow, but we will get to that.
[00:25:19] Speaker A: In a later video.
[00:25:20] Speaker B: It's a little bit too in depth to tackle here as well. Like I said, I want to show you one more thing.
So I'm jumping over to my Linux machine here.
[00:25:34] Speaker A: And I want you to.
[00:25:34] Speaker B: See a couple things. So first you can see I have a lot more models and tools installed on my Linux machine. And that is because, and this is.
[00:25:46] Speaker A: One thing to keep in mind too.
[00:25:48] Speaker B: That I didn't mention is a lot of these. Almost all of these come with their own models that you have to download. So some of these take a little bit to install.
You're probably going to need a lot of hard drive space in order to house and use a lot of these tools regularly. That is why I do this on my Linux machine and I basically delete it all from my Mac because I'm pretty limited on my space for my MacBook.
However, I wanted to point out two things because a lot of people ask what's the use case of AI? I don't want to generate a bunch of random images.
What can I do with this?
This is something and I haven't sorted out all the pieces of it. And you should definitely subscribe if this interests you at all, because this is something I am going to complete a build and a workflow for.
And I also just want you to kind of start getting a hint of just what you can do with a lot of these things, especially when you start to use the API.
So this is Open WebUI, which is an interface, basically a ChatGPT-like interface, that you can use with any model. Now, I actually don't have my username and password on this computer, which is funny, because I never use it on this machine; I use it on my Mac, running off of this machine. That is something I will show you when we do the Holesail and API episodes.
But this is really useful, because this is what's going to allow you to use an API to take some piece of data, or a transcript of something, and have it summarized, or pull some useful information out of it. You can utilize this stuff programmatically: you can build a tiny app, even with the LLMs on the computer, you can use it to build an app that's going to utilize the LLM to solve some sort of a problem or streamline some part of your process.
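To give a taste of that programmatic side, here is a minimal Python sketch that posts a transcript to a locally running Ollama server and asks for a summary. The `llama3` model name and the default `localhost:11434` endpoint are assumptions based on a stock Ollama install, so swap in whatever model you actually pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(transcript, model="llama3"):
    """Build the JSON payload for a one-shot (non-streaming) generation."""
    prompt = f"Summarize the key points of this transcript:\n\n{transcript}"
    payload = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(payload).encode("utf-8")

def summarize(transcript):
    """Send the transcript to the local Ollama server and return its summary."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(transcript),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Point `summarize()` at the text file your transcription tool spits out and you have the "transcribe, then summarize" pipeline with no cloud service involved.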
Now, Florence-2. This is a model from Microsoft for image captioning. And I have a number of different models here, you know, large, base. I'm honestly not sure of the difference between the base and the large.
But I've been using Florence-2 large. And the task prompt, the different things that we're trying to do with this, I want you to see. So I'm going to add to this. These are just examples that they put at the bottom so that you can utilize it.
[00:28:26] Speaker A: And go ahead and start trying it.
[00:28:28] Speaker B: Now we have object detection, dense region caption, all of this stuff.
[00:28:33] Speaker A: I just want to do a more.
[00:28:35] Speaker B: Detailed caption so that. And you can add in a text prompt to get more specificity or to pull something very explicit out of what you were trying to get from the image. But I'm just going to submit and get a more detailed caption.
And now look at this.
So this is the text output. The image shows a vintage Volkswagen Beetle parked car parked on a cobblestone street in front of a yellow building with two wooden doors. The car is painted in a bright turquoise color, has a white stripe running along the side. That's not. That's actually a reflection. It has two doors on either side of the car, one on top of the other and a small window on the front. The building appears to be old and dilapidated with peeling paint and crumbling walls. The sky is blue. There are trees in the background. Now I want you to think about.
You can have this automatically run and then save these details into the comments or the metadata of any image, and then you can go back and search later for turquoise Volkswagen Beetles and this picture will always come up.
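That tag-and-search loop can be hacked together with nothing but the standard library. As a sketch, this writes each caption into a JSON sidecar file next to the image (writing into the image's own metadata would need something like exiftool, so the sidecar approach is my simplification) and then searches the sidecars later:

```python
import json
from pathlib import Path

def save_caption(image_path, caption):
    """Write the model's caption into a sidecar JSON next to the image."""
    sidecar = Path(image_path).with_suffix(".caption.json")
    sidecar.write_text(json.dumps({"image": Path(image_path).name,
                                   "caption": caption}))
    return sidecar

def search_captions(folder, term):
    """Return the names of images whose captions mention the term."""
    hits = []
    for sidecar in sorted(Path(folder).glob("*.caption.json")):
        meta = json.loads(sidecar.read_text())
        if term.lower() in meta["caption"].lower():
            hits.append(meta["image"])
    return hits
```

Feed `save_caption()` from whatever captioning tool you run, and `search_captions(folder, "turquoise")` pulls that Beetle back up whenever you want it.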
Now using this in conjunction with something like the Open Web UI and plugging.
[00:29:59] Speaker A: It into an Ollama model.
[00:30:00] Speaker B: So I'm going to copy this.
I have Ollama running in my terminal.
And this is also something that I'll probably cover in a video. This is a really great, simple LLM interface. And I have downloaded and I'm running Llama 3, and I just want to ask:
Can you pull important tags from this image description?
So I would have to build my prompt a little bit better, but you can see how I took the generic description and I pulled the important pieces, the important objects, the important adjectives out of this.
And I could then easily sort all of my images. I could run this for every image on my computer, and I could sort every one of them: this one is a street scene, it's an old or dilapidated image of buildings or houses or cars.
And you start to see that you can have very contextual awareness and a much deeper understanding of all of the data and all of the images and.
[00:31:27] Speaker A: Things on your computer.
[00:31:28] Speaker B: And this is really just the beginning. And again, you can use this with API so you can build this into a workflow so that you can put this so that like every time you save something to a particular folder, it runs this.
And there are other things that you can do with this as well, like object detection.
Now, it's not necessarily easy to see how you might be able to use this, but these are pixel locations and then labels as to what these objects are in the image. And you can actually use this in some sort of a tool like FFmpeg to literally cut these things out of the image in an automatic way.
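As a concrete sketch of that, assuming the detection output arrives as `[x1, y1, x2, y2]` pixel boxes with labels (which is my guess at the shape, so check what your tool actually returns), turning one box into an FFmpeg crop command is just arithmetic:

```python
def crop_command(image_path, box, label, index=0):
    """Turn one [x1, y1, x2, y2] detection box into an ffmpeg crop command."""
    x1, y1, x2, y2 = (int(v) for v in box)
    width, height = x2 - x1, y2 - y1
    out_name = f"{label}_{index}.png"       # e.g. car_0.png, car_1.png, ...
    return ["ffmpeg", "-y", "-i", str(image_path),
            "-vf", f"crop={width}:{height}:{x1}:{y1}",  # crop filter is w:h:x:y
            out_name]
```

Loop that over every detected box in every image and you have the raw material for that automatic collage.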
So if you wanted to literally cut Volkswagen Beetles, or a car, out of every single image that had a car and do something with it, like make a collage automatically, you can. And you can extend this out to working with video. One of the tools that I've been working with is a scene detection tool, PySceneDetect I believe; this isn't in Pinokio, but I am using it in conjunction with a different Pinokio tool, and it allows you to catch scene markers in a video.
And then I tell it to grab a bunch of screenshots from them and then run Florence-2 detailed caption.
And then you can get time codes attached to those as well. And you can have this contextual search for exact places inside of videos, looking for video clips for exact things.
And this is just kind of the beginning of how you can start to stack all of these tools. I've just started to scratch the surface and play with very rough draft, kind of hacked together options for these.
But it's still just so clear how powerful it is and how many different things could be done with it. Next, OCR with region. So OCR is Optical Character Recognition.
This is the other image example. It says CUDA for Engineers; it looks like the front of a PDF or something, an introduction to high performance parallel computing. And if you do OCR with region and hit submit, it shows you all the text; it reads it out: CUDA for Engineers, Introduction to High Performance Parallel Computing, et cetera, et cetera. If you use this in conjunction with something like your LLM, like Ollama, what you can then do is identify papers. If you ever save a PDF and sometimes it's just got a random title, you can have it grab the title and automatically rename the file for you.
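The rename step itself is only a few lines once you have the OCR'd title string. A sketch, where the sanitizing rules (drop punctuation, underscores for spaces) are my own assumptions about what makes a tidy filename:

```python
import re
from pathlib import Path

def title_to_filename(title):
    """Sanitize an OCR'd title into a safe PDF filename."""
    clean = re.sub(r"[^\w\s-]", "", title).strip()  # drop punctuation
    return re.sub(r"\s+", "_", clean) + ".pdf"

def rename_pdf(pdf_path, ocr_title):
    """Rename a randomly-titled PDF to match its OCR'd title."""
    pdf = Path(pdf_path)
    target = pdf.with_name(title_to_filename(ocr_title))
    pdf.rename(target)
    return target
```

So a download saved as `a8f3c2.pdf` becomes `CUDA_for_Engineers.pdf` the moment the OCR pass hands back its title.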
There are just so many different little productivity tips, little save-me-time things, especially with the amount of stuff that I save. I'm a bit of a data hoarder. And when I have so many different memes and images and little video clips grabbed from Twitter and YouTube and God knows where else, anything that I feel like I want to have at a later date, either for found footage or just to be able to read through, I save it. I save so much stuff, but so often I don't have the time to go through and organize it. And I'm beginning to see how I can just use three or four really powerful open source tools, and I may be able to get the computer to do all of the organization for me, and in fact, in some ways, do it better than I could have done by myself anyway. Let alone do it while I go into the other room and go to sleep: have it run and sort and name and tag and write detailed descriptions for thousands of images and videos, write out transcriptions, summarize them, stick them back into the info, the comments, of the video itself, and be able to search every single video for exactly what words were said at exactly what time. And this is all possible with a free AI app that cocktailpeanut, and then a ton of other people, built together with all of these different AI tools, that is usable, that you can access via API, and that you can just build little things on top of.
And I can run all of this on my Linux machine and I can access it and do these same things from my Mac.
I hope that at least gets you to begin to see that there's some really powerful use cases here, especially if you spend a lot of time working on your computer and wishing you had time to organize a bunch of stuff.
And I hope you'll subscribe to the show, subscribe to the YouTube and everything. I intend to share with you as I build these things out and get them working in a way that I think is really useful.
And I hope that we are building some tools in another area of projects and things that I'm working on that will kind of really show the power and how easy a lot of this stuff can be and how valuable it can be. I guess I'll just say stay tuned.
This is why I started AI Unchained. This is why I started the Pear Report, and I think there are just some really, really cool and exciting things that we can do with this, and I hope you guys stick around to find out. So with that, I guess we will close this episode. Thank you guys so much for listening. Don't forget to get yourself a Coldcard and put your bitcoin on it and keep it safe while you're exploring all of this cool stuff. And I will catch you on the next episode of AI Unchained. And until then, everybody, take it easy. Guys.