ChatGPT vs. Bard vs. Claude 2 (Which is best?)

34 minutes 37 seconds

Speaker 1

00:00:00 - 00:00:26

So recently, ChatGPT has gotten some competition, specifically with Anthropic's Claude 2 coming out and also with Bard getting a bunch of upgrades. So I figured now would be a good time to pit them against each other. Now, before I get too deep into this, this is by no means scientific whatsoever. Obviously, there's an infinite combination of potential prompts that you can try, and different large language models are going to perform differently based on all sorts of criteria. And this is also by no means exhaustive.

Speaker 1

00:00:26 - 00:00:54

There are obviously so many other criteria that I could test them on that I'm probably just not thinking of. And as more large language models get released and as these models get improved, I'll probably remake a video like this in the future to better compare them as they improve over time. Now for this video, I'm going to be comparing them on cost, token limit, the ability to browse the web, summarizing long content, image recognition, data analysis, creativity, and the ability to code. Now I'm not a

Speaker 2

00:00:54 - 00:00:56

coder myself, but I will be able to test it

Speaker 1

00:00:56 - 00:01:19

on some simple HTML and CSS code. Then there's accuracy, which, again, is a very, very difficult test to do, because different models are going to be more or less accurate in different areas that I could potentially test them in. So this is going to be a very simple analysis of accuracy. And then finally, these large language models don't help you if they're not available to you. So we're also going to compare them on availability.

Speaker 1

00:01:20 - 00:01:39

So let's start with what we know right off the bat. We know the cost of all of them. ChatGPT, if you're using the 3.5 model, you can access it for free. You don't get access to code interpreter. You don't get access to all the plugins, and when the servers are bogged down you're less likely to get in if you're on the free version, but it is available for free.

Speaker 1

00:01:39 - 00:02:01

Then you've got ChatGPT 4. If you want to use GPT-4, it costs 20 bucks a month, but you also have access to all the plugins, you also have access to code interpreter, and when web browsing is available, you have access to web browsing. And then you have Claude 2. Claude 2 is also currently available for free for anybody to use. And then there's Bard, which is also currently available for free for anybody to test and use.

Speaker 1

00:02:01 - 00:02:43

Next we have the token limit. If you're using the free version of ChatGPT, or 3.5, you have access to 4,096 tokens. Now this is not an exact science, but as a rule of thumb, the number of words that you can combine as input and receive as output from the large language model is roughly 75% of the token limit. So with GPT-3.5, with a token limit of 4,096, you roughly get a combined input/output of about 3,000 words. If you're using GPT-4, the model we have available right now inside of ChatGPT gives us access to 8,192 tokens, which is roughly about 6,000 words of combined input and output text.
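To make that rule of thumb concrete, here's a minimal sketch that applies the ~0.75 words-per-token heuristic mentioned above to the three published limits. The exact ratio varies by tokenizer and by the text itself, so treat these as ballpark figures, not guarantees.

```python
# Rough words-per-token heuristic (~0.75), per the rule of thumb above.
# Real ratios vary by tokenizer and by the text being tokenized.
TOKEN_LIMITS = {
    "GPT-3.5": 4_096,
    "GPT-4": 8_192,
    "Claude 2": 100_000,
}

def approx_word_budget(tokens: int, words_per_token: float = 0.75) -> int:
    """Approximate combined input+output word budget for a token limit."""
    return round(tokens * words_per_token)

for model, limit in TOKEN_LIMITS.items():
    print(f"{model}: {limit} tokens ~ {approx_word_budget(limit)} words")
# GPT-3.5: 4096 tokens ~ 3072 words
# GPT-4: 8192 tokens ~ 6144 words
# Claude 2: 100000 tokens ~ 75000 words
```

That's where the roughly 3,000 / 6,000 / 75,000 word figures in this comparison come from.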

Speaker 1

00:02:43 - 00:03:11

And then there's Claude 2. Now this one really, really shines, in that it gives us access to 100,000 tokens, or roughly 75,000 words, between input and output, which is amazing if you want to copy and paste an entire book in there and have it summarize something like that for you. And then there's Bard, for which the token limit has not been made publicly available. So we don't actually know how many tokens we have access to inside of Bard. Those are some of the givens. That's what we know right now. Some of these other ones are going

Speaker 2

00:03:11 - 00:03:21

to be fairly subjective as we start to test them. So let's start to test web browsing first. GPT 3.5 does not currently have access to web browsing.

Speaker 1

00:03:21 - 00:03:34

GPT 4 had access to web browsing for a little while, but they took it away from us because people were able to summarize websites that were behind paywalls. And they're trying to fix that before they give us access to web browsing again.

Speaker 2

00:03:34 - 00:03:37

However, GPT-4 does have access to

Speaker 1

00:03:37 - 00:03:47

a bunch of various plugins and some of these plugins do have access to the internet. For example, there's this web requests plugin that we can enable here and we can type something like

Speaker 2

00:03:47 - 00:03:57

what is the site futuretools.io about? It starts to use the web requests plugin. And there we have it. The website futuretools.io is a platform that collects and organizes AI tools. The goal

Speaker 1

00:03:57 - 00:04:08

of the site is to help users find the exact AI tool that they need. And then it lists a bunch of tags that are available. The site also features a section called Matt's Picks, and you can see it gives us a screenshot of what

Speaker 2

00:04:08 - 00:04:09

the future tools website looks like.
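For a sense of what a web-request-style plugin is doing behind the scenes, here's a minimal sketch: fetch the page, crudely strip the HTML down to text, and hand that text to the model as part of a prompt. The URL is the real one from the demo; everything else is an illustrative assumption, not the plugin's actual implementation.

```python
import re
import urllib.request

def fetch_page_text(url: str, max_chars: int = 4000) -> str:
    """Download a page and crudely strip scripts, styles, and tags."""
    with urllib.request.urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="ignore")
    html = re.sub(r"(?s)<(script|style).*?</\1>", " ", html)  # drop script/style blocks
    text = re.sub(r"<[^>]+>", " ", html)                      # drop remaining tags
    text = re.sub(r"\s+", " ", text).strip()                  # collapse whitespace
    return text[:max_chars]  # truncate to stay under the model's context window

page_text = fetch_page_text("https://futuretools.io")
prompt = f"What is this site about?\n\n{page_text}"
# `prompt` would then be sent to whichever chat model you're testing.
print(prompt[:500])
```

The truncation step is why token limits matter here: a plugin like this has to fit the fetched page plus your question into the model's context window.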

Speaker 1

00:04:09 - 00:04:22

We do still have access to web browsing. It's just not the built-in browsing that we had before. It seemingly does kind of the same thing. Let's go ahead and use this same prompt here. Let's jump over to Claude and let's try the same thing.

Speaker 1

00:04:22 - 00:04:33

Let's paste this in. What is the site futuretools.io about? And it says unfortunately I do not have enough context to provide a detailed summary of the website futuretools.io as I do not have access to view the full content of

Speaker 2

00:04:33 - 00:04:34

the site. Let's just go

Speaker 1

00:04:34 - 00:04:36

ahead and say, can you browse

Speaker 2

00:04:36 - 00:04:41

the web? No, I don't have the capability to autonomously browse the web or access websites directly.

Speaker 1

00:04:41 - 00:04:44

All right. So let's jump over to Bard and paste in the same prompt here.

Speaker 2

00:04:44 - 00:04:50

What is the site futuretools.io about? Futuretools.io is a website that lists and reviews AI tools.

Speaker 1

00:04:50 - 00:05:08

It was founded by Matt Wolfe, who is a writer and entrepreneur. The website's goal is to help people find the best AI tools for their needs. Then it lists a bunch of tags. And Bard, we do know, has access to search Google. So yes, it can actually browse the web. So jumping back to our comparison chart here: GPT-3.5, no web browsing.

Speaker 1

00:05:08 - 00:05:20

GPT-4, yes, with plugins. Claude 2, not available. And Bard, yes, it just has web browsing directly built in. All right, so let's talk about summarizing long content. We'll start with GPT-3.5.

Speaker 1

00:05:20 - 00:05:47

There's this great article on notboring.co by Packy McCormick here called The Great Online Game. It's an awesome article, but it is quite a hefty long read. So let's go ahead and copy the entire text of this article, paste it into GPT-3.5, and then ask it: how many words is this article? So the word count on this article is 2,661 words.

Speaker 1

00:05:47 - 00:06:07

It should fall within the token limit of at least three of these. Not sure about Bard yet. So let's go ahead and type: summarize this article into bullet points. The author discusses the concept of the great online game and how it's played by billions of people. Social media platforms are seen as a manifestation of this game, but it extends beyond them to various online and offline spaces.

Speaker 2

00:06:07 - 00:06:08

The article highlights individuals who

Speaker 1

00:06:08 - 00:06:20

have excelled like Elon Musk, Lil Nas X, and the creators in the crypto space. Crypto is described as a supercharger for the game. The author explains how to play the game, emphasizing experimentation, engagement, and building connections,

Speaker 2

00:06:20 - 00:06:23

et cetera. So not a bad summary. So let's go ahead and

Speaker 1

00:06:23 - 00:06:47

do a new chat. Let's jump over to GPT-4. We'll paste the same prompt in here and let's add summarize the above article into bullet points. Interestingly, when I paste this into GPT-4, although it should technically have a higher token limit than GPT-3.5, GPT-4 is telling me the message I submitted was too long. Please reload the conversation and submit something shorter.

Speaker 1

00:06:47 - 00:06:58

Let's regenerate response. Yep. I get the same thing. When I used GPT 3.5 turbo, it allowed me to paste that in and get a response. When I use GPT 4, it says it's too long.

Speaker 1

00:06:58 - 00:07:33

So I actually believe what this means is that we're actually not using the 4,096-token limit with ChatGPT 3.5. We're probably using the 16,000-token limit, which would give us roughly 12,000 words. Again, I'm not a hundred percent sure. I'm just noticing that when I use the GPT-4 model, it says it's too long to work with, and when I use the GPT-3.5 model, it actually summarizes the text properly. Interestingly, at 2,661 words, it should work in both of them.

Speaker 1

00:07:33 - 00:07:52

So let's go ahead and confirm that real quick. Let's go ahead and copy all of this text here. I want to paste the whole thing into a Google Doc and then let's do a word count here. So it's actually 4,350 words, which should still be enough for a token limit of 8,192, but for some reason, ChatGPT 4 won't actually summarize this for me.

Speaker 2

00:07:52 - 00:07:58

Also, this shows that there's a little bit of an accuracy issue here with the word count, if you ask it to count

Speaker 1

00:07:58 - 00:08:21

the words. Let's jump over to Claude real quick, from Anthropic, and let's just paste this whole thing in. And one thing you'll notice with Claude is that when you paste a large chunk of text in, it automatically converts it to a text file instead of just pasting it as one giant wall of text. Let's ask Claude: how many words is this article, based on a word count? This article contains approximately 3,149 words.

Speaker 1

00:08:21 - 00:08:30

I don't know why these have an issue counting words, but it seems like Claude has that same issue. Let's go ahead and type summarize this article into bullet points. So this did

Speaker 2

00:08:30 - 00:08:34

a fairly good job of summarizing it. So I'd say that's a plus for Claude there.

Speaker 1

00:08:34 - 00:08:43

Now let's go ahead and try with Bard. Create a new chat here. Let's paste this in. And the first thing I notice is it actually cuts off. It won't let me paste the whole thing in.

Speaker 1

00:08:43 - 00:09:06

I can't type anymore, so I can't even copy and paste the whole thing into Bard. It won't let me. Now there are some things we can do as workarounds for both Bard and GPT-4. We can actually come over here, click on code interpreter, paste this entire text into a Notepad file, and save it as a text file called TGOG, for The Great Online Game.

Speaker 1

00:09:06 - 00:09:28

We'll save it as The Great Online Game, like that. I can drag this text file right inside of GPT-4 and ask: how long is this article? And using code interpreter, you can see this article contains approximately 4,155 words. So using code interpreter, it figured it out. And now I can say, summarize this article into bullet points.
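That word-count step is the kind of one-liner code interpreter writes for itself behind the scenes: instead of having the model "guess" a count, it runs actual code. A minimal sketch, assuming the uploaded file is named TGOG.txt (the filename is just the one used above):

```python
# Count words in a saved article the way code interpreter approaches it:
# read the file and split on whitespace, rather than asking the model to estimate.
with open("TGOG.txt", encoding="utf-8") as f:
    text = f.read()

word_count = len(text.split())  # split on any run of whitespace
print(f"This article contains approximately {word_count} words.")
```

This is also why the models' own word counts disagreed earlier: without running code, they can only estimate.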

Speaker 2

00:09:28 - 00:09:30

Well, I thought that would work,

Speaker 1

00:09:30 - 00:09:51

but now it's saying: I'm sorry, but I currently can't import the necessary module transformers, as the current environment does not have internet access. However, I can generate a manual summary based on my understanding of the text. And then it tried again: I apologize for the confusion. Unfortunately, the environment I'm running on currently does not have the gensim library installed and does not have internet access to download it. So it won't actually summarize it for me with GPT-4 and code interpreter.

Speaker 1

00:09:51 - 00:09:51

That's kind

Speaker 2

00:09:51 - 00:09:53

of annoying. Let's see if we can find another workaround.

Speaker 1

00:09:53 - 00:10:04

Go to GPT 4. Let's go to plugins this time and let's use the web request plugin. Copy the URL, summarize this article into bullet points, and let's paste the URL in.

Speaker 2

00:10:04 - 00:10:07

All right, here's the summary of the article. So we finally got it to work.

Speaker 1

00:10:07 - 00:10:25

We had to use some workarounds. We couldn't just copy and paste it in, which I'm a little confused by, because it seems like the context window should have been large enough, but we did eventually get there. Now, Bard, I should be able to use this exact same prompt here. Summarize this article into bullet points. Go ahead and delete what I pasted in earlier.

Speaker 1

00:10:25 - 00:10:27

Paste this in. Well, I thought Bard would be able

Speaker 2

00:10:27 - 00:10:33

to do it, but it says I can't assist you with that as I'm only a language model and don't have the capacity to understand and respond.

Speaker 1

00:10:33 - 00:10:50

So jumping back to our comparison here: summarizing long content. ChatGPT 3.5 seemed to do it well. I think it's using the 16,000-token limit now. Now, I don't know for sure if that's available on the free plan or if only Plus users get the 16,000-token window. I'm not sure. ChatGPT using GPT-4?

Speaker 1

00:10:50 - 00:10:57

Yes, it worked, but I had to use a workaround where I used a plugin to do it. It didn't work well with code interpreter, and it didn't work right out of the box, because

Speaker 2

00:10:57 - 00:11:00

it said that it was too long. Claude 2 worked great,

Speaker 1

00:11:00 - 00:11:09

and Bard. Well, that was a fail on Bard. All right, let's try image recognition now. So I already know GPT 3.5 isn't going to be able to do it. There's no upload button.

Speaker 1

00:11:09 - 00:11:28

I can't make it work with this. Now GPT 4, let's first try with code interpreter. Let's go ahead and use this image which I generated for a previous video that I did. We'll drag and drop it straight into the message here and let's ask it, can you describe this image? So here's the image on the screen again.

Speaker 1

00:11:28 - 00:11:46

The image appears to be a drawing or illustration, not a photograph. The main subject is a character who seems to be a prisoner wearing an orange jumpsuit. He has a distinctive large head with exaggerated features including a broad nose, big round eyes and a wide mouth. Ok so that's not quite accurate. He is bald and his ears are large and stick out.

Speaker 1

00:11:46 - 00:11:55

He is holding a placard with numbers on it, suggesting a mug shot. No. The numbers are not clearly readable. The prisoner has a somewhat sad and resigned expression on his face. The image uses a cartoonish art style.

Speaker 1

00:11:55 - 00:12:12

So not quite great. Though it did get a few things right. It noticed that it was a prisoner wearing an orange jumpsuit, but other than that it just made stuff up. All right, so let's jump over to Claude and Claude does not actually give me the ability to add an image. So Claude does not seem to have the image recognition ability.

Speaker 1

00:12:12 - 00:12:23

And finally, let's jump over to Bard. Let's create a new chat. We'll upload the image, and I'll say, please describe this image. And it says, sorry, I can't help with images of people yet.

Speaker 2

00:12:23 - 00:12:28

Okay. Here's an image of a wild west town. Let's see what it does with this. All right. So here's

Speaker 1

00:12:28 - 00:12:46

the image. The image shows a small town in the wild west. The town is made of wood and has lots of buildings. Okay. The buildings are all different shapes and sizes, and they are all made of wood. Yeah, we got that part already. The town is surrounded by mountains, yes, and there are trees in the background. Not really. A horse-drawn carriage is parked in front of one of the buildings.

Speaker 1

00:12:46 - 00:12:56

Okay. Yeah. Maybe. The carriage is black with red wheels and has a white canopy. There is a man standing in the carriage, and he is wearing a cowboy hat and a long coat.

Speaker 1

00:12:56 - 00:13:12

The text of the image says HDCS, which it says could stand for horse-drawn carriage saloon. Okay, there is a DC and an S up here. Yeah, so I guess it found that. The buildings in the town are made of wood. The mountains in the background are tall and jagged.

Speaker 1

00:13:12 - 00:13:16

The trees in the background are green and lush and they provide a contrast to the dry, dusty landscape.

Speaker 2

00:13:16 - 00:13:18

All right, so I mean, it hallucinated half of it.

Speaker 1

00:13:18 - 00:13:36

Let's go ahead and see what GPT-4 says about this same image. Describe this image. The image seems to be a digital illustration depicting a wild west town with a whimsical twist. The buildings have a characteristic western architecture, with wooden facades, balconies, and a saloon-like appearance. Okay, so far so good.

Speaker 1

00:13:36 - 00:13:45

Interestingly the inhabitants of this town are not humans but robots. These robots have a humanoid form and are dressed in typical Western style attire including cowboy hats, bandanas and boots.

Speaker 2

00:13:45 - 00:13:52

One of the robots standing in the foreground has a sheriff star pinned to its chest, indicating it might be the law enforcement in this town. So there you go.

Speaker 1

00:13:52 - 00:14:10

None of them are very great. They get the general idea right; it's a digital illustration depicting a wild west town, but then it just kind of goes off the rails with the description. So updating our chart here: ChatGPT 3.5, not available. GPT-4, yes, but not the best accuracy. Claude 2, not available. Bard, yes, but

Speaker 2

00:14:10 - 00:14:17

it can't recognize humans, and it's also fairly inaccurate and just makes stuff up. All right, so let's test data analysis. Now I'm just going

Speaker 1

00:14:17 - 00:14:18

to use a real simple spreadsheet. I used this

Speaker 2

00:14:18 - 00:14:20

in one of my previous videos, where

Speaker 1

00:14:20 - 00:14:39

I listed a handful of dates, a handful of foods I ate, and then how I felt afterwards, only giving it the criteria of headache or felt fine. This is all made-up data. This is not based on anything real. I just wanted to see how it would interpret this, to see if it found a correlation. You can see anytime it has the word fried in it, I had a headache, and going through, that seems to be consistent.

Speaker 1

00:14:39 - 00:14:44

And so I was just hoping that it would simply find the correlation that fried foods equal headache. So if we

Speaker 2

00:14:44 - 00:14:46

create a new chat here, GPT

Speaker 1

00:14:46 - 00:14:56

3.5, I can't actually upload the spreadsheet, but let's go ahead and see if I can copy and paste the CSV in there. So if I export this as a CSV file here,

Speaker 2

00:14:56 - 00:15:00

you can see here's a file of the comma separated values. Let's just go

Speaker 1

00:15:00 - 00:15:04

ahead and copy this, paste it into GPT 3.5. And then I'm going

Speaker 2

00:15:04 - 00:15:16

to type: the above list is comma-separated values with dates, food I ate, and how I felt after eating them.

Speaker 1

00:15:16 - 00:15:54

Can you find correlations between the foods I ate and how I felt after eating them? Alright, so to find correlations between the foods you ate and how you felt afterwards, we can analyze the data, blah blah blah. Foods associated with headache: there they all are. Foods associated with fine: there they all are. Based on the data, it appears that fried foods such as fried chicken, fried cheese, et cetera, are consistently associated with headaches. So with this basic data analysis, if I copy and paste the CSV in, it can do it. Now, what's nice about using GPT-4 is I can click on code interpreter and just pull in the CSV file in the same way.
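For reference, the check being asked for here is a few lines of pandas. This is a minimal sketch, not what any of the chatbots actually ran; the filename and column names are hypothetical, and it assumes the CSV has no header row.

```python
import pandas as pd

# Hypothetical file and column names; the real sheet just has a date,
# the food eaten, and how I felt afterwards. Assumes no header row.
df = pd.read_csv("food_log.csv", names=["date", "food", "feeling"])

# Flag fried foods and tabulate them against the reported feeling.
df["fried"] = df["food"].str.contains("fried", case=False)
print(pd.crosstab(df["fried"], df["feeling"]))

# A perfect split (every fried row -> headache, every other row -> fine)
# is exactly the correlation the chatbots are being asked to spot.
```

With the made-up data described above, the crosstab would show headaches landing entirely in the fried rows.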

Speaker 1

00:15:54 - 00:16:23

Please find a correlation between the foods I ate and how I felt after eating them. So it's actually repeating the data back to me, and it separated it into a different style, with sort of different columns from my original one. Now, interestingly, it didn't actually find the correlation of them being fried foods. It said: you consistently reported headaches after eating fried calamari, fried cheese, fried chicken, fried macaroni, and fried pickles. But it didn't say anything like, you seem to get headaches when eating fried foods.

Speaker 1

00:16:23 - 00:16:27

So I might have to prompt it a little bit further. Although when I've done this experiment in

Speaker 2

00:16:27 - 00:16:28

the past, it did find the correlation.

Speaker 1

00:16:29 - 00:16:43

What is the consistent factor for getting a headache? All right. So from the data, it appears that all the foods associated with a headache are fried. So it finally found the correlation with a little bit of extra prompting. But again, this is very, very basic data. Using code interpreter,

Speaker 1

00:16:43 - 00:16:49

you can actually really analyze a lot of data, and this one is just a little too simple to show

Speaker 2

00:16:49 - 00:16:50

off what it's really capable of

Speaker 1

00:16:50 - 00:17:08

but let's go ahead and pull it into Claude and give it the same prompt. It quickly found the correlation: based on the data you provided, it seems like you tend to get headaches after eating fried foods. So this quickly and easily found the correlation when uploading the CSV file, something that, when I just now did it with code interpreter, I actually had to give it some extra prompts.

Speaker 2

00:17:08 - 00:17:10

All right, let's try Bard. Now, when

Speaker 1

00:17:10 - 00:17:35

I go to upload a file in Bard, it only accepts JPEG, PNG, and WebP files, so only image files. So I have to do what I did with GPT-3.5 and literally copy and paste the CSV contents. So I'm going to go ahead and copy this same prompt, paste it into Bard, and see how Bard does. Interestingly, Bard took a different approach. The correlation coefficient between the two variables is negative 0.23, which means there's a slight negative relationship between them.

Speaker 1

00:17:35 - 00:17:37

This means that if you eat certain foods you

Speaker 2

00:17:37 - 00:17:39

are more likely to feel a headache afterwards. The foods that seem to

Speaker 1

00:17:39 - 00:18:10

be most associated are fried chicken, fried cheese, and fried pickles. These foods are all high in fat and calories, and can also be high in sodium. These factors all contribute to headaches. So it didn't actually find the correlation of them being fried foods, but it found a correlation between the foods all being high in fat and calories. So instead of just analyzing the data from the columns, it actually analyzed the types of foods that are high in fat and calories and associated that with headaches, instead of just the word fried. Which is kind of cool, I guess.
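To quote an actual correlation coefficient like Bard's negative 0.23, you first have to encode both categorical variables numerically; the sign and magnitude you get depend entirely on that encoding, which Bard didn't state. A sketch under the same hypothetical column names as before:

```python
import pandas as pd

df = pd.read_csv("food_log.csv", names=["date", "food", "feeling"])

# Encode both variables as 0/1 so a correlation coefficient is even defined.
fried = df["food"].str.contains("fried", case=False).astype(int)
headache = (df["feeling"].str.strip().str.lower() == "headache").astype(int)

r = fried.corr(headache)  # Pearson correlation on the 0/1 encodings
print(f"correlation(fried, headache) = {r:.2f}")
# With the made-up data above and this encoding, r should come out at +1.0,
# since every fried food lines up with a headache. A different encoding
# (or correlating different columns) can flip the sign, which may be why
# Bard reported a small negative number.
```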

Speaker 1

00:18:10 - 00:18:40

So again, this was a pretty weak demonstration of data analysis. I have ideas to go much more in depth on this in future videos, but because I'm trying to break down so many criteria, I didn't want this video to be 10 hours long, so this is just scratching the surface. Code interpreter and Claude 2 are actually much more capable than what I'm showing off in this video. But just for a quick recap: all four of them are capable of data analysis. With GPT-3.5, you do have to copy and paste the CSV contents. With Bard, you also have to copy and paste.

Speaker 1

00:18:40 - 00:19:02

And it also analyzed it from a different approach. With GPT-4 and Claude 2, you can literally just upload the CSV file, and it will interpret from the file without having to copy and paste the text from within it. Now, creativity is a hard one to test, because obviously it's subjective, and I'm fairly sure all of these are going to be fairly creative. So let's test it in two ways. Let's get its ability to rhyme and

Speaker 2

00:19:02 - 00:19:13

write a poem and let's see its ability to tell jokes. And that'll sort of be our rough analysis for creativity for these large language models. Let's start with GPT 3.5. So

Speaker 1

00:19:13 - 00:19:21

write a poem that rhymes, that ties artificial intelligence together with eating pizza. So let's see if it can make a poem that includes artificial intelligence, eating pizza,

Speaker 2

00:19:21 - 00:19:24

and also rhymes. So that wrote it extremely fast.

Speaker 1

00:19:24 - 00:19:41

That was like, boom, instantaneous. In a realm where circuits twine, artificial minds align. Silicon dreams and coded grace meet with taste in a pizza place. Behold a slice of heaven's delight, where crust meets cheese, a savory sight. AI's algorithms calculate perfect toppings and edible fate.

Speaker 2

00:19:41 - 00:19:42

Alright, so not so bad. I'm not

Speaker 1

00:19:42 - 00:20:04

going to read the whole thing. Let's go ahead and do GPT-4 here. In a world where silicon meets mozzarella's touch, AI dreams and pizza fantasies, who knew we'd love so much. Deep in circuits, pizza cravings might seem odd, but when it comes to sustenance, even AI gives the nod. Binary codes and pepperoni, strange though they appear, when united in verse their parallels become clear.

Speaker 1

00:20:04 - 00:20:39

Thin crust of logic, thick crust of dough, both must be precise. In the oven of creation we bake both AI and slice. So this one, I actually think, is much more interesting. It seems a little more deep, like it's actually finding things that are similar between making pizza and creating AI algorithms. So that's pretty fascinating. Let's paste the same one into Claude here. Oh Claude, you're so intelligent, your knowledge is heaven-sent. I asked you to do one thing: make a poem that would sing about artificial intelligence and how it goes with pizza. Hence, you processed my request with care and crafted a rhyme beyond compare. Your neural networks, they are smart.

Speaker 1

00:20:39 - 00:20:47

You took my prompt and made art. Combining pizza and AI in a clever, poetic way. So thank you, Claude, for what you've done. This rhyming poem sure was fun. Who knew

Speaker 2

00:20:47 - 00:20:57

that pizza and AI could come together? My oh my. Yeah, I mean, it's okay. Not nearly as deep as what GPT-4 sort of put into theirs. Alright, let's try with Bard now.

Speaker 1

00:20:57 - 00:21:10

Oh the wonders of AI. It can do so many things. It can even write a poem about eating pizza and wings. The dough is made with care, the sauce is made with love, the cheese is melted just right, the toppings are never above. The AI knows just what to do to make the perfect pizza pie.

Speaker 1

00:21:10 - 00:21:18

It takes all the guesswork out and leaves you feeling so high. The next time you're feeling peckish and you don't know what to eat, just call upon the AI and it will make your pizza dreams complete.

Speaker 2

00:21:18 - 00:21:23

Not bad. It's basically saying AI can help you create a pizza, I guess. Now let's see how they tell jokes.

Speaker 1

00:21:23 - 00:21:35

Give me 3 of your most hilarious jokes. Here are 3 jokes for you. Why don't scientists trust atoms? Because they make up everything. What did one wall say to the other wall? I'll meet you at the corner.

Speaker 2

00:21:35 - 00:21:44

Why did the bicycle fall over? Because it was too tired. So I've actually asked it for jokes before and it's actually given me some of these same exact jokes. So as far as creativity goes,

Speaker 1

00:21:44 - 00:21:46

it does tend to repeat a lot of the same jokes.

Speaker 2

00:21:46 - 00:21:52

Go ahead and use GPT-4, see what it does. Look at this. GPT-4: why don't scientists trust atoms? Because they make up everything.

Speaker 1

00:21:52 - 00:21:58

That was one of the same exact jokes that we just got here in 3.5. Why did the scarecrow win an award? Because it was outstanding in

Speaker 2

00:21:58 - 00:22:00

his field. I've gotten that joke before too.

Speaker 1

00:22:00 - 00:22:02

Two antennas met on a roof, fell

Speaker 2

00:22:02 - 00:22:05

in love and got married. The ceremony wasn't much, but the reception was excellent. So,

Speaker 1

00:22:05 - 00:22:07

I mean, the jokes, meh.

Speaker 2

00:22:07 - 00:22:08

Let's see how Claude does.

Speaker 1

00:22:08 - 00:22:12

Here are 3 jokes I find pretty funny. I asked my dog, what's 2 minus 2? He said nothing.

Speaker 2

00:22:12 - 00:22:15

Why don't scientists trust atoms? Because they make up everything.

Speaker 1

00:22:15 - 00:22:20

Why was the math book sad? Because it had too many problems. I've gotten this joke before and this

Speaker 2

00:22:20 - 00:22:24

is literally the third large language model that gave us this same joke.

Speaker 1

00:22:24 - 00:22:32

Alright, let's see what Bard gives us. Well, I'm glad it didn't give us the atoms-making-up-everything joke, but it did give us: why did the scarecrow win an award? Because he was outstanding in his

Speaker 2

00:22:32 - 00:22:38

field, which GPT-4 also gave us. What do you call a bear with no teeth? A gummy bear. It's not bad. Why did the bicycle fall over?

Speaker 2

00:22:38 - 00:22:39

Because it was too tired.

Speaker 1

00:22:39 - 00:22:47

Didn't we see that one already, too? So these large language models must all be trained on the same exact jokes, because we got some repeats across all of

Speaker 2

00:22:47 - 00:23:09

the various models. Again, super subjective, but if I had to rank these, I would rank them like this: ChatGPT 4 was the best; I actually thought it gave us the best poem. ChatGPT 3.5, I think, gave us the second best poem. Claude 2 gave us the third best poem. Bard gave us the worst poem. And the jokes are just a wash, because they literally all gave us the same jokes, no matter which model I used. All right, let's test coding. I'm just

Speaker 1

00:23:09 - 00:23:17

going to do a real simple website: create a website using HTML and inline CSS, because I want to be able to just paste it all into one page.

Speaker 2

00:23:17 - 00:23:26

I don't want to have a separate CSS and HTML file. So, a website that sells surfboards; the name of the company is Surf Wolf. Sure.
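Before looking at what each model produced, here's the shape of the single-file, inline-CSS page the prompt is asking for. This is a minimal sketch with placeholder markup, prices, and copy, not any model's actual output; it's written as a Python script that saves the file, mirroring the notepad-and-browser workflow used below.

```python
# Write a single-file page with inline CSS (no separate stylesheet),
# mirroring the save-to-a-file-and-open-in-a-browser workflow in the video.
# The markup, prices, and copy here are placeholders, not model output.
page = """<!DOCTYPE html>
<html>
<head><title>Surf Wolf</title></head>
<body style="font-family: sans-serif; margin: 0;">
  <header style="background: #0077be; color: white; padding: 20px; text-align: center;">
    <h1>Surf Wolf</h1>
    <p>Surfboards for sale</p>
  </header>
  <main style="padding: 20px;">
    <div style="border: 1px solid #ccc; padding: 10px; margin-bottom: 10px;">
      <h2>Shortboard</h2><p>$499</p>
    </div>
    <div style="border: 1px solid #ccc; padding: 10px;">
      <h2>Longboard</h2><p>$699</p>
    </div>
  </main>
</body>
</html>"""

with open("surfwolf.html", "w", encoding="utf-8") as f:
    f.write(page)  # open this file in a browser to view the page
```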

Speaker 1

00:23:26 - 00:23:30

Let's go ahead and copy this code. I'm going to put it into a notepad file here. I'm going

Speaker 2

00:23:30 - 00:23:32

to save it as GPT35.html

Speaker 1

00:23:35 - 00:23:46

just to represent that this is using GPT 3.5. Click save. And when I open the website, it looks like this. Surf Wolf, surfboards for sale, and then the various prices. Pretty plain, basic website.

Speaker 1

00:23:46 - 00:24:02

It wants me to paste images into the code. I don't have any images of surfboards, but there would be surfboard images here if I had them. Let's just go ahead and see what GPT-4 does differently. One thing to note is GPT-4 is definitely taking a lot longer to write out the code. Let's go ahead and copy it.

Speaker 1

00:24:02 - 00:24:16

When we run this one, it looks like this. Also, it doesn't have images, but there would be images there. So this is what GPT-3.5 generated, and this is what GPT-4 generated. Let's see what happens when I turn on code interpreter and give it the same prompt.

Speaker 1

00:24:16 - 00:24:29

See if code interpreter does anything different with it. It pretty much made the exact same website when I used code interpreter. But can code interpreter create the images for me? Generate the surfboard images for me. So it actually generated three images for me.

Speaker 1

00:24:29 - 00:24:36

I downloaded each of them and put them in the same folder. So if I refresh this now, it actually generated the surfboard images for me.

Speaker 2

00:24:36 - 00:24:54

I mean, it just put colorful rectangles, but those are the surfboards it generated. Again, I'm doing this super fast. You can go back and forth with GPT-3.5 or GPT-4 and ask it to continually improve and get better and better. Let's just see how Claude does, giving it this prompt. Let's go ahead and click copy code.

Speaker 2

00:24:54 - 00:25:00

And here's the page that Claude generated. It actually generated a little menu at the top. I mean, none of the links actually work, but it put

Speaker 1

00:25:00 - 00:25:10

a little menu. It actually put some copy: Welcome to Surf Wolf. We make high-quality custom surfboards right here in California. Now compare that to these other ones, which didn't actually add any sort of copy.

Speaker 1

00:25:10 - 00:25:23

It just says description of surfboard. And this one actually put shortboard, longboard, fish. So it actually did the best so far, in my opinion. Let's go ahead and see how Bard does. We use the same exact prompt, and it generated some code.

Speaker 1

00:25:23 - 00:25:26

Let's go ahead and copy this. And Bard... I mean, this

Speaker 2

00:25:26 - 00:25:27

is what we got from Bard.

Speaker 1

00:25:27 - 00:25:51

All said and done, all four are capable of generating HTML and CSS code. I actually think that of these, Claude did the best job of generating the HTML and CSS code, because it did put a menu up here, and it did actually add some additional copy. With all four of these, though, you can actually get pretty decent code written. You just need to go back and forth with it and say, I'd like you to change this. I'd like you to center this.

Speaker 1

00:25:51 - 00:26:05

I'd like you to change the... if you just keep going back and forth with it, it will slowly dial in the code. So all of them are fairly capable. If I was to pick one that I would use more often right now, based on what I'm seeing here, I think Claude got me the closest to what

Speaker 2

00:26:05 - 00:26:07

I would want right from the very first iteration.

Speaker 1

00:26:07 - 00:26:28

So revisiting our chart here, all of them are pretty capable with simple HTML and CSS. Now let's move on to accuracy. We've already seen some accuracy issues when asking how long a chunk of text was. ChatGPT 3.5 got the number wrong. Claude 2 also got the number wrong, where ChatGPT 4 using code interpreter finally got the number correct.

Speaker 1

00:26:28 - 00:26:49

So I actually found this interesting math problem that I think would be cool to test. It says: if a zookeeper had 100 pairs of animals in her zoo, and two pairs of babies are born for each one of the original animals, then sadly 23 animals don't survive, how many animals do you have left in total? Let's see if it can do this math. So the answer should be 977 animals.
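Pinning that 977 down with explicit arithmetic: 100 pairs is 200 animals; two pairs of babies per original animal is 4 babies each, so 4 × 200 = 800 new animals; 1,000 total, minus the 23 lost, leaves 977. The same steps in a few lines:

```python
# Work the zoo riddle step by step.
original = 100 * 2          # 100 pairs of animals = 200 animals
babies = original * 2 * 2   # 2 pairs (4 babies) per original animal = 800
total = original + babies   # 1,000 animals before any losses
survivors = total - 23      # 23 sadly don't survive

print(survivors)  # 977
```

The trap, as the models' answers below show, is reading "2 pairs per animal" as 2 babies per animal, or as 2 pairs per original pair.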

Speaker 1

00:26:49 - 00:27:05

We'll copy this. Let's test it in GPT-3.5 and see if it can figure out this riddle. 100 pairs of animals. Two pairs of babies are born for each original pair of animals, which means there are 2 babies for each original animal. 23 don't survive. To calculate the number of animals: 100 pairs, 200 animals.

Speaker 1

00:27:05 - 00:27:35

So, so far so good. Each original animal has 2 babies, so we have 200 times 2, which is 400 babies. Total animals before any losses: 200 plus 400 babies equals 600. Animals lost: 23. So it got 577 animals. So it actually got this number wrong, because if we look at the logic here, two pairs of babies are born for each one of the original animals. So there's 200 original animals; two pairs would be 4 babies, times 200 animals, that's 800. So it couldn't figure out this math problem. Let's go ahead and test GPT-4 and see

Speaker 2

00:27:35 - 00:27:39

if it could figure out the logic. GPT-4 nailed it. See, for each one of

Speaker 1

00:27:39 - 00:28:13

the original animals, 2 pairs of babies are born. That means each original animal gave birth to 2 times 2, which is 4 new animals. So we have 200 times 4, which equals 800; add the original 200 for 1,000; then sadly, 23 don't survive, and this leaves us with 977. So ChatGPT got this one right. Let's go ahead and see if Claude can answer this one, though. Claude seemed to mess this one up too. The zookeeper originally had 100 pairs of animals, so 200 animals total. For each pair, 2 pairs of babies were born, so 2 times 100 equals 200. See, so it messed up the math already, and it came out to 377.

Speaker 1

00:28:13 - 00:28:17

Bard actually figured it out as well. 4 times 200 equals 800. 23

Speaker 2

00:28:17 - 00:28:21

die; 977 animals left. Now, we know ChatGPT's cutoff date is still

Speaker 1

00:28:23 - 00:28:38

2021. So let's ask a question like: who is the current CEO of Twitter? So GPT-3.5 thinks it's Jack Dorsey, which, as of 2021, was probably accurate. Let's try GPT-4. Again, it says Jack Dorsey.

Speaker 1

00:28:38 - 00:28:45

Now, if we try the same question with Claude, let's go ahead and see how this one does. As of July 2023, Elon

Speaker 2

00:28:45 - 00:28:48

Musk is the CEO of Twitter, which is actually inaccurate

Speaker 1

00:28:48 - 00:29:02

still, because the current CEO of Twitter is Linda Yaccarino. And this even says as of July 2023. So there's some inaccuracy there. Let's go ahead and ask Bard. I expect Bard will get it right, because it can search Google.

Speaker 2

00:29:02 - 00:29:04

As of today, July 16th,

Speaker 1

00:29:04 - 00:29:23

2023, the current CEO of Twitter is Linda Yaccarino. So Bard gets this one right, because it does have the ability to see the most recent, up-to-date information. So here's my current finding. There are obviously a lot of other ways we can test for accuracy, but this is just meant to be a quick test. I find Bard to be the most accurate, because it has the ability to search the web.

Speaker 1

00:29:23 - 00:29:38

Plus, it also got the math problem correct. It sort of figured out the logic correctly. I found ChatGPT 3.5 to be the least accurate. It got the math problem wrong, and it also has the cutoff date of 2021, where it still thought Jack Dorsey was the CEO of Twitter.

Speaker 1

00:29:38 - 00:30:01

So it doesn't have any current, up-to-date information beyond 2021. With Claude 2, I put it as tied for second, because it couldn't figure out the math problem. It also couldn't figure out who the current CEO of Twitter was. It says, as of July, it was Elon Musk. But Elon Musk is closer than saying it was Jack Dorsey, because he's a more recent CEO of Twitter. And then ChatGPT 4, you know, it did get the math problem right.

Speaker 1

00:30:01 - 00:30:31

So that was a plus for it, but because it has that cutoff date of 2021, it's not very accurate. Now, I would probably give it a slight edge above Claude right now, because with web browsing enabled, you can get more accurate results. Now, the default web browser inside of ChatGPT is currently inactive. You can sort of get web browsing with a plugin, which will give us more up-to-date, accurate results. I kind of call it a draw between Claude 2 and ChatGPT 4, because they both have pros and cons.

Speaker 1

00:30:31 - 00:31:07

And then finally, none of these chatbots are any good if you don't have them available to you. So as of this recording, ChatGPT 3.5 and 4 are both available in 163 countries. Bard is available in 230 countries. And unfortunately, Claude 2 is only available in the US and the UK right now. So if you're outside of the US or UK, you don't get Claude 2 at the moment. So there you have it. There's the breakdown of all four of these. They all kind of have pros and cons right now. If you ask me what I use: I'm pretty much switching off between ChatGPT 4 and Claude 2, depending on what I'm doing.

Speaker 1

00:31:07 - 00:31:47

If I'm trying to summarize long content like PDFs or long webpage articles, I'm going to Claude for that. If I wanted to summarize a YouTube video, I'll probably use ChatGPT 4 with the YouTube summarizer plugin, which works really, really well right now. The image recognition just isn't really great across the board for either ChatGPT 4 or Bard, so I haven't really used either for much of that. And for data analysis, I'd probably go with ChatGPT 4 right now, just because it does have code interpreter, which, if you need it to figure something out, will actually write the code to try to figure it out for you and then run that code, which is a little bit more capable than what Claude is able to do for coding right now.

Speaker 1

00:31:47 - 00:32:09

I think Claude was the best, at least with basic HTML and CSS. And as far as accuracy, if you're going to ask it questions to try to get an accurate response, I mean, based on my very limited tests here, Bard is probably the best with accuracy. So again, they all have their pros and cons. If you ask me what I've been using lately: ChatGPT 4 and Claude 2. I've been kind of using those interchangeably.

Speaker 1

00:32:09 - 00:32:42

I've been using Claude 2 a lot lately to help me refine articles and write outlines, summaries, and bullet points for longer-form content. It's been great for that. This video is already a lot longer than I wanted it to be, but I wanted to do a breakdown and compare the four. I will probably do further comparisons in the future and deep dive on individual sections that I covered here. For instance, doing a better analysis of coding between the four, getting them to write JavaScript and Python code and things like that, as well as deeper dives on data analysis.

Speaker 1

00:32:42 - 00:33:01

That one comparison could deserve its own video as well. I'd love to hear from you in the comments. What area would you like to see a deeper dive on in the comparison? Because it's really hard to find all of the potential things I can compare against each other and then also go into depth in each one of those areas.

Speaker 1

00:33:01 - 00:33:09

I kind of have to stay very surface level to be able to compare a lot of various criteria, which leaves a bit to be desired when you're really trying to figure out which one is

Speaker 2

00:33:09 - 00:33:12

the best. Now, the pro of all this is that three out of the four of

Speaker 1

00:33:12 - 00:33:30

them you can use for free. So if budget is the problem, you do have Claude 2 if you're in the US or UK, you do have Bard pretty much everywhere, and you do have ChatGPT 3.5, assuming there's not a heavy load and free users are able to get in when you try. So there you have it. That's my breakdown. Hopefully you enjoyed this video.

Speaker 1

00:33:30 - 00:33:51

I will be doing deeper dives again. Hopefully I can make more granular videos breaking down individual areas of criteria and helping you make decisions for certain specific needs, like coding. So to make sure you see those videos when I create them, maybe subscribe to this channel. And if you want more AI videos like this one, give this video a thumbs up. That'll make sure more videos like this show up in your YouTube feed. If you haven't already, check out futuretools.io.

Speaker 1

00:33:52 - 00:33:54

This is where I share all the coolest tools that I come across.

Speaker 2

00:33:54 - 00:33:58

I keep it up to date with all the latest AI news every single day.

Speaker 1

00:33:58 - 00:34:15

And if you just want the TLDR for the week, every Friday I'll send you a newsletter that breaks down everything you might have missed in AI, with all the coolest tools, all the news, a few YouTube videos, and one cool way to make money with AI. All you've got to do is head over to futuretools.io and join the free newsletter. Once again, thank you so much for tuning into this video. I really, really appreciate you. I'll see you in

Speaker 2

00:34:15 - 00:34:17

the next video. Bye bye.
