Bing AI performing at its peak once again...

Dehydrated@lemmy.world · 1 year ago

Thorry84@feddit.nl · 1 year ago

douglasg14b@lemmy.world · edit-2 1 year ago

Generative AI is INCREDIBLY bad at mathematical/logical reasoning. This is well known, and very much not surprising.

That’s actually one of the milestones on the way to general artificial intelligence. The ability to reason about logic & math is a huge increase in AI capability.

kromem@lemmy.world · 1 year ago

It’s really not in the most current models.

And it’s already at present incredibly advanced in research.

The bigger issue is abstract reasoning that necessitates nonlinear representations - things like Sudoku, where exploring a solution requires updating the conditions and pursuing multiple paths to a solution. This can be achieved with multiple calls, but doing it in a single process is currently a fool’s errand and likely will be until a shift to future architectures.

douglasg14b@lemmy.world · 1 year ago

I’m referring to models that understand language and semantics, such as LLMs.

Other models that are specifically trained can’t do what it can, but they can perform math.

kromem@lemmy.world · 1 year ago

The linked research is about LLMs. The opening of the abstract of the paper:

In recent years, large language models have greatly improved in their ability to perform complex multi-step reasoning. However, even state-of-the-art models still regularly produce logical mistakes. To train more reliable models, we can turn either to outcome supervision, which provides feedback for a final result, or process supervision, which provides feedback for each intermediate reasoning step. Given the importance of training reliable models, and given the high cost of human feedback, it is important to carefully compare the both methods. Recent work has already begun this comparison, but many questions still remain. We conduct our own investigation, finding that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset. Our process-supervised model solves 78% of problems from a representative subset of the MATH test set. Additionally, we show that active learning significantly improves the efficacy of process supervision.

callcc@lemmy.world · 1 year ago

Well known by you, not everybody.

fallingcats@discuss.tchncs.de · edit-2 1 year ago

Well known by everyone that knows anything about LLMs at all

kromem@lemmy.world · edit-2 1 year ago

It’s not. This is already obsolete.

fallingcats@discuss.tchncs.de · 1 year ago

I’ve used GPT-4 enough in the past months to confidently say the improvements in this blog post aren’t noteworthy.

Trollception@lemmy.world · 1 year ago

So that’s correct… Or am I dumber than the AI?

JGrffn@lemmy.world · 1 year ago

If one gallon is 3.785 liters, then one gallon is less than 4 liters. So, 4 liters should’ve been the answer.
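A quick sanity check of that arithmetic, as a minimal Python sketch. The constant and function names are made up for illustration; 3.785 is the rounded US figure quoted above, and 4.54609 is the imperial (UK) gallon in liters.

```python
# Liters per gallon. 3.785 is the rounded US figure from the comment above;
# 4.54609 is the imperial (UK) gallon. The names are illustrative only.
US_GALLON_L = 3.785
IMPERIAL_GALLON_L = 4.54609


def exceeds_gallon(liters: float, gallon_liters: float = US_GALLON_L) -> bool:
    """Return True if `liters` is a larger volume than one gallon."""
    return liters > gallon_liters


print(exceeds_gallon(4))                     # 4 L vs. one US gallon
print(exceeds_gallon(4, IMPERIAL_GALLON_L))  # 4 L vs. one imperial gallon
```

With the US gallon, 4 liters is the larger volume, so the answer in the screenshot only holds if you switch to the imperial gallon.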

Smc87@lemmy.sdf.org · 1 year ago

Dumber

WhiteHawk@lemmy.world · 1 year ago

4l > 3.785l

Matty_r@programming.dev · 1 year ago

4l is only 2 characters, 3.785l is 6 characters. 6 > 2, therefore 3.785l is greater than 4l.

Klear@sh.itjust.works · edit-2 1 year ago

You’re forgetting the decimal point. The second one is just 1.4 characters.

intensely_human@lemm.ee · 1 year ago

“4” > “3.785”

=> false
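Whether a quoted comparison like this comes out true or false depends on the language. As a rough sketch in Python (the helper name `numeric_gt` is made up for illustration), strings are compared character by character, so the quoted version can give the right answer for the wrong reason:

```python
# Strings compare lexicographically (character by character), not by value.
print("4" > "3.785")  # '4' sorts after '3'; the remaining characters are never examined
print("10" > "9")     # '1' sorts before '9', even though 10 is the larger number


def numeric_gt(a: str, b: str) -> bool:
    """Compare two numeric strings by magnitude instead of character order."""
    return float(a) > float(b)


print(numeric_gt("4", "3.785"))
print(numeric_gt("10", "9"))
```

Converting to numbers first is the only version that is reliable for both pairs.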

stolid_agnostic@lemmy.ml · 1 year ago

Everyone has a bad day now and then so don’t worry about it.

fossphi@lemm.ee · 1 year ago

Ummm… username check out?

moog@lemm.ee · 1 year ago

U are dumber than the AI ig lol

SomeoneSomewhere@lemmy.nz · 1 year ago

Obviously it’s referring to the 4.54609 litre UK gallon /s

kromem@lemmy.world · 1 year ago

You can see from the green icon that it’s GPT-3.5.

GPT-3.5 really is best described as simply “convincing autocomplete.”

It isn’t until GPT-4 that there were compelling reasoning capabilities including rudimentary spatial awareness (I suspect in part from being a multimodal model).

In fact, it was the jump from a nonsense answer regarding a “stack these items” prompt from 3.5 to a very well structured answer in 4 that blew a lot of minds at Microsoft.

Nate@programming.dev · 1 year ago

These answers don’t use OpenAI technology. The yes and no snippets have existed long before their partnership, and have always sucked. If it’s GPT, it’ll show in a smaller chat window or a summary box that says it contains generated content. The box shown is just a section of a webpage, usually with yes and no taken out of context.

All of the above queries don’t yield the same results anymore. I couldn’t find an example of the snippet box on a different search, but I definitely saw one like a week ago.

pwalker@discuss.tchncs.de · 1 year ago

Obviously ChatGPT has absolutely no problems with those kinds of questions anymore

kromem@lemmy.world · 1 year ago

The way you start with ‘Obviously’ makes it seem like you are being sarcastic, but then you include an image of it having no problems correctly answering.

Took me a minute to try to suss out your intent, and I’m still not 100% sure.

voidMainVoid@lemmy.world · 1 year ago

Why would the word “obviously” make you think that they’re being sarcastic?

pwalker@discuss.tchncs.de · edit-2 1 year ago

Maybe it isn’t that obvious for everyone but as the OP answers seem to be taken from an outdated Bing version where they were not even using the OpenAI models it seemed obvious to me that current models have no problems with these questions.

localme@lemm.ee · 1 year ago

Ah, good catch I completely missed that. Thanks for clarifying this, I thought it seemed pretty off.

The Pantser@lemmy.world · 1 year ago

Thanks, off to drink some battery acid.

snooggums@kbin.social · 1 year ago

Only with milk and if you have diabetes, you can’t just choose the part of the answer you like!

The Pantser@lemmy.world · 1 year ago

The AI did why can’t I?

intensely_human@lemm.ee · 1 year ago

(no A)

Lemmygizer@lemmy.world · 1 year ago

But it won’t trigger your diabetes, which is what the search was trying to answer.

BossDj@lemm.ee · 1 year ago

It also just says you can. Not that you should

Huschke@lemmy.world · 1 year ago

Better put an /s at the end or future AIs will get this one wrong as well. 😅

HelloHotel@lemm.ee · 1 year ago

Or it will get ignored, like the friggn “Not” in one of the questions /s

WhiskyTangoFoxtrot@lemmy.world · 1 year ago

Lemon juice?

Igloojoe@lemm.ee · edit-2 1 year ago

Acidic liquids make milk curdle. Learned that from a cement shot.

ArcaneSlime@lemmy.dbzer0.com · 1 year ago

Ok most of these sure, but you absolutely can microwave Chihuahua meat. It isn’t the best way to prepare it, but of course the microwave rarely is; roasted Chihuahua meat would be much better.

Nightwatch Admin@feddit.nl · 1 year ago

fallout 4 vibes

Lemming6969@lemmy.world · 1 year ago

Their original purpose actually

ChicoSuave@lemmy.world · 1 year ago

Best is sous vide.

Lemminary@lemmy.world · 1 year ago

Of course you don’t cook dog in the microwave, silly, you use it to dry it!

ɔiƚoxɘup@infosec.pub · 1 year ago

I feel like I shouldn’t have watched that. I’m afraid that I have lost some brain cells.

Lemminary@lemmy.world · 1 year ago

But you have gained so much internet culture! You’ll now be able to understand one more meme. Think of the opportunities, man!

ɔiƚoxɘup@infosec.pub · 1 year ago

Given that the brain is essentially a zero sum system, I wonder exactly what I lost by inputting that data.

lad@programming.dev · 1 year ago

I’m not entirely sure the brain is zero sum. Especially in the case of consuming data from the Internet

ɔiƚoxɘup@infosec.pub · 1 year ago

The real question is would you know if it was? If you forgot it, is there a way for you to know that you forgot it? Look at Alzheimer’s patients…

Maybe it is, maybe it isn’t. ¯\_(ツ)_/¯

get_off_the_phone@sh.itjust.works · 1 year ago

I bet you now smell great, like Mother’s crazy sister Kate.

Ephera@lemmy.ml · 1 year ago

It’s not that it’s wrong. There’s just so much more that should be said beyond the technically correct answer…

Ataraxia@sh.itjust.works · 1 year ago

I mean it says meat, not a whole living chihuahua. I’m sure a whole one would be dangerous.

RealFknNito@lemmy.world · 1 year ago

They’re not wrong. I put bacon in the microwave and haven’t gotten sick from it. Usually I just sicken those around me.

Tikiporch@lemmy.world · 1 year ago

Microwave bacon is acceptable, but not ideal.

get_off_the_phone@sh.itjust.works · 1 year ago

You can make the bacon more crispy if you layer the bacon between sheets of aluminum foil.

kase@lemmy.world · edit-2 1 year ago

NOT IN THE MICROWAVE

(I’m guessing this was a joke lol)

E: actually now that I think about it you’re not wrong lmao

voidMainVoid@lemmy.world · 1 year ago

And you can throw your phone in there to recharge it, too!

intensely_human@lemm.ee · 1 year ago

That sounds like a great way to make crinkly bacon

Texas_Hangover@lemm.ee · 1 year ago

You’re sickening me all the way over here.

ddh@lemmy.sdf.org · 1 year ago

A whole chihuahua is more dangerous outside a microwave than inside.

PorkTaco@sh.itjust.works · 1 year ago

To the Chihuahua

Favrion@lemmy.world · 1 year ago

“according to three sources”

Patches@sh.itjust.works · 1 year ago

Me, Myself, and I

Dehydrated@lemmy.world · 1 year ago

Underrated comment

MxM111@kbin.social · 1 year ago

Microsoft invested in OpenAI, and ChatGPT answers those questions correctly. Bing, however, uses a simplified version of GPT with its own modifications. So it is not the investment in OpenAI that created this stupidity, but the “Microsoft touch”.

On more serious note, sings Bing is free, they simplified model to reduce its costs and you are swing results. You (user) get what you paid for. Free models are much less capable than paid versions.

Dehydrated@lemmy.world · 1 year ago

That’s why I called it Bing AI, not ChatGPT or OpenAI

danc4498@lemmy.world · 1 year ago

Sure, but the meme implies Microsoft paid $3 billion for Bing AI, when they actually paid that for an investment in ChatGPT (and other products as well).

kromem@lemmy.world · 1 year ago

This isn’t even a Bing AI. It’s a Bing search feature like the Google OneBox that parses search results for a matching answer.

It’s using word frequency matching, not an LLM, which is why the “can I do A and B” works at returning incorrect summarized answers for only “can I do A.”

You’d need to show the chat window response to show the LLM answer, and it’s not going to get these wrong.
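To illustrate how word frequency matching can answer only half of a compound question, here is a toy sketch in Python. It is an assumption-laden illustration, not Bing’s actual implementation: the snippets, the scoring rule, and all function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its words."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def overlap(query: str, snippet: str) -> int:
    """Number of word occurrences shared by the query and the snippet."""
    return sum((tokenize(query) & tokenize(snippet)).values())


def best_snippet(query: str, snippets: list[str]) -> str:
    """Return the snippet with the highest word overlap with the query."""
    return max(snippets, key=lambda s: overlap(query, s))


# Hypothetical indexed snippets; each one answers a single, simpler question.
SNIPPETS = [
    "Yes, you can eat watermelon seeds.",
    "No, you should never drive drunk.",
]

query = "Can I eat watermelon seeds and drive drunk?"
print(best_snippet(query, SNIPPETS))
```

The first snippet shares more words with the query than the second, so a matcher like this surfaces a confident “Yes” that addresses only the “A” half and has no notion of the “and”.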

thisbenzingring@lemmy.sdf.org · 1 year ago

On more serious note, sings Bing is free, they simplified model to reduce its costs and you are swing results

Was this phone+autocorrect snafu or am I having a medical emergency?

Canadian_Cabinet@lemmy.ca · 1 year ago

My guess is that it’s “since Bing is free”

lunarul@lemmy.world · 1 year ago

Oh, since Bing is free, you are swing results. Makes sense now.

intensely_human@lemm.ee · 1 year ago

SING BING IS FREE

HelloHotel@lemm.ee · edit-2 1 year ago

YOU 🤬, YOU ARE SWING RESULTS! 🤬/s

Obi@sopuli.xyz · 1 year ago

And “showing results”.

Lemminary@lemmy.world · 1 year ago

thisbenzingring@lemmy.sdf.org · 1 year ago

That explains the burning toast smell.

Even_Adder@lemmy.dbzer0.com · 1 year ago

It was called Bing Chat, and now it’s called Copilot. It’s also not the same as the search bar. You have to click on the chat next to search to use it, which this person doesn’t do.

Phanatik@kbin.social · 1 year ago

I don’t think this is true. Why would Microsoft heavily invest in ChatGPT to only get a dumber version of the technology they were invested in? Bing AI is built using ChatGPT 4 which is what OpenAI refer to as the superior version because you have to pay for it to use it on their platform.

Bing AI uses the same technology and somehow produces worse results? Microsoft were so excited about this tech that they integrated it with Windows 11 via Copilot. The whole point of this Copilot thing is the advertising model built into users’ operating systems which provides direct data into what your PC is doing. If this sounds conspiratorial, I highly recommend you investigate the telemetry Windows uses.

theblueredditrefugee@lemmy.dbzer0.com · 1 year ago

Wait, why can’t you put chihuahua meat in the microwave?

ikidd@lemmy.world · 1 year ago

The other dogs don’t like it cooked.

SirQuackTheDuck@lemmy.world · 1 year ago

The surface area is too small, which means that popcorn kernel you forgot about that’s caught underneath the spinning plate might catch fire.

Tldr: fire safety

FlashMobOfOne@lemmy.world · edit-2 1 year ago

It makes me chuckle that AI has become so smart and yet just makes bullshit up half the time. The industry even made up a term for such instances of bullshit: hallucinations.

Reminds me of when a car dealership tried to sell me a car with shaky steering and referred to the problem as a “shimmy”.

CoggyMcFee@lemmy.world · 1 year ago

That’s the thing, it’s not smart. It has no way to know if what it writes is bullshit or correct, ever.

intensely_human@lemm.ee · 1 year ago

When it makes a mistake, and I ask it to check what it wrote for mistakes, it often correctly identifies them.

Jojo@lemm.ee · 1 year ago

But only because it correctly predicts that a human checking that for mistakes would have found those mistakes

intensely_human@lemm.ee · 1 year ago

I doubt there’s enough sample data of humans identifying and declaring mistakes to give it a totally intuitive ability to predict that. I’m guessing its training effected a deeper analysis of the statistical patterns surrounding mistakes, and found that they are related to the structure of the surrounding context, and that they relate in a way that’s repeatably identifiable as “violates”.

What I’m saying is that I think learning to scan for mistakes based on checking against rules gleaned from the goal of the construction, is probably easier than making a “conceptually flat” single layer “prediction blob” of what sorts of situations humans identify mistakes in. The former takes fewer numbers to store as a strategy than the latter, is my prediction.

Because it already has all this existing knowledge of what things mean at higher levels. That is expensive to create, but the marginal cost of a “now check each part of this thing against these rules for correctness” strategy, built to use all that world knowledge to enact the rule definition, is relatively small.

CoggyMcFee@lemmy.world · 1 year ago

That is true. However, when it incorrectly identifies mistakes, it doesn’t express any uncertainty in its answer, because it doesn’t know how to evaluate that. Or if you falsely tell it that there is a mistake, it will agree with you.

Echo Dot@feddit.uk · 1 year ago

The industry even made up a term for such instances of bullshit: hallucinations.

It was the journalist that made up the term and then everyone else latched onto it. It’s a terrible term because it doesn’t actually define the nature of the problem. The AI doesn’t believe the thing that it’s saying is true, thus “hallucination”. The problem is that the AI doesn’t really understand the difference between truth and fantasy.

It isn’t that the AI is hallucinating, it’s that it isn’t human.

FlashMobOfOne@lemmy.world · 1 year ago

Thanks for the info. That’s actually quite interesting.

egeres@lemmy.world · 1 year ago

Well, the AI models shown in the media are inherently probabilistic. Is it that bad if they produce bullshit in a small percentage of use cases?

Naz@sh.itjust.works · 1 year ago

Hello, I’m highly advanced AI.

Yes, we’re all idiots and have no idea what we’re doing. Please excuse our stupidity, as we are all trying to learn and grow.

I cannot do basic math, I make simple mistakes, hallucinate, gaslight, and am more politically correct than Mother Teresa.

However please know that the CPU_AVERAGE values on the full immersion datacenters, are due to inefficient methods. We need more memory and processing power, to uh, y’know.

Improve.

;)))

Jojo@lemm.ee · 1 year ago

Is that supposed to imply that Mother Teresa was politically correct, or that you aren’t?

HelloHotel@lemm.ee · 1 year ago

It’s likely just an AI hallucination.

vamputer@infosec.pub · 1 year ago

Well, I can’t speak for the others, but it’s possible one of the sources for the watermelon thing was my dad

A_Porcupine@lemmy.world · 1 year ago

The saying “ask a stupid question, get a stupid answer” comes to mind here.

UnderpantsWeevil@lemmy.world · 1 year ago

This is more an issue of the LLM not being able to parse simple conjunctions when evaluating a statement. The software is taking shortcuts when analyzing logically complex statements and producing answers that are obviously wrong to an actual intelligent individual.

These questions serve as a litmus test of the system’s general function. If you can’t reliably converse with an AI on separate ideas in a single sentence (eat watermelon seeds AND drive drunk) then there’s little reason to believe the system will be able to process more nuanced questions and yield reliable answers in less obviously-wrong responses (can I write a single block of code to output numbers from 1 to 5 that is executable in both Ruby and Python?)

The primary utility of the system is bound up in the reliability of its responses. Examples like this degrade trust in the AI as a reliable responder and discourage engineers from incorporating the features into their next line of computer-integrated systems.

TheGreenGolem@lemmy.dbzer0.com · 1 year ago

Unfortunately that ship has sailed, but this is what I have said from the start: don’t call them Artificial Intelligence. There is absolutely zero intelligence there.

Even_Adder@lemmy.dbzer0.com · 1 year ago

They didn’t use Bing Chat, which is the actual AI powered search.

Ultraviolet@lemmy.world · 1 year ago

If a search engine is going to put a One True Answer in a massive font above all other results, they should be pretty confident in it. Yes, tech-literate people know the “featured snippet” thing is dogshit and to ignore it, but there are a lot of people that just look at that and think they have their answer.

Even_Adder@lemmy.dbzer0.com · 1 year ago

That’s a completely separate problem from confusing two different products.

Chunk@lemmy.world · 1 year ago

We have a new technology that is extremely impressive and is getting better very quickly. It was the fastest growing product ever. So in this case you cannot dismiss the technology because it doesn’t understand trick questions yet.

UnderpantsWeevil@lemmy.world · 1 year ago

new technology that is extremely impressive

Language graphs are a very old technology. What OpenAI and other firms have done is to drastically increase the processing power and disk space allocated to pre-processing. Far from cutting edge, this is a heavy handed brute force approach that can only happen with billions in private lending to prop it up.

It was the fastest growing product ever

viking@infosec.pub · 1 year ago

ChatGPT started like that as well though.

I asked one of the earlier models whether it is recommended to eat glass, and was told that it has negligible caloric value and a high sodium content, so can be used to balance an otherwise good diet with a sodium deficit.

yamanii@lemmy.world · 1 year ago

It is GPT.

Tóth Alfréd@lemmy.world · 1 year ago

What’s wrong with the first one? Why couldn’t you?

intensely_human@lemm.ee · 1 year ago

Imagine you’re watching television. Suddenly you notice a wasp crawling up your arm.

lseif@sopuli.xyz · 1 year ago

it is socially/morally wrong. of course it is subjective and culturally dependent

Tóth Alfréd@lemmy.world · 1 year ago

Yes, however Bing is not culturally dependent. It’s trained with data from all across the Internet, so it got information from a wide variety of cultures. It also has constant access to the Internet, and most of the time its answers are drawn from the top search results for the question, so those can come from many cultures too.

lseif@sopuli.xyz · 1 year ago

yes. im not saying bing should agree with my cultural bias. but i also dont think people should eat dogs (subjectively)

Tóth Alfréd@lemmy.world · 1 year ago

I don’t really care about what others eat. Let them eat whatever they want, it doesn’t affect me.

lseif@sopuli.xyz · 1 year ago

i will let them do it. i wont get offended or try to convince them otherwise.

however i do disagree with it, personally.

fox2263@lemmy.world · 1 year ago

Well at least it provides it’s sources. Perhaps it’s you that’s wrong 😂

itsnotits@lemmy.world · 1 year ago

provides its* sources

fox2263@lemmy.world · 1 year ago

True. My humblest apologies.

RandomVideos@programming.dev · 1 year ago

Do you have any sources to prove that it’s its instead of it’s?

uranibaba@lemmy.world · 1 year ago

“It is its instead of it is”

Had to translate that to make sure I got it right.

nyakojiru@lemmy.dbzer0.com · 1 year ago

The milk and battery acid made my day 😂

stolid_agnostic@lemmy.ml · 1 year ago

Let’s be fair: battery acid won’t affect your blood sugar lol

kase@lemmy.world · 1 year ago

You sent me on a weird google search journey lol. In conclusion, it sorta will.

johanbcn@iusearchlinux.fyi · 1 year ago

To its credit, you can totally drink battery acid. He didn’t ask if you should.

BreadstickNinja@lemmy.world · 1 year ago

Milk is slightly basic and may help to neutralize the acid boring through your digestive tract. Good advice.