
This Week in AI: Addressing racism in AI image generators


Keeping up with an industry as fast-moving as AI is a tall order. So until an AI can do it for you, here’s a handy roundup of recent stories in the world of machine learning, along with notable research and experiments we didn’t cover on their own.

This week in AI, Google paused its AI chatbot Gemini’s ability to generate images of people after a segment of users complained about historical inaccuracies. Told to depict “a Roman legion,” for instance, Gemini would show an anachronistic, cartoonish group of racially diverse foot soldiers while rendering “Zulu warriors” as Black.

It appears that Google — like some other AI vendors, including OpenAI — had implemented clumsy hardcoding under the hood to attempt to “correct” for biases in its model. In response to prompts like “show me images of only women” or “show me images of only men,” Gemini would refuse, asserting such images could “contribute to the exclusion and marginalization of other genders.” Gemini was also loath to generate images of people identified solely by their race — e.g. “white people” or “black people” — out of ostensible concern for “reducing individuals to their physical characteristics.”

Right wingers have latched on to the bugs as evidence of a “woke” agenda being perpetuated by the tech elite. But it doesn’t take Occam’s razor to see the less nefarious truth: Google, burned by its tools’ biases before (see: classifying Black men as gorillas, mistaking thermal guns in Black people’s hands as weapons, etc.), is so desperate to avoid history repeating itself that it’s manifesting a less biased world in its image-generating models — however erroneous.

In her best-selling book “White Fragility,” anti-racist educator Robin DiAngelo writes about how the erasure of race — “color blindness,” by another name — contributes to systemic racial power imbalances rather than mitigating or alleviating them. By purporting to “not see color,” or by reinforcing the notion that simply acknowledging the struggle of people of other races is enough to label oneself “woke,” people perpetuate harm by avoiding any substantive conversation on the topic, DiAngelo says.

Google’s ginger treatment of race-based prompts in Gemini didn’t avoid the issue, per se — but disingenuously attempted to conceal the worst of the model’s biases. One could argue (and many have) that these biases shouldn’t be ignored or glossed over, but addressed in the broader context of the training data from which they arise — i.e. society on the world wide web.

Yes, the data sets used to train image generators generally contain more white people than Black people, and yes, the images of Black people in those data sets reinforce negative stereotypes. That’s why image generators sexualize certain women of color, depict white men in positions of authority and generally favor wealthy Western perspectives.

Some may argue that there’s no winning for AI vendors: whether they tackle — or choose not to tackle — their models’ biases, they’ll be criticized. And that’s true. But I posit that, either way, these models are lacking in explanation — packaged in a fashion that minimizes the ways in which their biases manifest.

Were AI vendors to address their models’ shortcomings head on, in humble and transparent language, it’d go a lot further than haphazard attempts at “fixing” what’s essentially unfixable bias. The truth is, we all have bias — and we don’t treat people the same as a result. Nor do the models we’re building. And we’d do well to acknowledge that.

Here are some other AI stories of note from the past few days:

  • Women in AI: TechCrunch launched a series highlighting notable women in the field of AI. Read the list here.
  • Stable Diffusion v3: Stability AI has announced Stable Diffusion 3, the latest and most powerful version of the company’s image-generating AI model, based on a new architecture.
  • Chrome gets GenAI: Google’s new Gemini-powered tool in Chrome allows users to rewrite existing text on the web — or generate something completely new.
  • Blacker than ChatGPT: Creative ad agency McKinney developed a quiz game, Are You Blacker than ChatGPT?, to shine a light on AI bias.
  • Calls for laws: Hundreds of AI luminaries signed a public letter earlier this week calling for anti-deepfake legislation in the U.S.
  • Match made in AI: OpenAI has a new customer in Match Group, the owner of apps including Hinge, Tinder and Match, whose employees will use OpenAI’s AI tech to accomplish work-related tasks.
  • DeepMind safety: DeepMind, Google’s AI research division, has formed a new org, AI Safety and Alignment, made up of existing teams working on AI safety but also broadened to encompass new, specialized cohorts of GenAI researchers and engineers.
  • Open models: Barely a week after launching the latest iteration of its Gemini models, Google released Gemma, a new family of lightweight open-weight models.
  • House task force: The U.S. House of Representatives has founded a task force on AI that — as Devin writes — feels like a punt after years of indecision that show no sign of ending.

More machine learnings

AI models seem to know a lot, but what do they actually know? Well, the answer is nothing. But if you phrase the question slightly differently… they do seem to have internalized some “meanings” that are similar to what humans know. Although no AI truly understands what a cat or a dog is, could it have some sense of similarity encoded in its embeddings of those two words that is different from, say, cat and bottle? Amazon researchers believe so.

Their research compared the “trajectories” of similar but distinct sentences, like “the dog barked at the burglar” and “the burglar caused the dog to bark,” with those of grammatically similar but different sentences, like “a cat sleeps all day” and “a girl jogs all afternoon.” They found that the ones humans would find similar were indeed internally treated as more similar despite being grammatically different, and vice versa for the grammatically similar ones. OK, I feel like this paragraph was a little confusing, but suffice it to say that the meanings encoded in LLMs appear to be more robust and sophisticated than expected, not totally naive.
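One common way to probe the kind of similarity the Amazon researchers studied is to compare embedding vectors with cosine similarity. The sketch below is purely illustrative: the four-dimensional vectors are made up, whereas real sentence embeddings come from a trained model and have hundreds of dimensions.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" — invented values for illustration only.
dog_barked  = [0.9, 0.1, 0.3, 0.0]  # "the dog barked at the burglar"
burglar_dog = [0.8, 0.2, 0.4, 0.1]  # same meaning, different grammar
cat_sleeps  = [0.1, 0.9, 0.0, 0.3]  # similar grammar, different meaning

sim_meaning = cosine_similarity(dog_barked, burglar_dog)
sim_grammar = cosine_similarity(dog_barked, cat_sleeps)
```

In a model whose embeddings encode meaning the way the paper suggests, `sim_meaning` comes out higher than `sim_grammar` — semantically close sentences sit closer in embedding space than merely grammatically similar ones.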

Neural encoding is proving useful in prosthetic vision, Swiss researchers at EPFL have found. Artificial retinas and other ways of replacing parts of the human visual system generally have very limited resolution due to the limitations of microelectrode arrays. So no matter how detailed the image is coming in, it has to be transmitted at a very low fidelity. But there are different ways of downsampling, and this team found that machine learning does a great job at it.

Image Credits: EPFL

“We found that if we applied a learning-based approach, we got improved results in terms of optimized sensory encoding. But more surprising was that when we used an unconstrained neural network, it learned to mimic aspects of retinal processing on its own,” said Diego Ghezzi in a news release. It does perceptual compression, basically. They tested it on mouse retinas, so it isn’t just theoretical.
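For intuition about what the learned encoder is competing against, here’s a minimal sketch of the naive alternative — fixed block averaging, which discards detail uniformly instead of preserving what matters perceptually. The toy image and block size are made up for illustration:

```python
def block_average_downsample(image, factor):
    """Naive downsampling: average each factor x factor block of pixels.
    A learned encoder replaces this fixed rule with a network trained to
    keep the features that matter most for perception."""
    h, w = len(image), len(image[0])
    out = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            block = [image[y][x]
                     for y in range(i, min(i + factor, h))
                     for x in range(j, min(j + factor, w))]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

# A 4x4 toy "image" reduced to the 2x2 resolution of a coarse electrode array.
img = [[0, 0, 8, 8],
       [0, 0, 8, 8],
       [4, 4, 2, 2],
       [4, 4, 2, 2]]
low_res = block_average_downsample(img, 2)  # -> [[0.0, 8.0], [4.0, 2.0]]
```

The EPFL result is that a network trained end-to-end for this encoding step outperforms such fixed schemes — and, unprompted, rediscovers processing reminiscent of the retina’s own.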

An interesting application of computer vision by Stanford researchers hints at a mystery in how children develop their drawing skills. The team solicited and analyzed 37,000 drawings by kids of various objects and animals, and also (based on kids’ responses) how recognizable each drawing was. Interestingly, it wasn’t just the inclusion of signature features like a rabbit’s ears that made drawings more recognizable by other kids.

“The kinds of features that lead drawings from older children to be recognizable don’t seem to be driven by just a single feature that all the older kids learn to include in their drawings. It’s something much more complex that these machine learning systems are picking up on,” said lead researcher Judith Fan.

Chemists (also at EPFL) found that LLMs are surprisingly adept at helping out with their work after minimal training. It’s not that the models do chemistry directly; rather, they’re fine-tuned on a body of literature that no individual chemist could possibly know in full. For instance, in thousands of papers there may be a few hundred statements about whether a high-entropy alloy is single or multiple phase (you don’t have to know what this means — they do). The system (based on GPT-3) can be trained on this type of yes/no question and answer, and soon is able to extrapolate from that.

It’s not some huge advance, just more evidence that LLMs are a useful tool in this sense. “The point is that this is as easy as doing a literature search, which works for many chemical problems,” said researcher Berend Smit. “Querying a foundational model might become a routine way to bootstrap a project.”
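To make the setup concrete, here’s a hedged sketch of how such yes/no literature statements might be turned into fine-tuning records. The material names and labels are invented for illustration, and the prompt/completion JSONL shown is just one common convention for GPT-3-style fine-tuning, not necessarily the exact format the EPFL team used:

```python
import json

# Hypothetical labeled statements mined from the literature
# (material, property, True/False) — values are illustrative only.
statements = [
    ("CoCrFeNi high-entropy alloy, as-cast", "single phase", True),
    ("AlCoCrFeNi high-entropy alloy, annealed", "single phase", False),
]

def to_finetune_records(rows):
    """Convert labeled statements into prompt/completion pairs of the kind
    used to fine-tune a GPT-3-style model on yes/no questions."""
    records = []
    for material, prop, label in rows:
        records.append({
            "prompt": f"Material: {material}. Is it {prop}? Answer yes or no.",
            "completion": "yes" if label else "no",
        })
    return records

# One JSON object per line — the usual JSONL layout for fine-tuning data.
jsonl = "\n".join(json.dumps(r) for r in to_finetune_records(statements))
```

Once fine-tuned on a few hundred such pairs, the model can answer the same question for materials it never saw a labeled statement about — the extrapolation Smit’s group describes.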

Last, a word of caution from Berkeley researchers — though now that I’m reading the post again, I see EPFL was involved with this one too. Go Lausanne! The group found that imagery surfaced via Google was much more likely to reinforce gender stereotypes for certain jobs and words than text mentioning the same thing. And there were also just way more men present in both cases.

Not only that, but in an experiment, they found that people who viewed images rather than reading text when researching a role associated those roles with one gender more reliably, even days later. “This isn’t only about the frequency of gender bias online,” said researcher Douglas Guilbeault. “Part of the story here is that there’s something very sticky, very potent about images’ representation of people that text just doesn’t have.”

With stuff like the Google image generator diversity fracas going on, it’s easy to lose sight of the established and frequently verified fact that the source of data for many AI models shows serious bias, and this bias has a real effect on people.




by Team SNFYI

Facebook is testing a new feature that invites some users — mainly in the US and Canada — to let Meta AI access parts of their phone’s camera roll. This opt-in “cloud processing” option uploads recent photos and videos to Meta’s servers so the AI can offer personalized suggestions, such as creating collages, highlight reels, or themed memories like birthdays and graduations. It can also generate AI-based edits or restyles of those images.

Meta says this is optional and assures users that the uploaded media won’t be used for advertising. However, to enable it, people must agree to let Meta analyze faces, objects, and metadata like time and location. Currently, the company claims these photos won’t be used to train its AI models — but it hasn’t completely ruled that out for the future.

Typically, only the last 30 days of photos get uploaded, though special or older images might stay on Meta’s servers longer for specific features. Users can disable the feature at any time, which prompts Meta to delete the stored media after 30 days.

Privacy experts are concerned that this expands Meta’s reach into private, unpublished images and could eventually feed future AI training. Unlike Google Photos, which explicitly states that user photos won’t train its AI, Meta hasn’t made that commitment yet. For now, this is still a test run for a limited group of people, but it highlights the tension between AI-powered personalization and the need to protect personal data.


News update by Mridul | March 14, 2024

Meesho, an online shopping platform based in Bengaluru, has announced its largest Employee Stock Ownership Plan (ESOP) buyback pool to date, totaling Rs 200 crore. This buyback initiative extends to both current and former employees, providing wealth creation opportunities for approximately 1,700 individuals.

Ashish Kumar Singh, Meesho’s Chief Human Resources Officer, emphasized the company’s commitment to rewarding its teams, stating, “At Meesho, our employees are the driving force behind our success.” Singh further highlighted the company’s dedication to providing opportunities for wealth creation despite prevailing macroeconomic conditions.

This marks the fourth wealth generation opportunity at Meesho, with the size of the buyback program increasing each year. In previous years, Meesho conducted buybacks worth over Rs 8.2 crore in February 2020, Rs 41.4 crore in November 2020, and Rs 45.5 crore in October 2021.

Meesho’s profitability journey began in July 2023, making it the first horizontal Indian e-commerce company to achieve profitability. Despite turning profitable, Meesho continues to maintain positive cash flow and focuses on enhancing efficiencies across various cost items.

The company’s revenue from operations for FY 2022-23 witnessed a remarkable growth of 77% over the previous year, amounting to Rs 5,735 crore. This growth can be attributed to Meesho’s leadership position as the most downloaded shopping app in India in both 2022 and 2023, increased transaction frequency among existing customers, and a diversified category mix. Additionally, Meesho’s focus on improving monetization through value-added seller services contributed to its revenue growth.

Meesho also disclosed its audited performance for the first half of FY 2023-24, reporting consolidated revenues from operations of Rs 3,521 crore, marking a 37% year-over-year increase. The company achieved profitability in Q2 FY24, with a significant reduction in losses compared to the previous year. Furthermore, Meesho recorded impressive app download numbers, reaching 145 million downloads in India in 2023 and surpassing 500 million downloads in H1 FY 2023-24.


You might’ve heard of Grok, X’s answer to OpenAI’s ChatGPT. It’s a chatbot and, in that sense, behaves as you’d expect — answering questions about current events, pop culture and so on. But unlike other chatbots, Grok has “a bit of wit,” as X owner Elon Musk puts it, and “a rebellious streak.”

Long story short, Grok is willing to speak to topics that are usually off limits to other chatbots, like polarizing political theories and conspiracies. And it’ll use less-than-polite language while doing so — for example, responding to the question “When is it appropriate to listen to Christmas music?” with “Whenever the hell you want.”

But Grok’s ostensible biggest selling point is its ability to access real-time X data — an ability no other chatbot has, thanks to X’s decision to gatekeep that data. Ask it “What’s happening in AI today?” and Grok will piece together a response from very recent headlines, while ChatGPT, by contrast, will provide only vague answers that reflect the limits of its training data (and the filters on its web access).

Earlier this week, Musk pledged that he would open source Grok, without revealing precisely what that meant. So you’re probably wondering: How does Grok work? What can it do? And how can I access it? You’ve come to the right place. We’ve put together this handy guide to help explain all things Grok, and we’ll keep it up to date as Grok changes and evolves.

How does Grok work?

Grok is the invention of xAI, Elon Musk’s AI startup — a startup reportedly in the process of raising billions in venture capital. (Developing AI’s expensive.) Underpinning Grok is a generative AI model called Grok-1, developed over the course of months on a cluster of “tens of thousands” of GPUs, according to an xAI blog post. To train it, xAI sourced data both from the web (dated up to Q3 2023) and feedback from human assistants that xAI refers to as “AI tutors.” On popular benchmarks, Grok-1 is about as capable as Meta’s open source Llama 2 chatbot model and surpasses OpenAI’s GPT-3.5, xAI claims.

Image Credits: xAI

Human-guided feedback, or reinforcement learning from human feedback (RLHF), is the way most AI-powered chatbots are fine-tuned these days. RLHF involves training a generative model, then gathering additional information to train a “reward” model and fine-tuning the generative model with the reward model via reinforcement learning. RLHF is quite good at “teaching” models to follow instructions — but not perfect. Like other models, Grok is prone to hallucinating, sometimes offering misinformation and false timelines when asked about news. And these hallucinations can be severe — like wrongly claiming that the Israel–Palestine conflict had reached a ceasefire when it hadn’t.

For questions that stretch beyond its knowledge base, Grok leverages “real-time access” to info on X (and data from Tesla, according to Bloomberg). And, similar to ChatGPT, the model has internet browsing capabilities, enabling it to search the web for up-to-date information about topics. Musk has promised improvements with the …