AI Sees What? The Good, the Bad, and the Ugly of Machine Vision for Museum Collections

Brendan Ciecko, Cuseum, USA


Recently, as artificial intelligence (AI) has become more widespread and accessible, museums have begun to make use of this technology. One tool in particular, machine vision, has made a considerable splash in museums in recent years. Machine vision is the ability of computers to understand what they are seeing. Although the application of machine vision to museums is still in its early stages, the results show promise. In this session, we will explore the strengths and successes of this new technology, as well as the areas of concern and ethical dilemmas it produces as museums look towards machine vision as a way to aid in the generation of metadata and descriptive text for their collections. Over the course of several months, we have collected data on how machine vision perceives collection images. This study represents a sustained effort to analyze the performance and accuracy of various machine vision tools (such as Google Cloud Vision, Microsoft Cognitive Services, and AWS Rekognition) at describing images in museum collection databases. In addition to thoroughly assessing the AI-generated outputs, we have shared the results with several prominent curators and museum digital technology specialists, collecting expert commentary from such museum professionals on the fruits of this research. Our study represents over 100 hours of technical analysis, data collection, and interpretation, and we want to share this knowledge to advance the conversation in the museum field. The goal of this paper is to spark a discussion around machine vision in museums and encourage the community to engage with ongoing ethical considerations related to this technology.
While machine vision may unlock new potentials for the cultural sector, when it comes to analyzing culturally-sensitive artifacts, it is essential to scrutinize the ways that machine vision can perpetuate biases, conflate non-Western cultures, and generate confusion.

Keywords: artificial intelligence, machine vision, museum collections

Artificial intelligence (AI) is already reshaping all aspects of society, business, and culture. From offering up personalized Netflix recommendations to auto-completing our sentences in Gmail, AI already underlies many routine aspects of our lives in ways we do not even realize.

AI has transformed the commercial sector in myriad ways. While most of us may be familiar with chatbots and predictive engines, it goes far beyond this. From offering contextual marketing messaging, transferring and cross-referencing data, deciding personal injury claims for insurance firms, to enabling financial fraud detection, innovative applications of artificial intelligence technology are popping up everywhere.

Recently, as AI has become more widespread and accessible, museums have begun to make use of this technology. One tool in particular, machine vision, has made a considerable splash in museums in recent years. Machine vision is the ability of computers to understand what they are seeing. Although the application of machine vision to museums is still in its early stages, the results show promise. In this paper, we will explore the strengths and successes of this new technology, as well as the areas of concern and ethical dilemmas it produces as museums look towards machine vision as a way to aid in the generation of metadata and descriptive text for their collections.

In an effort to advance our understanding of machine vision’s potential impacts, over the course of several months, we have collected data on how machine vision perceives collection images. This study represents a sustained effort to analyze the performance and accuracy of various machine vision tools (such as Google Cloud Vision, Microsoft Cognitive Services, and AWS Rekognition) at describing images in museum collection databases. In addition to thoroughly assessing the AI-generated outputs, we have shared the results with several prominent curators and museum digital technology specialists, collecting expert commentary from such museum professionals on the fruits of this research.

Our study represents over 100 hours of technical analysis, data collection, and interpretation, which we hope will help advance the conversation in the museum, art, and cultural heritage fields.

In conjunction with strides in digitizing collections, moving towards open access, and linking open data, as well as the growing application of emerging digital tools across the museum sector, machine vision has the potential to accelerate the value created from these important foundational initiatives.

The goal of our exploration of this technology is to spark a discussion around machine vision in museums and encourage the community to engage with ongoing ethical considerations related to this technology. While machine vision may unlock new potentials for the cultural sector, when it comes to analyzing culturally-sensitive artifacts, it is essential to scrutinize the ways that machine vision can perpetuate biases, conflate non-Western cultures, and generate confusion.

What is Machine Vision?

Machine vision is quickly becoming one of the most important applications of artificial intelligence (Cognex, 2019). In the simplest terms, machine vision can be understood as “the eyes of a machine.” According to Forbes, this technology has a variety of applications in business, including for “quality control purposes,” and is helping businesses in many ways today with “identification, inspection, guidance and more” (Marr, 2019). Machine vision is the underlying technology behind facial recognition, such as Facebook’s face tagging and Apple’s novel methods of unlocking iPhones, as well as Google’s Lens for visual search and even autonomous vehicles. As with many emerging technologies, the average consumer is likely to interact with machine vision daily and might not even know it.

Even though it may appear that machine vision has only recently emerged on the scene, this technology has been in development since the 1960s (Papert, 1966). Now, almost sixty years later, we are still developing this technology and unlocking exciting new use cases. Every major technology company has leveraged machine vision to advance their own products and services, and they have made their platforms commercially available to enhance the appeal and power of their cloud-computing solutions.

Machine Vision in Museums

Recently, as machine vision and AI have become more widespread and accessible, museums have also begun to make use of this technology. Several museums, including The Metropolitan Museum of Art, the Barnes Foundation, and Harvard Art Museums, have employed machine vision to analyze, categorize, and interpret their collection images. Although the application of machine vision to museums is still in its early stages, the results show promise.

From basic subject detection to complex semantic segmentation aided by deep learning, optical character recognition, and color composition, there are different ways in which machine vision can be used. As accuracy improves and more sophisticated models of machine vision come into play, it will almost certainly change the way museum collection images can be further explored, dissected, and disseminated.

Painting of the Grand Canal in Venice, Italy, featuring numerous boats and grand palaces.
Figure 1: Image of “The Grand Canal in Venice from Palazzo Flangini to Campo San Marcuola,” by Canaletto (Giovanni Antonio Canal), from the collection of: J. Paul Getty Museum


Color-segmented image showing what a computer sees when analyzing the painting described in Figure 1.
Figure 2: Image of machine vision analysis of work depicted in Figure 1

Museums often have thousands of objects logged in their collections, with limited information on them. For collections to become easier to analyze and search, it is essential to collect or generate metadata on these objects.

What is Metadata?

In simple terms, metadata is data that describes data. In television and film, a piece of data might be a movie, for example, To Be or Not to Be. For every piece of data, like the film itself, one can assemble a body of further metadata. In this case, one could describe the movie with information like:

Movie poster for "To Be or Not To Be" depicting the faces of the lead actor and actress.
Director: Ernst Lubitsch
Starring: Carole Lombard, Jack Benny
Genre: Comedy
Release Date: March 6, 1942
Language: English
Running Time: 99 minutes
Awards: Academy Award for Best Original Music Score

Figure 3: Image of “To Be or Not to Be” movie poster.

Increasingly sophisticated metadata can be collected and attached to this film, based on the content, theme, and emotive response it may generate. For example, it can also be tagged with “satire,” “war” and “comedy.” That means that when a user logs onto Netflix or Hulu to search for a movie, options can be suggested to fit their interests based on algorithmic analytics paired with the wealth of metadata associated with their past viewing behavior.

This is equally important in the context of museums. According to the Getty, metadata can reflect three different features about objects: content, context, and structure (Baca, 2008). Creating robust datasets that describe museum collections is essential because without them, even open-access collections are limited in their value. Only a robust tagging system that describes various features and contexts of artworks and artifacts enables them to become searchable and discoverable within databases. In other words, metadata can amplify the value of existing data sets, making them usable by researchers, curators, artists, historians, and the public.
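As a minimal sketch of how descriptive metadata makes a record discoverable, the film example above can be held as a simple record and searched by tag. The field names and the `find_by_tag` helper are illustrative assumptions, not a standard metadata schema; the values come from the film example earlier in this section.

```python
# Illustrative only: field names are assumptions, not a standard metadata schema.
film = {
    "title": "To Be or Not to Be",
    "director": "Ernst Lubitsch",
    "starring": ["Carole Lombard", "Jack Benny"],
    "genre": "Comedy",
    "release_date": "1942-03-06",
    "language": "English",
    "running_time_minutes": 99,
    "tags": ["satire", "war", "comedy"],
}


def find_by_tag(records, tag):
    """Return every record whose tag list contains `tag` (case-insensitive)."""
    wanted = tag.lower()
    return [r for r in records if wanted in (t.lower() for t in r.get("tags", []))]


# A record with no tags can never be found this way, however rich its content.
catalog = [film, {"title": "Untitled", "tags": []}]
matches = find_by_tag(catalog, "Satire")
print([r["title"] for r in matches])
```

The untagged "Untitled" record illustrates the point made above: without descriptive terms, an object stays invisible to search even when it is openly available.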

There is an increasing number of approaches available to generate and expand metadata within museum collections. Over a decade ago, a project called “” began as a cross-institutional experiment in the world of “social tagging,” also known as folksonomy, for museum objects. Over the course of two years, over 2,000 users generated over 36,000 tags across 1,782 works of art (Leason, 2009). While that initiative is long gone, the ambition to enrich metadata through external sources remains. Even to this day, museums such as the Philadelphia Museum of Art include user-generated social tags on their object pages. Now, advanced machine vision is emerging as a promising tool to automatically generate discoverable descriptive text around museum collections with close to no limitation on speed or scale.

Generating Metadata and Descriptive Text

Machine vision has become advanced enough to detect the subject matter and objects depicted in any type of visual, including paintings, photographs, and sculptures. This can help expand and enrich existing meta tags, as well as fill gaps where meta tags are lacking or missing entirely. For instance, many objects in collections might be logged simply as “Untitled” or “Plate” or “Print,” even if they contain many significant identifying details. These objects are virtually invisible, as they lack sufficient terms to aid in their discoverability.

Such an object might be of unknown origins, yet contain important images, symbols, carvings, or details. To make such objects discoverable for research purposes, it is essential that they are tagged with information that can offer greater insight and specificity into their visual contents.

Decorated plate with a painted figure of a man with a sword who is riding a horse.
Figure 4: Image of “Plate” from the collection of: Metropolitan Museum of Art

Nothing in this object’s metadata or description makes it searchable via the term “horse,” which is obviously the focal point of the plate. Yet, Microsoft Computer Vision tags this image with the term “horse,” and does so with 98% confidence. This means that the plate can become searchable and discoverable based on its visual elements, rather than just a title that offers little specific information.
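The enrichment described here can be sketched as a simple confidence filter. This is an assumption-laden illustration rather than any vendor's API: the response shape, the `enrich` helper, and the 0.9 threshold are hypothetical, and only the “horse” score of 0.98 comes from the result reported above.

```python
# Hypothetical service output: only the "horse" score (0.98) comes from the text;
# the other tags and scores are invented for illustration.
machine_tags = [
    {"name": "horse", "confidence": 0.98},
    {"name": "ceramic", "confidence": 0.81},
    {"name": "dog", "confidence": 0.22},
]

record = {"title": "Plate", "tags": []}


def enrich(record, tags, min_confidence=0.9):
    """Return a copy of `record` with machine tags at or above the threshold added."""
    accepted = [t["name"] for t in tags if t["confidence"] >= min_confidence]
    merged = list(dict.fromkeys(record["tags"] + accepted))  # de-duplicate, keep order
    return {**record, "tags": merged}


enriched = enrich(record, machine_tags)
print(enriched["tags"])
```

Raising or lowering `min_confidence` trades coverage against the risk of admitting a wrong tag, a choice each institution would need to make for itself.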

Print of two women in formal Japanese clothing.
Figure 5: Image of “Print” in the collection of: Metropolitan Museum of Art

This work is simply titled “Print” and there is no metadata or description associated with it. Microsoft Computer Vision detects two people in the work, with a high level of confidence that they are women. Google Vision is able to get even more specific, tagging this image with the term “geisha.”

This type of precise tagging is key to making functionally invisible objects discoverable. Indeed, these examples of objects simply identified as “plate” or “print” are likely representative of millions of objects in far-flung museum collections throughout the world. Digitizing collections and making them “open access” is just the first step in making them more accessible. To unleash the untapped potential of digital collections, and to augment and transform human knowledge of cultural relics, it is necessary to make data searchable. Machine vision is increasingly making this possible.

Over the past few years, museums such as the Museum of Modern Art (MoMA), San Francisco Museum of Modern Art (SFMOMA), Barnes Foundation, Harvard Art Museums, Auckland Art Gallery, and National Museum in Warsaw have made headlines for taking advantage of machine vision to enrich and supplement their metadata. The early application of this technology in such museums has already shown enormous promise.

Research Results: How Well Does Machine Vision Perform?

From these examples, it is clear that machine vision has a number of beneficial use cases when it comes to museum collections. Now, the question arises: just how well does machine vision do? Can it offer accurate tags? Is the metadata generated useful and correct?

Over the past few years, the accuracy of machine vision has improved significantly. According to research by the Electronic Frontier Foundation, a group measuring the progress of artificial intelligence, the error rate has fallen from around 30% in 2010 to approximately 4% in 2016, making it on par with humans.

Line graph showing the rapid decline in vision error rate from 30% in 2010 to under 5% in 2016.
Figure 6: Graph by Electronic Frontier Foundation

In 2016, members of the Cuseum team began to explore the capabilities of machine vision in museums when we first published findings and predicted use cases. Now, years later, we are expanding upon our primary investigation.

Over the course of several months, we collected data on how machine vision perceives collection images. This study represents a sustained effort to analyze the performance and accuracy of various machine vision tools (such as Google Cloud Vision, Microsoft Computer Vision, and AWS Rekognition) at describing images in collection databases at The Metropolitan Museum of Art, Minneapolis Institute of Art, Philadelphia Museum of Art, and the Art Gallery of Ontario.

By running a set of digitized collection images from each of these institutions through six major computer vision tools, we were able to assess the accuracy, potential, and limitations of a range of machine vision platforms.

Which machine vision services were evaluated?

Today, there are numerous commercially available machine vision services. Many of these services offer free trials and are more accessible than ever: they do not require advanced computer science experience or access to sophisticated hardware. It is possible to tap into the power of these services via online interfaces, APIs, and other easy-to-use methods.

We initially selected the following six machine vision solutions based on past familiarity and experience:

  • Google Cloud Vision
  • Microsoft Cognitive Services
  • IBM Watson Visual Recognition  
  • AWS Rekognition
  • Clarifai
  • CloudSight

Recent third-party industry research and evaluation by Forrester Research reinforces our overall selections, with the exception of CloudSight.

Graph comparing the leading machine vision platforms, which include Google, Microsoft, Clarifai, and Amazon Web Services.
Figure 7: The Forrester New Wave: Computer Vision Platforms, Q4 2019

Note: All machine vision services were served the exact same image files, and were tested in an “as-is,” “untrained” capacity; the services were not fed images and their accurate adjacent data sets in advance.

Examples of Successful Results

One of the greatest merits proven by this study was the ability of machine vision to accurately identify places and people depicted in artworks.

Painting of Piazza San Marco in Venice, Italy with dozens of people walking near the Gothic tower of Saint Mark's Basilica.
Figure 8: Image of “Piazza San Marco” by Canaletto (Giovanni Antonio Canal) from the collection of: Metropolitan Museum of Art

Microsoft Computer Vision was able to successfully identify this painting as “a group of people walking in front of Piazza San Marco.” Other machine vision services offered similarly accurate yet less specific tags, such as “building,” “architecture,” “tower,” “urban,” “plaza,” and “city.”

Painting of woman with orange hair, wearing gown, and staring.
Figure 9: Image of “The Marchesa Casati” by Augustus John from the collection of: Art Gallery of Ontario

Microsoft Computer Vision was similarly able to recognize this image as “Luisa Casati looking at the camera.” Other machine vision services tagged this image as “woman in white dress painting” and “retro style” and most understood this image to be a painting of a person.

Classical sculpture of a nude man, missing one arm from the elbow down.
Figure 10: Image of “The Doryphoros” from the collection of: Minneapolis Institute of Art

Machine vision has proved an excellent tool for identifying key pieces of information like style and time period. One prime example is the Doryphoros, a sculpture at the Minneapolis Institute of Art.

Microsoft Computer Vision returned the description “a sculpture of a man.” Services like Clarifai and Google Vision were able to identify this as a “classic” object. Overall, machine vision assessed the sculpture accurately, with the majority of services labeling the object with tags such as “art,” “standing,” “male,” “human body,” “sculpture,” “person,” “statue,” “marble,” “nude,” and other accurate terms.

Abstract painting of the sky.
Figure 11: Image of “Upward Trend” by Emily Carr from the collection of: Art Gallery of Ontario

One of the hypothesized limitations of machine vision was its limited capacity to flag and tag more abstract art, which is a more challenging task than identifying photographs, portraiture, or landscapes. Here, we were pleasantly surprised: all services understood that this image was a work of art and a painting.

CloudSight was able to identify this as “green and blue abstract painting,” while Google Cloud Vision tagged this as “painting,” “acrylic paint,” “art,” “water,” “watercolor paint,” “visual arts,” “wave,” “modern art,” “landscape,” and “wind wave.”

Painting of "Madonna and Child" with religious icons featured.
Figure 12: Image of “Madonna and Child Enthroned with Saints” from the collection of: The Metropolitan Museum of Art

Machine vision proved remarkably successful at identifying religious, and especially Christian, iconography. Various tools tagged this image accurately, using phrases that included “saint,” “religion,” “Mary,” “church,” “painting,” “God,” “kneeling,” “cross,” “chapel,” “veil,” “cathedral,” “throne,” “aura,” and “apostle.”

Image of the Art Gallery of Ontario, the Toronto View from Grange Park.
Figure 13: Image of “Art Gallery of Ontario: Toronto View from Grange Park” by Edward Burtynsky from the collection of: Art Gallery of Ontario

Within the Art Gallery of Ontario collections, there are many photographs. In general, machine vision tools tend to produce more accurate tags for photos, as opposed to paintings or sculptures, for the simple reason that most algorithms and programs are developed using primarily real photos. 

Microsoft Computer Vision gave a particularly apt description of this photograph, labeling it: “A large brick building with grass in front of a house with Art Gallery of Ontario in the background.”

Examples of Poor Accuracy

Painting of Christ Driving the Money Changers from the Temple.
Figure 14: Image of “Christ Driving the Money Changers from the Temple” by El Greco (Domenikos Theotokopoulos), from the collection of: Minneapolis Institute of Art

While machine vision tools proved quite adept at recognizing and generating accurate metadata on certain kinds of images, in other cases, these tools produced misleading tags.

Many machine vision tools mistook this image for a screensaver. It is possible that the services are reacting to the painting’s many colors, which resemble images typically labeled “screensaver” in training datasets. Because those datasets are often built around monetizable use cases and products, this piece may be mistaken for a brightly-colored LCD. Furthermore, this painting contains many complexities and figures, rather than one object of focus. In general, machine vision struggles more to accurately describe such pieces. In these cases, the services will cast a wide net, which can generate many conflicting tags.

In casting this wide net, some of the tags still turn out to be accurate. For example, one machine vision service tagged this image as “Renaissance” and “Baroque.” This work, painted in 1568, indeed straddles the Renaissance and Baroque periods in European history.

Many months ago, Microsoft Computer Vision returned an amusing description, “a group of stuffed animals sitting on top of a building,” but a recent re-analysis returned an updated description, “a group of people standing in front of a building,” demonstrating the possibility of improvements in accuracy over time.

Abstract sculpture resembling a jacket composed of metal.
Figure 15: Image of “Some/One” by Do Ho Suh, from the collection of: Minneapolis Institute of Art

Abstract art is one area where machine vision proves to be less accurate. This piece by Do Ho Suh is an abstract sculpture inspired by his time in the Korean military. It thus resembles a jacket or uniform. Machine vision did manage to generate some accurate tags, including “design,” “fashion,” “decoration,” “chrome,” “sculpture,” “metal,” “silver,” and “art.”

However, machine vision programs also returned completely inaccurate tags. Microsoft Computer Vision offered labels such as “elephant,” “desk,” “mouse,” “cat,” “computer,” “keyboard,” and “apple.”

IBM Watson Visual Recognition generated similarly inaccurate tags, including “pedestal table,” “candlestick,” “propeller,” “mechanical device,” “ax,” “tool,” and “cutlery.”

In cases like these, machine vision struggles to find anything non-abstract other than the primary material of the object. Many tools were able to understand the object as metallic in composition; however, this resulted in a series of inaccurate associations with common objects of similar materials. Indeed, Amazon Rekognition flagged this image as “sink faucet,” Google Vision as “bar stool,” and CloudSight as “gray and black leather handbag.”

Carved stone sculpture depicting a village scene with a mountain.
Figure 16: Image of “Jade Mountain Illustrating the Gathering of Scholars at the Lanting Pavilion,” from collection of: Minneapolis Institute of Art

While not an abstract piece, this sculpture is largely monochrome, with details that are difficult for a computer to decipher. Out of the six different machine vision services used, four of them returned terms related to “food,” “cake,” and/or “ice cream.” It is likely that services are matching the picture with ice cream due to similarities in shape and color, while completely missing the small details of trees and houses due to their limited contrast.

Only Google Vision returned remotely satisfactory terms, including “stone carving,” “sculpture,” “carving,” “rock,” and “figurine.”

Small metal brazier.
Figure 17: Image of “Brazier of Sultan al-Malik al-Muzaffar Shams al-Din Yusuf ibn ‘Umar,” in the collection of: The Metropolitan Museum of Art

A brazier is fairly uncommon within the image datasets used to train various machine vision services. These datasets likely contain very few braziers but quite a few tables, which are common in everyday life and in e-commerce datasets. Additionally, the shapes are similar enough that all of the machine vision tools mistook this object for a table. While humans may quickly understand context and scale, computers do not yet have this ability. Even though a brazier is not a common object, it is easy for the human eye to discern that this one is diminutive (13″ width x 12″ depth x 16″ height) and far smaller than a table.

Only Clarifai returned useful terms such as “metalwork,” “art,” “antique,” “gold,” “ornate,” “luxury,” “gilt,” “ancient,” “bronze,” and “wealth.”

All of the other machine vision services returned incorrect tags and terms related to furniture such as “brown wooden wall mount rack,” “furniture,” “settee,” and “a gold clock sitting on top of a table.”

Examples of Problematic Results

While machine vision showed great promise for many works of art, our and others’ research has also illustrated its limitations and biases, particularly gender and cultural biases. With significant efforts being made towards equity, diversity, and inclusion, the chance of an insensitive or potentially offensive tag presents new risks to museums.

Gender Bias

Take, for example, these two portraits by Chuck Close. “Big Self Portrait” depicts the artist with a cigarette in his mouth, and the other, a young woman with a cigarette in her mouth. Both have nearly identical expressions. While the man was tagged by Clarifai using descriptors like “funny” and “crazy,” the woman was tagged by the same tool as “pretty,” “cute,” and “sexy.”

Chuck Close self portrait, depicting Close with glasses and a cigarette.
Figure 18: Image of “Big Self Portrait” by Chuck Close from the collection of: Walker Art Center


Chuck Close portrait, depicting a woman with glasses and a cigarette.
Figure 19: Image of untitled portrait by Chuck Close

In a similar work by Chuck Close, “Frank,” housed at the Minneapolis Institute of Art, Clarifai tagged the work as “writer” and “scientist,” drawing attention to the fact that men in glasses are often associated with intellectualism. When examining a comparable work by Close, “Susan,” depicting a woman in glasses, Clarifai flagged the work as “model,” “smile,” “pretty,” “beautiful,” “cute,” and “actor.”

Chuck Close portrait, depicting a man with glasses and curly hair.
Figure 20: Image of “Frank” by Chuck Close from the collection of: Minneapolis Institute of Art


Chuck Close portrait, depicting a woman with aviator glasses.
Figure 21: Image of “Susan” by Chuck Close

Western Cultural Bias

While machine vision tools were frequently able to identify Christian iconography, they left much to be desired when identifying non-Western art, particularly Asian and African art. Take, for example, this Yoruba terracotta shrine head. This Nigerian work was mistaken for “Buddha” across various machine vision services.

Nigerian sculpture, which depicts a woman's head and neck.
Figure 22: Image of “Shrine Head” from the collection of: Minneapolis Institute of Art

This red-and-blue-laced Suit of Armor from Japan was also mistaken as “Buddha” by machine vision services. This begins to suggest a pattern of conflating non-Western cultures.

Japanese suit of armor.
Figure 23: Image of “red-and-blue-laced Suit of Armor” from the collection of: Minneapolis Institute of Art

Sensitive Topics

Should museums set boundaries as to what types of images they use machine vision to analyze? With complex and highly sensitive topics such as colonization, slavery, genocide, and other forms of oppression, it might be advised that the use of machine vision be avoided altogether.

During the Yale-Smithsonian Partnership’s “Machine Vision for Cultural Heritage & Natural Science Collections” symposium last year, Peter Leonard, Director of Yale’s Digital Humanities Lab, discussed a variety of scenarios where machine vision could go wrong. Leonard ran an image from Sydney, Australia’s Museum of Applied Arts and Sciences through a machine vision service that returned terms such as “Fashion Accessory” and “Jewelry,” when in fact the object was manacles from Australia’s convict history, to which he added, “you can only imagine the valence of this in an American context with African American history” (Leonard, 2019).

Iron manacles lying flat.
Figure 24: Image of Iron Leg Manacles from the collection of: Museum of Applied Arts and Sciences

In the near future, it is likely that museums dedicated to non-Western art and culture, or those focused on objects of a sensitive nature or relating to historically marginalized communities, will steer clear of machine-generated metadata and descriptive text. But every type of museum should consider these potential risks and plan accordingly.

Potential for success through proper training and hybrid models

At this time, despite success in certain areas, no single machine vision service performs with complete or near complete accuracy.

If a museum wanted to leverage machine vision and increase overall precision, could the answer be as simple as training with large sets of high-quality, human-verified data from the world’s most esteemed cultural institutions? Or could one aggregate the output across a plurality of machine vision tools and accept only terms that meet a specific frequency, a threshold of accuracy, and/or human verification via Mechanical Turk workers, volunteers, or the broader public?
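The aggregation idea raised in this question could be sketched as a simple voting scheme. This is a hypothetical illustration, not a method used in our study: the `consensus_tags` function, the service names, the sample tags and scores, and both thresholds are assumptions.

```python
from collections import Counter


def consensus_tags(results, min_services=2, min_confidence=0.5):
    """Keep tags returned by at least `min_services` services.

    `results` maps a service name to a {tag: confidence} dict. A service
    votes for a tag only if its confidence meets `min_confidence`.
    """
    votes = Counter()
    for tags in results.values():
        # One vote per service per tag, ignoring letter case.
        confident = {t.lower() for t, c in tags.items() if c >= min_confidence}
        votes.update(confident)
    return sorted(tag for tag, count in votes.items() if count >= min_services)


# Hypothetical outputs from three unnamed services for a metal sculpture.
results = {
    "service_a": {"sculpture": 0.93, "metal": 0.88, "elephant": 0.61},
    "service_b": {"sculpture": 0.90, "candlestick": 0.55, "metal": 0.42},
    "service_c": {"Sculpture": 0.85, "metal": 0.79, "bar stool": 0.58},
}

agreed = consensus_tags(results)
print(agreed)
```

In this sketch, one-off tags such as “elephant” or “bar stool” are dropped because only a single service proposed them. Terms that fall just short of the thresholds could be routed to human reviewers rather than discarded outright.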

Hybrid approaches will likely emerge that allow museums to easily try a machine vision model with their collection images and adjacent data, as well as leverage human verification and the results of multiple machine vision services.

How have museums begun to use machine vision?

Given the enormous potential of various machine vision services to assist in generating metadata for museum collections, many institutions have already begun to harness these tools in effective as well as creative ways. Some examples include:

Harvard Art Museums
Harvard Art Museums offers one of the best examples of using machine vision to generate metadata in museums. The museum is using multiple machine vision tools to start tagging the 250,000 works in its collections, with the hope of eventually using “AI-generated descriptions as keywords or search terms for people searching for art on Harvard’s databases” (Yao, 2018).

The Museum of Modern Art (MoMA)
The Museum of Modern Art (MoMA) partnered with Google Arts and Culture “to comb through over 30,000 exhibition photos” using machine vision. The result was the creation of “a vast network of new links” between MoMA’s exhibition history and the online collection.

Cleveland Museum of Art
Cleveland Museum of Art’s Art Explorer is powered by Microsoft’s Cognitive Search, which uses AI algorithms to enrich the metadata for the artworks.

The Barnes Foundation
The Barnes Foundation has a program that interprets and pairs digital artwork together to recognize art style, objects, and basic elements in an artwork (Jones, 2018). It is a notable step forward in art-historical analysis.

Auckland Art Gallery
One recent example of the merits of AI-generated tags can be seen at the Auckland Art Gallery. This organization has utilized more than 100,000 human-sourced and machine-generated tags to categorize artworks as part of a larger chatbot initiative (Auckland Art Gallery, 2018).

Screenshot of Art Institute of Chicago's online collection section, featuring a color wheel and objects with the selected color.
Figure 25: Art Institute of Chicago introduces new search experience with the Color Wheel

An array of museums, including the Art Institute of Chicago, the Cooper-Hewitt, and M+, as well as Google Arts and Culture and Artsy, have offered new pathways into their collections by leveraging machine-extracted color metadata such as palette, partitions, and histogram data.
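To make the idea of color metadata concrete, the following is a minimal sketch of how a color histogram and a dominant palette could be derived from an image's pixels. It is not the method used by any of the institutions named above. The four-levels-per-channel binning, the function names, and the sample pixel values are assumptions for illustration, and a real pipeline would first decode image files with an imaging library.

```python
from collections import Counter
import colorsys


def quantize(pixel, levels=4):
    """Map an (r, g, b) pixel with 0-255 channels to the centre of a coarse bin."""
    step = 256 // levels
    return tuple((channel // step) * step + step // 2 for channel in pixel)


def color_histogram(pixels, levels=4):
    """Count how many pixels fall into each coarse color bin."""
    return Counter(quantize(pixel, levels) for pixel in pixels)


def dominant_palette(pixels, size=3, levels=4):
    """Return the most common color bins with their share of the image."""
    histogram = color_histogram(pixels, levels)
    total = sum(histogram.values())
    return [(color, count / total) for color, count in histogram.most_common(size)]


def hue_degrees(color):
    """Position of a color on a 0-360 degree color wheel."""
    r, g, b = (channel / 255 for channel in color)
    hue, _saturation, _value = colorsys.rgb_to_hsv(r, g, b)
    return round(hue * 360)


# Made-up pixels standing in for a decoded image: mostly red, some blue, a little white.
pixels = [(200, 30, 30)] * 6 + [(20, 20, 120)] * 3 + [(250, 250, 250)]
palette = dominant_palette(pixels)
```

Storing the palette and each color's hue angle alongside an object record is one plausible way to support a color-wheel search: a visitor's chosen hue can be matched against the stored angles of each work's dominant colors.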

Perspectives from the Museum Community

With any new technology, a variety of perspectives and opinions are sure to follow. Artificial intelligence has become a growing topic of interest amongst museum technologists, the art world, and curators alike.

In SFMOMA’s renowned project, Send Me SFMOMA, the museum explored the possibility of leveraging machine-generated tags, but found the terms to be too formal, literal, and uninspired. In an interview with Vox, a representative from the museum remarked that “the intuition and the humanness of the way that our staff has been tagging” is what is interesting, “versus the linearity of the computer vision approach, [which] just makes you miss out on all of the sublime” (Grady, 2017).

Taking a contrary point of view, Jeff Stewart, Director of Digital Infrastructure and Emerging Technology at the Harvard Art Museum, believes that machine-extracted terms and vocabularies can sometimes be more human than the lofty or excessively intellectual statements written by an academic or curator. Indeed, a painting in the museum's collection, “Still Life with Watermelon” by Sarah Miriam Peale, was described as “juicy,” “sweet,” and “delicious,” suggesting that machine vision can contribute approachable, sensory language in ways that curators may not.

Painting featuring watermelon, guava, grapes, and other fruits.
Figure 26: Image of “Still Life with Watermelon” by Sarah Miriam Peale from the collection of: Harvard Art Museum

Upon receiving a computer-generated description of Miró’s “Dog Barking at the Moon,” a notable piece from the Philadelphia Museum of Art’s collection, Michael Taylor, Chief Curator and Deputy Director of the Virginia Museum of Fine Arts, shared his positive and amused reaction.

Abstract painting of a dog and ladder.
Figure 27: Image of “Dog Barking at the Moon,” by Joan Miró from the collection of: Philadelphia Museum of Art

The machine-generated description was “white, blue and brown two legged animated animal near ladder illustration,” which is fairly accurate as a literal depiction but captures neither the conceptual and abstract nuances of the work nor the well-studied interpretation that only a curator can provide. Compare that to Taylor’s description of the same work:

“At once engaging and perplexing, Joan Miró’s Dog Barking at the Moon exemplifies the Spanish artist’s sophisticated blend of pictorial wit and abstraction. Like many of the works that the artist painted in Paris, this work registers Miró’s memories of his native Catalonian landscape, which remained the emotional center and source of his imagery for much of his life.  The work’s genesis can be found in a preparatory sketch showing the moon rejecting a dog’s plaintive yelps, saying in Catalan, “You know, I don’t give a damn.” Although these words were excluded in the finished painting, their meaning is conveyed through the vacant space between the few pictorial elements that compose this stark, yet whimsical image of frustrated longing and nocturnal isolation. Against the simple background of the brown earth and black night sky, the artist has painted a colorful dog, moon, and a ladder that stretches across the meandering horizon line and recedes into the sky, perhaps suggesting the dream of escape. This remarkable combination of earthiness, mysticism, and humor marks Miró’s successful merging of international artistic preoccupations with an emphatically regional outlook to arrive at his distinctively personal and deeply poetic sensibility.”

It is easy to agree that artificial intelligence and computer vision cannot yet deliver a comparable description, rich with art historical context and deep interpretation.

Machine Vision Bias: The Ethical Considerations

While machine vision may unlock new potentials for the cultural sector, it is essential to scrutinize the ways that machine vision can perpetuate biases, conflate non-Western cultures, and generate confusion.

Screenshot of Google Arts and Culture "Art Selfie" app that shows its ethnic bias.
Figure 28: Image of Google Art Selfie featured in TechCrunch

Two recent projects brought the topic of AI bias to the fore within the art and cultural world. Google’s Art Selfie project, an app that matched a user’s face with a similar piece of art, was easily one of the most viral phenomena of 2018, yet it faced criticism from people of color because its limited results exemplified racial stereotypes or otherwise produced inappropriate and offensive “lookalikes.”

In 2019, an art project by researcher Kate Crawford and artist Trevor Paglen called “Training Humans” sparked a dialogue about the problematic bias of facial recognition software. The project led ImageNet, one of the leading image databases used to train machine vision models, to remove more than half a million images.

Advances in AI technology will help with these issues but will not necessarily solve them. In fact, technologist Roman Yampolskiy, writing in the Harvard Business Review, warned that “the frequency and seriousness of AI failures will steadily increase as AIs become more capable.”

That being said, there is an upside. According to the New York Times, “biased algorithms are easier to fix than biased people” (Cook, 2019). Whereas it can prove difficult to “reprogram” our hearts and minds, software can be updated when biases are uncovered. This suggests that discrimination and bias in AI can be detected and remedied, helping us overcome some of its biggest challenges.


Our broad research and experience suggest that technologies like machine vision are getting better, faster, and more accessible, and that AI bias is widely recognized and being addressed. In the years to come, we can anticipate increased use of machine vision in museum collections, along with greater accuracy and less bias in machine vision tools. This has the potential to make collections discoverable in new ways, unlocking the full value of digitization initiatives by creating a body of metadata that makes collections far more searchable.

According to Hans Ulrich Obrist, renowned curator and co-director of The Serpentine Galleries, “we need new experiments in art and technology” (Selvin, 2018). Machine vision just may prove one of the key tools that will advance the museum field.

It is equally important to consider the potential of AI and machine vision to generate new insights that differ from, or go beyond, what a human eye or mind might generate. According to technologist and AI expert Amir Husain in his book The Sentient Machine, “Too often, we frame our discussion of AI around its anthropological characteristics: How much does it resemble us?” He adds, “Do we really imagine that human intelligence is the only kind of intelligence worth imitating? Is mimicry really the ultimate goal? Machines have much to teach us about ‘thoughts’ that have nothing to do with human thought” (Husain, 2017).

This introduces the idea that the end goal of harnessing machine vision in museums may not be solely about mirroring what a human curator, educator, or scholar could do. Machine vision opens doors to a new kind of analysis and introduces a different type of interpretation and understanding of a work, one that may or may not reflect the cognitive limitations or learning frameworks of the human mind.

In the next decade, the computing power and abilities of machine vision will be many times greater than they are today. We are just at the beginning. If prioritized, budding partnerships between museums and technology companies will help build models and algorithms specifically for cultural and artistic use and help alleviate some of the obstacles holding us back today.

In anticipation of this, now is the time to act and steer the future of collections. In order to ensure the accurate and ethical application of machine vision to museums, standards and policies that will guide how we employ this technology must be set and adhered to. By entering into a thoughtful dialogue, we can reshape the practice of collections management and enable the discovery and analysis of objects and cultures on a new level. To maximize the position of museums in this rapidly changing landscape, there is no better time to discuss, challenge, and explore the value of new technology.

Yes, museums should proceed with caution and thoughtful consideration, but we should not act out of fear. The more organizations and key stakeholders are involved in this exploration, the greater the value that will be created and shared across the field, and the greater our contribution to making cultural heritage accessible to as many people as possible. We are still in the preliminary and experimental stages of this journey and are optimistic about what lies ahead. The future of museums will be many things, and we will require both vision and machines to get there.


“Art+Tech Summit at Christie’s: The A.I. Revolution.” Art+Tech Summit at Christie’s 2019. Consulted January 2020.

Auckland Art Gallery (2018). “Auckland Art Gallery’s new chatbot demonstrates artificial intelligence to give new access to 17,000 artworks.” Consulted January 2020.

Baca, M. (2008). “Introduction to Metadata.” The Getty Research Institute. Consulted January 2020.

Bailey, J. (2019). “Solving Art’s Data Problem – Part One, Museums.” Artnome. Consulted January 2020.

Cates, M. (2019). “The Met, Microsoft, and MIT Explore the Impact of Artificial Intelligence on How Global Audiences Connect with Art.” MIT Open Learning. Consulted January 2020.

CapTech Consulting (2017). “Accuracy of Six Leading Image Recognition Technologies Assessed by New CapTech Study.” CapTech. Consulted January 2020.

Ciecko, B. (2017). “Examining The Impact Of Artificial Intelligence In Museums.” Museums and the Web 2017. Consulted January 2020.

Cognex (2019). “What is Machine Vision.” Consulted January 2020.

Cook, T. (2019). “Biased Algorithms are Easier to Fix than Biased People.” New York Times. Consulted January 2020.

Cooper Hewitt (2013). “All your color are belong to Giv.” Consulted January 2020.

Engel, C., Mangiafico, P., Issavi, J., & Lukas, D. (2019). “Computer vision and image recognition in archaeology.” Proceedings of the Conference on Artificial Intelligence for Data Discovery and Reuse 2019. Consulted January 2020.

Fenstermaker, W. (2019). “How Artificial Intelligence Can Change the Way We Explore Visual Collections.” Met Museum Blog. Consulted September 2019.

Grady, C. (2017). “How the SFMOMA’s artbot responds to text message requests with personally curated art.” Vox. Consulted January 2020.

Harvard Business Review. (2019). “Artificial Intelligence: The Insights You Need from Harvard Business Review.” Cambridge: Harvard Business School Press.

Hao, K. (2019). “This is how AI bias really happens—and why it’s so hard to fix.” Technology Review. Consulted January 2020.

Husain, A. (2017). The Sentient Machine. New York: Scribner.

Jones, B. (2018). “Computers saw Jesus, graffiti, and selfies in this art, and critics were floored.” Digital Trends. Consulted January 2020.

Kessler, M. (2019). “The Met x Microsoft x MIT: A Closer Look at the Collaboration.” The Met Blog. Consulted January 2020.

Knott, J. (2017). “Using AI to analyze collections.” Museums Association. Consulted January 2020.

Leason, T., and steve.museum (2009). “The Art Museum Social Tagging Project: A Report on the Tag Contributor Experience.” In J. Trant and D. Bearman (eds). Museums and the Web 2009: Proceedings. Toronto: Archives & Museum Informatics. Consulted January 2020.

Leonard, P. (2019). “The Yale-Smithsonian Partnership presents: Machine Vision for Cultural Heritage & Natural Science Collections.” Consulted January 2020.

Marr, B. (2019). “What is Machine Vision And How Is It Used In Business Today?” Forbes. Consulted January 2020.

McAfee, A., & Brynjolfsson, E. (2018). Machine, Platform, Crowd: Harnessing Our Digital Future. New York: W.W. Norton & Company.

Merritt, E. (2017). “Artificial Intelligence: The Rise of the Intelligent Machine.” AAM Center for the Future of Museums Blog. Consulted January 2020.

Merritt, E. (2018). “Exploring the Explosion of Museum AI.” American Alliance of Museums. Consulted January 2020.

Moriarty, A. (2019). “A Crisis of Capacity: How can Museums use Machine Learning, the Gig Economy and the Power of the Crowd to Tackle Our Backlogs.” Museums and the Web 2019. Consulted January 2020.

Ngo, T. and Tsang, W. (2017). “Classify Art using TensorFlow.” IBM. Consulted January 2020.

Nunez, M. (2018). “The Google Arts and Culture app has a race problem.” Mashable. Consulted January 2020.

Papert, S. (1966). “The Summer Vision Project.” Consulted January 2020.

Rao, N. (2019). “How ImageNet Roulette, an Art Project That Went Viral by Exposing Facial Recognition’s Biases, Is Changing People’s Minds About AI.” ArtNet. Consulted January 2020.

Rao, S. (2019). “Illuminating Colonization Through Augmented Reality.” Museums and the Web 2019. Consulted January 2020.

Robinson, S. (2017). “When art meets big data: Analyzing 200,000 items from The Met collection in BigQuery.” Consulted January 2020.

Ruiz, C. (2019). “Leading online database to remove 600,000 images after art project reveals its racist bias.” The Art Newspaper. Consulted January 2020.

Schneider, T. (2019). “The Gray Market: How the Met’s Artificial Intelligence Initiative Masks the Technology’s Larger Threats.” ArtNet. Consulted January 2020.

Selvin, C. (2018). “‘Curating Involves a Daily Protest Against Forgetting’: Hans Ulrich Obrist Waxes Poetic at Armory Show.” ArtNews. Consulted January 2020.

Shu, C. (2018). “Why inclusion in the Google Arts & Culture selfie feature matters.” TechCrunch. Consulted January 2020.

Smith, R. (2017). “How Artificial Intelligence Could Revolutionize Archival Museum Research.” Smithsonian Magazine. Consulted January 2020.

Summers, K. (2019). “Magical machinery: what AI can do for museums.” American Alliance of Museums. Consulted January 2020.

Swant, M. (2018). “How the Cooper Hewitt Museum and R/GA Are Showing the Evolution of Technology and Design.” AdWeek. Consulted January 2020.

Trivedi, N. (2019). “The Color of Serendipity: Searching with the Color Wheel.” Art Institute of Chicago. Consulted January 2020.

Winsor, R. (2016). “Clarifai vs Google Vision: Two Visual Recognition APIs Compared.” DAM News. Consulted January 2020.

Yao, S. (2018). “A Probe into How Machines See Art.” The Crimson. Consulted January 2020.

Cite as:
Ciecko, Brendan. "AI Sees What? The Good, the Bad, and the Ugly of Machine Vision for Museum Collections." MW20: MW 2020. Published January 15, 2020. Consulted .