Josh Gorman – Oral Histories of Museum Computing

Oral History of Museum Computing: Josh Gorman

This Oral History of Museum Computing is provided by Josh Gorman, and was recorded on the 28th of May, 2021, by Paul Marty and Kathy Jones. It is shared under a Creative Commons Attribution 4.0 International license (CC-BY), which allows for unrestricted reuse provided that appropriate credit is given to the original source. For the recording of this oral history, please see https://youtu.be/utZlksfCQxo.

I’m Josh Gorman. I am the Chief Registrar and the Head of Collections Management Services at the Smithsonian National Museum of American History. I came on board in March of 2016, first as Registrar, then took on the collections management roles, and have been sort of getting my hands into everything ever since.

So, I wanted to talk about a project that we completed. It was – oh, it’s two years ago now — in December of 2019, we pivoted our collections data, where we pushed everything out to the public. We adopted a comprehensive policy of having Open Data for all of our collections information, obviously not all of our collections information, but at least a representation of everything that we have in our collections information system, collections management system, we moved online. And, and really this was a huge project that ended up taking about two years, and sort of massive interventions with policy and procedure, and you know, authorities and, and respect, and the courtesies that sort of the ways in which all of the different individual museums interact with collections, and it all sort of rose out of this really simple question that had arisen very early in 2016 when I started, which was, “why aren’t we sharing more of our pictures online?”

And all of this really got started, and I think, I mean, the apocryphal story is in 2011 or so — yeah must have been ’11 — the Secretary of the Smithsonian, Wayne Clough, went on FOX News, and you know, was talking about some exhibit or something and out of nowhere, the anchor asked, “Well, what are you doing online? Are you going to digitize your collections?” And he said, “Oh yeah. We’re going to digitize everything.” And, and then sort of came out of the interview and, and then asked somebody, “Well, what does that mean?” Because there was really very little information about what it is we had – digitally, physically, anything.

You know, Günter Waibel told this story, because he, he sort of would tell that story, sort of remembering and saying, “That’s why I want to work at the Smithsonian.” You know, a leader who wants to do that, and, of course, that ended up being very, very complicated, but this notion of sharing it all required us first to understand everything that we had, which… this was coming in the, the aftermath of a report in the mid-2000s that was called “Concern at the Core” in which the Inspector General more or less put the entire collections community at the Smithsonian on notice that, “You don’t know what you have. You don’t know where it’s at. You’re not taking care of it very well. And you’re not planning well for the future either, with respect to the growth of the collections, the housing of the collections, or the staffing for those who care for it.”

And so a collections advisory committee was put in place across the Smithsonian. And one of the things that they finally rolled out in about 2012 was a Collections Digitization Reporting System, which was more or less a spreadsheet at first. It’s now a, you know, an online database of sorts, an access table through which all of the museums could report how many objects they have, how many loans they have, you know, how many of those objects are described, how many of them have images taken of them, which meant in 2013 or so, for the first time, there was this real accounting of everything that was at the Smithsonian. First time since maybe, well 1981, but there was a second survey in ’89, but it was really synthetic. But, it was sort of this first time where we knew what we had, and it was in the context of these digital images that were included in the counting, and what it revealed was that we had hundreds of thousands and millions of digital images of our collections across all of the museums that… and very, very few of them were available.

So when I came to the American History Museum in 2016, I’d come from the Anacostia Museum, where I’d done a lot of work really pushing everything online. It was a very small collection, you know, fewer than 5,000 objects. I mean, it was the kind of thing where two colleagues and I like literally did a physical inventory of everything in the space of four weeks. We just went in and just put hands and eyes on everything. And so it was a much smaller enterprise than the American History Museum, which has, you know, we’re now calling it 1.8 million objects, but it’s plausibly as many as 3 million objects, when you start counting numismatic collections and all sorts of things.

So coming to the American History Museum, I was really interested in trying to do more, because when I showed up, we were at the time, sharing about 23- or 24,000 of our objects online. And this was through the collections.si.edu, our enterprise, no… yeah… our enterprise digital asset network. We were sending everything out, all of the museums were contributing some metadata and collections to this common portal for Smithsonian collections, and we were sharing, you know, not even one percent of collections in this space, and, and we were trying to do more.

At that point, we had collections records in our database for over a million objects. More than… I’d say 600,000 of them as good data, you know, at least tombstone data that we could share with the world, but we weren’t doing it. And this had to do with sort of the authorities for how we were sharing things. Everything was… if it was going to go online, it had to be specifically requested by a curator to go online, it had, based on the curatorial division structure, there were varying levels of description required by each division that had to go, be reviewed before it could be nominated to go online, and then my office, the Registrar’s office, had to make sure that we owned the object. That we have legal title to the item that… there were no restrictions, that was where we did whatever little bit of intellectual property review we were doing at the time. And then, it would get pushed forward.

And so we started unpacking different ways that we could share more quickly. The first thing we did was we just decided, “Okay, we’re going to share everything that has been through what we were calling our legal title process.” Beginning in 2011, we had started a series of collections inventories, responding to another Inspector General sort of report saying that we needed to do better with accounting for our collections. And part of those inventories was minimum cataloging, it was imagery for the collections, and then it was authority checking for these materials. Making sure that we could take the object we found on the shelf, match it with the acquisitions file, the accession file that we had, that we actually knew what the object was and that we owned it and, where possible, we could identify some provenance and capacity to rightfully own those materials. So we determined that Okay, we have achieved a minimum viable product at least, with these objects because we’ve described them, we’ve photographed them, we know we own them, so we, we have zero risk sharing these with the world. So we put them up and started rolling that out, and of course, it was still a very manual process, so it didn’t… it wasn’t like you could flip a switch and then send them all up, so started rolling that through. And really that started coming together by… I want to say about a year later. We had increased the number of assets we were sharing to about 250,000, so increased by a factor of 10 what we were sharing online, and we were on track to double that by the end of the year.

But then, we were going to start hitting a threshold. And about this same time we had a new chief curator come in, Katie Eagleton. She came from the British Library. She had previously been at the British Museum… and she’s now the head of libraries and museums at St. Andrews. She’s a brilliant curator and collections and museum professional, who came in and started looking at all this, and was excited that we were doing more, because she had led a lot of this work with some numismatics collections at the British Museum. And so she started, like, I think, for the first time was one of the first museum leaders to really start picking through our collections data that we’d been putting together every year for the CDRS [Collections Digitization Reporting System — the annual reporting system for collections data and digital assets at SI] that responded to the Secretary telling FOX News we were going to do it all.

And she saw something — which was that we had about at that point it was 890,000 images of collections, and we were only sharing at that point, maybe 80,000 of those images online — and really recognized that, “Okay it’s great that we’re sharing our collections data, but people want pictures. They want to see this stuff.” And, and just asked a very simple question: “How long is it going to take to get to a million images? And how long until we can share them all online?” Just sort of this basic access parameter and, and that was a really difficult question to answer because we… one, we didn’t know… of those, we’ll just call it a million images, how many objects that reflected? You know, was it 100,000? Was it 700,000? It almost certainly wasn’t a million.

Because we hadn’t migrated everything into our digital asset management system yet, we, we didn’t have a good understanding of how those related to objects. We have, we still to this day, have very poor reporting out of our DAMS, so we, we don’t know, you know, how many duplicates we’re necessarily looking at in some cases, I mean… although we’ve gotten a lot better on that. I’m speaking a little out of turn. I don’t do much of our digital assets work… But it was really, what it came down to was we, we had all these images and….

[…]

The only way to get everything online was to put all of our collections records online because that was the pathway. Now, we weren’t just going to put images up without context. It had to be tied to the collections information system. It had to carry the metadata. So, that launched the question of like, “Okay, well how do we put all of our, our collections information online?” And, and that launched a series of like I said, really difficult questions, and a series of processes, and they’ve taken about two years to come to fruition, to move all of our 1.8 million objects online, to adopt a policy of open data for all of our collections data, and to adopt a really progressive risk model that evaluated… really looked at what our risk was for doing all this, and tried to capture only the most high-risk scenarios for both prospective control and reactive filtering. So, that’s where we ended. I’m going to pause for a second just because it feels really weird to keep talking for, for so long.

[Marty]: Oh, this is absolutely great stuff. Absolutely great stuff. I love this, this issue of, you know, trying to figure out the relationship between the objects and the images, and, and what’s driving these initiatives, as you say. And what does the Secretary say? What does the chief curator say? You see how the political situation shapes the work of the museum professional…

Absolutely, and it was really, that was… those questions and that drive to… I mean, I don’t want to sound belittling, but really unsophisticated understandings of impact. I mean, we, we were just kind of guessing in a lot of ways. We still are in a lot of ways. Sort of, what is the value of the image of an object online? What is the value of data online? What was the value of the data in the collections information system? You know, it certainly has value as a collections management system, but what it… you know, collections data, it became very clear early on that for generations, I mean, we, we have collections data that goes back to some of our first systems, which were created in 1968. None of that data about our collections was created with the notion that anyone outside the museum would ever use it. It just… this, this notion that the, the information would be deployed and leveraged by, by users in any fashion outside of a very narrow group of collections managers, registrars, and curators. There’s just no notion that anyone would ever want to see this, [that] anyone ever would, which severely complicated the ways in which it was entered, what we had to do to prepare that data to go out, especially with respect to some specific risks around privacy and different security questions, and then the whole notions of reputational risk. You know, the, the idea that the collections data may be sloppy. It may be incorrect. It may just be embarrassing in some places. So, it… this idea of “Okay, we have a million pictures. People love to see pictures of museum collections.” We’re a visual space — that’s how they want to interact — resulted in us having to spend a long time pulling apart the threads of these other really complicated issues in order to make that happen.

[…]

So, what this really meant was that we had to pick apart all of these things. Okay, what is our risk when we put things online? So what, what legal risks do we have? Because you know, we at that point, didn’t know how many accession lots we even had. I mean, our, our management of the accession file was still exclusively paper-based. We were iteratively creating digital records for each accession and trying to capture donor information and that provenance information as we, as we, as a byproduct of the inventory process.

So, you find an object on the shelf. You go to a physical catalog record to find out what accession lot it came out of, and then you go pull the file out of an electriever [a giant mechanical high-density file cabinet] in the file room, and this… this meant that we just didn’t know what we had. So it meant that we had a bunch of known unknowns. So, the known unknown was what was ours. Based on some surveying, and actually, a poor contractor, over the course of 18 months, going through every file, we pulled everything that looked like a loan, and we knew that we had some 6,000 loans — like folders, loan folders — each of them, I think the average we determined was like 12 objects per loan, that had been assigned catalog numbers in sequence with everything else, and so, they were sitting on shelves, up with the rest of the collection. So we had at that point in our collection of 1.8 million, between 30 and 45,000 objects that didn’t belong to us.

So if we’re going to share everything, and we have no way of unpacking what’s ours and what’s not, at the front end, what’s our risk? You know, is there harm done by putting something online? Is that necessarily an assertion of ownership? So trying to unpack that. Obviously, privacy data is, is huge for so much of what we have. I mean, besides the fact that we literally have collected Social Security cards and passports and credit cards and various other documents, and sometimes transcribed those materials in object records. There are, you know, what constitutes private information is, is really… I mean, the phone book is, is a ridiculous violation of privacy law. It’s the notion that we just couldn’t share anything that had names on it, and any other information or we had to understand what our risk there was. Because, unlike intellectual property liability, somebody whose private information has been shared doesn’t have to demonstrate harm. They just have to demonstrate that it has been shared, and then a penalty can be applied. So, we had to be very, very careful about that, and then intellectual property. We, we have a very broad collection that has lots of things that is very… that we very clearly do not own intellectual property for, you know, tons of entertainment and cultural collections and artistic collections that we know we can’t, you know, completely openly share some form of that information. But what can we share, what can’t we…. because we’re developing this in a moment, where there is really considerable movement in the field, from standards that had emerged in the early 2000s to the, the release of the College Art Association’s best practices in fair use for the visual arts, so this, this notion of we could start asserting a much more aggressive fair use stance.

And that’s for the stuff we know about. So much of what we have in our collection kind of falls into an “I know it when I see it” category, you know, like a quilt. We have 8,000 quilts and other pieced-together textiles, and whether one of them constitutes an art object or a utilitarian object, so something with intellectual property or without intellectual property, is entirely in the eye of the beholder. And there are some markers that you can pick apart, but that requires an object review that we were just not equipped to, to handle.

And so what, how can we start pulling all these together? And then we had the courtesies and traditions of ownership and curatorial authority. We were very rapidly moving past a position where curators had literally been the sole arbiters of every decision about what went in front of the public. And we were very quickly dismantling that. And, in fact, saying that, “No, the decision about what goes before the public will not be made at the point of putting it in front of the public, but has been made at the point of acquisition. By putting something into a public collection, that became the moment at which something enters the public realm.” And so, we’re just moving it forward that way. So, we were disrupting that authority. We were disrupting a process in a lot of ways whereby — this doesn’t exist across the field, at least I hope not — but for our institution especially, a very… this is a strongly held belief in some corners, that the catalog was an authoritative document that could be completed, not an iterative description of an item. So we were really hitting some of the, the notions of what constitutes the expression of knowledge about material culture, or the expressions of knowledge and, and whatnot. So, it got post-structuralist pretty quickly.

And, and so, trying to unpack all of this. Trying to think, Okay, how do we, how do we… who do we have to talk to to make this happen? What are the decisions that we have to make? And, and so it was really about doing some early policy and procedural work. First, coming up with the workflows that identified, Okay, how do we get to our end goal? What will those outputs look like? What are the steps that, like the actual, and I say “physical” steps, even though this is all digital, but like what, what does somebody have to sit at a computer and do, to make this happen? Then, what are the decisions that have to be made to enable that, either so that it works for policy or that it doesn’t… if it does run up against these courtesies and these traditions of authority, they are appropriately authorized? Who are the responsible individuals for making these decisions? And how do we provide the responsible individuals with the information they need to do this?

So, in an environment where the Director assumes all of the, you know, ultimately, the authority for making these decisions, and certainly the responsibility and, and faces the consequences for those decisions, our Directors work within an infrastructure, where they’re reporting to an undersecretary or provost, and relying on the advice of a national collections program, and an office of general counsel, who may or may not agree with some of the stances that we’re taking to push things forward. And, and also sit in sort of a room with their peers. You know, our Director regularly sits with her 18 colleagues from the 19 museums and research institutes of the Smithsonian Institution, many of whom have taken a much more conservative stance towards sharing their collections data and images with the world. So, what does, you know, if we take this stance, what does that mean for the rest of the institution? And of course, we, we began very quickly to get some great cover from the Cooper Hewitt Museum, who, with sort of Seb Chan driving things, really just pushed the envelope for the field, and, and certainly changed the conversation within the Smithsonian about what we could think about, what we could do.

So, really then, it was about trying to piece these things together, to, to do our risk analysis for every possible scenario of harm that could come from sharing both the data and the images. Trying to get the buy-in of the folks who are going to have to do the work. Many of our collections managers and, and data folks who might justifiably see this as an unfunded mandate. A lot more work that they’re going to have to do to push things forward. And then, our curators, who, who saw themselves as caretakers of collections that had been handed down to them for generations, that they had, many of them, had spent their careers, you know 30, sometimes 40 or more years, building these collections, and who had intense dedication to their use, a use that they didn’t imagine happening in this way. So trying to, to find influential, knowledgeable, honestly hypercritical partners to involve in the process, to, to bring it together and then start building all of this. And so, that’s what we began in… that was the end of the summer 2017, and for two years, we convened a group, sort of building these processes. I’m going to pause again.

[Marty]: I’ll jump in with a, with a quick question, right. I mean, we’re talking… you’ve got, what, 1.8 million objects, you said if you don’t count the numismatic collection? There’s a lot of tension going on here in these conversations, I can imagine. I’m thinking about curators who probably have, you know, a small bit of the collection that they are focused on, and that’s very different from saying, “Hey, let’s get inventory records for more than a million things.” Those are two conflicting goals, right?

Oh absolutely. I mean, we were…. The museum is… its organizational structure is still very old-fashioned. I mean, we can still draw the direct lines from our… presently, we have four curatorial divisions that represent 27 named collections that have histories, going back to various points in our institution’s history, back to the very founding in 1846. I mean, we have collections that were literally founded 150 years ago. And have had, you know, 13 curators of printing and graphic arts. I mean, it’s, it’s kind of absurd, but it does carry some of that monumentality, and yeah, the, the notion that I am trying to build a collection to be used for exhibition, while simultaneously advancing knowledge which, in the culture of the American History Museum was really about material culture research and publication, in something resembling a university model. That was completely at odds with what has emerged in the last 30 years as… we’ll call it professional collections management, which focuses on some very clear and increasingly sophisticated technologies of access and care that are really, just in many ways, separate from the curatorial impulse. And of course, we were also trying to then wrap in — in a way that is now much more common for us, but, at the time, was completely foreign — participation by our audience engagement folks, our educators, our folks who are doing programming and outreach, and, and even fundraisers.

You know, we had a number of, of educators, you know, folks who are doing public-facing work on our task force to build out data share. And that in itself was controversial. This notion that well, what do they have to say about, you know, our collections data when, when really, we were trying to not just share everything, because no one was… none of us… well, maybe some of us were, but… those of us who were leading the process, we had, were under no illusions that simply putting this information online meant that anybody would look at it, anyone would use it. We understood that this was just the bare minimum, first step to building the opportunity for use by a wider world. And we wanted to make sure that we were doing all we could in those first steps to, to make materials available and useful, and, and be able to, to leverage them and to have some impact beyond the worlds that we knew, in museums and archives and in libraries, and then sort of the pathways that we knew there, but understanding, okay, how are educators going to access this content? And all that. So, this was all part of the conversation.

So yeah, it was absolutely at odds, and you know, I started saying that I feel somewhat outside of the museum sort of computing world right now. I mean, I’m head of collections management, and I feel completely — I don’t know — distanced from the world of collections. I mean, I don’t remember the last time I held an object. I look at spreadsheets all day. This is all about reporting and policy and what not.

So, that, that abstraction from you know, the, the curator who has spent a career caring for a collection of 12,000 ceramic plates and glass dishes, and me, who is trying to care for and provide for the stewardship of 1.8 million plus objects. Yeah, it’s, it’s orders of magnitude between us and, and so, finding a way to do one without restricting the other, and not only not restricting the other, but providing new pathways for discovery and utility for our curators was essential. So, we spent a lot of time with this, talking, couching the changes we needed to make to push to the world, in terms that would be useful internally.

So, just, you know, something as basic as getting to naming standards that would enable better searching, so that somebody who wants to collect something can first see if we already have one. And, and that happens all the time, where even now, when we’re doing so much better than we were five years ago, we are still collecting things that we already had, just because discovery is really difficult, and descriptive practice has been suboptimal. So, trying to couch those changes, frame those changes we were making for access in ways that they would see benefit to, to staff currently was important for us.

Really, the process was extraordinarily important for us, making sure that we went slow enough to bring a critical mass with us. That we weren’t really just in… we had, we had buy-in on this project to the… throughout leadership and, and into the Castle at the Smithsonian. There was a lot of desire for us to move more quickly on this. But, we chose not to in order to make sure that we were trying to create a sustainable model for doing this for all of our staff, so that we had as few holdouts as possible, recognizing that we needed everybody on board to do a lot of it, but also recognizing that any individual could undo good work as we moved forward. That actually colored a lot of, you know, some of the creation of our task force. We, we had some of our most critical staff members, or staff members who are most critical of the project, actually, on there, which went a long way in convincing skeptics across the museum that we are really listening to, to your concerns, and, and trying to address them. So that was part of a lot of what we did.

[Jones]: I want to jump in now because I’m still kind of… I don’t know in awe of the legal and political ramifications of everything that you do. And the fact that you said that you took it slowly, and you were considering the risk, you know, and I’m also thinking, yes, you know we see the Smithsonian as America’s museum, but for you to say that you’re collecting Social Security cards historically, and all of that, it just puts a whole new perspective on what your job, Registrar, Collections Management, might mean. So I’m, I’m in awe, Okay. I’m a fan. I used to be a Registrar. I know how hard it can be, but this takes it beyond anything I ever had.

Thank you, I think, so Paul gives me the great opportunity to speak with his students usually once a year, and I think it, every year, it comes up that someone asks, “Well, what should I be studying to do this?” I’m like, well, honestly, a law degree would be most useful.

[Jones]: Apparently!

Not at all practical. God, can you imagine, requiring that our entry level colleagues have a law degree, but, yeah it was, it was really all about understanding, Okay, what is the legal risk? And we, we broke down our collections into loans, and objects with catalog numbers, and objects without catalog numbers, you know, deaccessioned materials, things that we know, we believe we have, but we can’t necessarily put our hands on it right now, objects without locations – I won’t call it missing, we just we don’t know exactly where it is. You know what are, what are the legal risks of sharing this? Repatriated materials — what are the ethical concerns for this? I mean goodness, we… yeah, we have everything. What are the, the ethics of sharing those Social Security cards? And, and what are the ways that we can do that?

Reputational risk was something that we, we applied a designation of reputational risk very liberally, which is to say, we tried to imagine all the ways that people might think this is bad for all of these different categories. And, in fact, I think all but two of the categories of collections that we evaluated, the deaccessioned, disposed materials and the recently acquired collections, we believed might carry a reputational risk, either because we had not cared for them appropriately, or we couldn’t necessarily describe why we had them, or justify their presence in our collection readily. All of these other, other reasons.

And, and that we spent… that was the thing we spent the most time talking with leadership about. And, and we spent that time thinking that it was going to be the big problem, because it was the thing that all of our colleagues in the department of history, all of our curators were really obsessed about. When it came down to it, leadership didn’t care. And, or rather, from their perspective, the benefit from sharing and becoming more open, as a matter of policy, was so much greater than any risk we might bear by showing that we hadn’t cared for things appropriately, that we had to do it anyway. This notion, the, the fears that we had that someone might discover that something is stored in a building that we haven’t been able to access for 20 years because it’s contaminated with asbestos. Well, honestly, that’s a matter of record anyway, so it’s better to do something positive with this information rather than try to hide behind this embarrassment and shame of not being on top of our collection. So that was something we spent a lot of time with that maybe we could have set aside. We could have handled that much more quickly, but I think that was part of the real working through the process slowly, with all of our colleagues.

And then we also tried to unpack the benefits of sharing, tried to say, Okay, baseline, we can’t do anything unless we do this. We have to put all of our collections online if we’re going to share our images, but then we started trying to pick apart some of the really thorny questions about benefit and impact and, and how do we achieve and measure impact, and, and can we even begin doing that? And in the end, most of the work we did around the benefits of sharing ended up being… theoretical — for lack of a better term — to just say here are the ways that we imagine there could be benefit, but we are not yet prepared to investigate that, to unpack that, to really try to uncover or measure those benefits because it’s… we just we don’t have the infrastructure for doing that right now. The investment for doing so would, would preclude us doing the sharing in the first place. So we’ve set those aside. We’re actually working through some of that right now. So that’s kind of how that process unpacked or unrolled.

[Marty]: To me, it seems like it’s yet another thing to measure on top of a long list of things you’re trying to quantify right, I mean, 1.8 million records — do you have believable inventory records for them? Do you know where they are? Do you have anything more than a tombstone data entry, right? Do you have records? And now you’re asking about impact on top of that?

Yeah, so setting impact aside, and honestly we ended up setting aside the quality and quantity of data. We have become familiar… well, we decided to share everything that we could positively say had been collected, we believe, is in our possession, and has some minimum cataloguing sort of capacity — we could… it has a number on it. And I think, maybe has a title or name, but we do have tens of thousands, maybe approaching hundreds of thousands of objects that are really quite useless in the world. And, which that — that — actually complicates our impact more than anything, because it kills our SEO, and, and so, trying, I mean, that’s, that’s something that that we’re trying to… we’re spending a lot of time trying to understand right now, really trying to build an infrastructure capacity and a team to do much better with our data, so we can turn that around.

[Marty]: And of course, there’s also… I have a question in my notes, I was going to ask you about how the Smithsonian’s growing emphasis on Open Access influences your job at the museum. [Both laugh.] You already covered a lot of it, but is there anything else you want to say about that? That shift in focus is tremendous.

Well, first I’ll, I’ll maybe finish this story, which maybe goes into that…

So we spent all this time, we did, we got the buy-in for moving forward with everything, I think, at the very end of 2018, and we set ourselves a deadline of a year of actually moving everything out. So we spent all of 2019 pulling all the levers, and doing all the data shifts, and working with our partners who control the servers to do the migration and making sure the websites wouldn’t crash. I mean, we were literally creating… It was 1.1 million brand new web pages, like, by flipping a switch. So what’s that going to do to the server? And it turns out quite a lot, so we had to actually… there were some downstream effects where we had to hire some programmers and vendors to help redo bits of the website so that it could handle this, and new linking, and all sorts of things.

So it, it took a full year of, of actually… of teams dedicated, spending their time on launching this in December of ’19, and, and the impact was — or the, the awareness — I don’t want to say the impact because that’s a complicated thing that we don’t quite understand fully. But the awareness of this was zilch. And that was because around Thanksgiving 2019, the institution announced that we were going Open Access in early 2020, and the communication strategy around that was not to talk about Open Data at American History, but to keep our powder dry, so we could talk about Open Access for the institution in March of 2020 or February 2020.

So, in the end, we spent you know, four years imagining a project that in some real ways fundamentally changed the way that we collect objects, the way we document collections, the way we’re managing, physically and intellectually managing our collections, and providing access to, to the national collection. You know, we, we believably have a material record of the history of the United States. And it is now 100 percent – or 99.95 percent — accessible to everyone. And, and we had to sit on that, in service of Open Access, which for us, was so marginal in impact because of the challenges we have with our collection determining a very proactive stance.

So, the Smithsonian has adopted a stance for Open Access, and this was negotiated ad nauseam, for a long time as well, but we ultimately determined that when we were saying that something was Open Access, unlike many other museums, where when they say something is Open Access they’re just saying that there is no intellectual property concern, we were also looking at moral rights. We were looking at any number of other constraints that might be wrapped around materials. And so, when we say something is Open Access, it is legitimately owned by the nation and can be shared without any limitation. And there are very, very few of our collections for which we can do that, just because — I mean, there… Look, we have a number of things we can do, but they require an item-by-item review, and honestly, we have found very few people who have the skill set to do that sort of research in a… at any sort of scale. So we had to choose a very limited number of materials to share with Open Access.

So we were, we were sitting on our communication where we think we did something that was much more important. We provided Open Access to the data. So we guaranteed discoverability of everything. And we adopted a fairly progressive interpretation of fair use for the images of our collection, sharing the highest resolution image possible of all the materials, but still encumbering those assets, with the Smithsonian terms of use, which is an educational, non-commercial use. Believing that, that was… offered greater availability. So we, in doing that, we shared 1.8 million objects with the world. We are today, I think, sharing images of 800,000 of those objects with the world. Anybody can discover them and use the data about them in any way they want, and it’s up to them to determine how they want to negotiate the intellectual property restrictions that might come with the images, but they can still find them, use them and, and that’s up to them. Versus Open Access, I think we are currently sharing 300 items Open Access just because we, we cannot sustain the, the research required to, to share those materials, because it really does require almost a legal finding for, for every single thing. And an assumption of risk that… a positive assumption of risk that our museum’s leadership is not willing to assume when we have the other alternative available to us.

And, and so, Open Access in some ways hasn’t changed much of what we do, because we think we’re offering something better. Because the cost benefit for going Open Data and restricted access to intellectual property, the cost is much lower, I mean… by orders of magnitude, than the cost of doing Open Access. And we’re, we’re still trying to unpack what the, the impact is.

We, based on the numbers that we see, the greatest indicator of impact, at least by virtue of access and sharing of information, is the presence of an image. There is very little distinction between something having an encumbered image and an Open Access image. There, there’s no apparent benefit immediately for impact for having something Open Access. And when the cost is so much higher, we’re just… we were not going to invest in it. And we just can’t, although we are trying to. We’re making… we have shifted our project management for digitization projects in such a way, where we are trying to fund Open Access review for materials that are going to be subject to, to a mass capture or digitization project.

It’s not so much mass capture when we’re doing 3D, but if we’re photographing 10,000 coins, we are including Open Access review. In many cases, we already know from the beginning, whether or not the majority of them could be Open Access or not, but by funding that, we can actually get the close intellectual property and other descriptions that we need to facilitate access through other means as well.

So, I, I admit I was… I’m still a little bit sour about Open Access [laughs] at the Institution because we spent a lot of time on it. It took a lot of air out of the room to very little benefit. I mean, sure it was great to share all those pictures of rocks and snails and bees from the Natural History Museum, and there is great utility for those materials being shared at scale because their incorporation into data sets for research, for scientists around the world is massive. But, we are not really seeing the tail end of any of those benefits.

[Marty]: Well, that is a… that is certainly a long-term issue with museum digitization and computing. The competing needs between cultural institutions on the one hand, the science institutions here, and art institutions over here.

Yeah. Yeah, the… I mean, that’s… one thing we have spent some time thinking about with other colleagues across the institution is, how could our lessons that we learned from this translate to the art museums? And, and it’s not easy. And in, in many cases they’re way better prepared to do a lot of the work that we needed to do, and because of that, they’re able to, they’ve done what they can already and have made really sophisticated decisions. I mean, when you have, say the Portrait Gallery, you know, the, the ethic for description and utility of collections in that art museum has, over generations, privileged documentation in a way that ours didn’t.

We, we adopted, in some ways, a Natural History sort of specimen-collecting ethos, and then didn’t back that up with the documentation ethos that, that ethnographic and art museums maybe had… So, we had the worst of both worlds. [Laughs.] Heaps of stuff, and no information about it! Great! But art museums have much smaller collections and, and an ethos of documentation that has been sustained for generations, which means that they’re stepping into this, and they have a really sophisticated understanding of the risks at a very narrow level. An item-by-item level. They know exactly what the copyright status is and, and provenance, which can really inform the… many of the other risks that have to do with ethics and reputational risk for sharing things.

So, a lot of what we learned just didn’t apply in that space. And then it didn’t apply in the Natural History space, because they are dealing with materials that, if they can resolve the provenance and the authority to hold, they don’t have to worry about intellectual property in so many ways. Or they worry about, their worries about intellectual property have to do with research outputs. And the way that many of the scientific fields have moved so radically, especially in the public sector, to Open Access for everything, has greatly informed what they’re able to do. So, our lessons are best absorbed by historical societies and other history institutions, which are generally, really, really, really tiny, and have no capacity for doing any of this work, so no one is standing ready to take any of these lessons and apply them to their own museum. So, we kind of did something big and I’m not sure where it’s going to go.