Making something useful, and possibly beautiful, out of a vast, murky, government dataset is daunting to most people. Thankfully there are others, like Jo Polanco, who see the possibilities within that information. “Having the right metadata can empower people to be smarter and more strategic about what they need in their communities,” she said recently during a presentation and panel discussion about NYC Open Data and the Metadata for All project.
Hosted by the Metropolitan New York Library Council (METRO), this was a great example of the kinds of events the organization presents throughout the year, which I came across after visiting another terrific service they offer: their comprehensive job listings page. With a background in New York City journalism, and still feeling a responsibility to do what I can, small as it may be, to improve my neighborhood, I find that phrases like “DSNY monthly tonnages collected curbside” actually spark my interest, so the event listing caught my eye. And, because it covered some of the complicated aspects of making New York City government data accessible (and, more importantly, understandable) to the average, non-municipal user, it happened to coincide nicely with several issues of user experience design we explored this week for class.
Polanco, who graduated from the Pratt MLIS program in May of this year, was part of the six-month, grant-funded Metadata for All project, which was tasked with finding “ways to make the metadata of the top 100 most-used datasets on NYC Open Data more usable and explore new ways to ensure that metadata is user-friendly for all New Yorkers.” Her work, alongside other data librarians and with the Mayor’s Office of Data Analytics (MODA) and the NYC Open Data team, delved into user experience design by examining how all stakeholders might use the data, and how the data’s presentation might be improved to better serve those users.
NYC Open Data is a living project. Since the Open Data Law was passed by City Council in 2012, the city has presented periodic updates about its progress, and the most recent report shows what an unwieldy, remarkable amount of information it is. The NYC Open Data platform hosts more than 2,000 datasets with billions of rows of data. “The next largest city has a few hundred datasets,” Polanco told me in an interview following the event. That may not be enough to excuse the delayed data releases WNYC has raised concerns about, but it does help provide a level of comparison for the amount of data the policy covers. As she added: “The scale is just awesome.”
To understand who the Metadata for All team was considering as it approached the project, the city’s recent report provides a good description:
“The City’s commitment to Open Data for All means open data for people from all walks of life, from all five boroughs, who are using open data to make a difference in their communities — including educators, students, artists, builders, small business owners, advocates, reporters, community board members. It also means open data for the 300,000 hardworking men and women who make New York City safer, cleaner, and more equitable.”
Those 300,000 people are city government employees, which is a fact that Polanco pointed to as one of the major challenges. “We have to acknowledge and realize that New York City government is the size of a city,” she said. “It’s a very multi-layered bureaucratic system, and that’s got to be baked into the approach.”
Certainly, as WNYC reported, part of the challenge for city agencies in releasing all this data is that despite that number of employees, they may not be staffed well enough to handle it. But another, bigger-picture issue is that the people who deal with the data (called Open Data Coordinators, one of which each agency is supposed to have on staff) may not always know what’s going to happen with the data once it’s released. That’s part of what the Metadata for All project tried to address in its work. For example, Polanco points to a current question on the Open Data Dictionary template for Open Data Coordinators, which asks, “How can this data be used?”
“Most of them skipped that question, because that’s a weird question!” she said. “You don’t really know how it could be used. You can’t know every application of a dataset.” Their recommendation was to change that, and things like it. “Instead we asked, ‘What kinds of questions could be answered by this data?’”
Going further into what other useful metadata an Open Data Coordinator might be able to provide, she said, “Someone handling the data could know that what you want the user to understand is what this data is and is not.” For example, in describing what 311 data is, they found it was important for users to understand that it’s not simply a collection of complaints. As an Open Data Coordinator who manages 311 data would know, 311 operators field complaints and then, if necessary, direct them to other agencies that may be able to assist. Furthermore, the complaints that are collected don’t always show the whole picture. “It doesn’t tell you just because X amount of people reported rat sightings, that’s the place with the most rats,” Polanco said. “It’s the place with the most rat-sighting reports.”
So the Metadata for All team recommended that data dictionaries describe those sorts of basic things about what’s contained within a dataset. “What we wanted to do was to deliver a usability guide for the Open Data Coordinators to write this documentation,” Polanco said.
Perhaps the more Open Data Coordinators consider what users may do with the data, the more they’ll get out of the work. As Aarron Walter noted while outlining UX strategies that were implemented at MailChimp:
“We’ve found that when people are given the opportunity and the platform to share their data or do something new with existing data, they feel pride knowing their work is valuable to others. It feels good to see different areas of the company benefiting from the work you’re doing. Everyone wants their work to be valued and appreciated.”
As panelists mentioned at the NYC Open Data event, various city agencies use the portal to access data from other agencies (which is a more streamlined version of what they used to do — turns out even city employees have had to wait a long time for another agency to provide data to them), so the data is being shared within the “company” that is NYC government. But it’s also interesting to reframe Walter’s corporate view of a “connected approach to research” by looking at NYC Open Data as it relates to the city as a whole. Because, as Polanco correctly points out, that data “is ours; we own it.” So when Walter writes, “What we’re building is more than just connected data — it’s a connected company,” what Open Data Coordinators are helping to build is a connected city.
The people of this city (and beyond) were the most important consideration in this project. One of the goals of MODA is to find communities that don’t know NYC Open Data exists, and then show them how to use it, so the Metadata for All team had to consider a wide variety of potential users. Past examples show the way NYC Open Data has been used can be fun, compelling, and incredibly important for communities in the city. In many ways, people can use this data to “be an advocate for yourself,” Polanco said. “Say you think there is a dangerous intersection. Well, you can go to that data, you can look across a few different things — at the actual speed camera data, how many people have sped through there, you can look at NYPD data that says how many people have been injured there, and how many accidents, how many people have been killed. You can do that, and you can use that data to be an advocate in your community and say I need a stop sign, a traffic light, a safety guard. It’s really about understanding your power.”
To see what other possible ways people might use the data, the team held workshops in each of the city’s library systems. Among other things, they learned that community boards want to understand trash pick-ups in Staten Island, and journalists in Manhattan want to use more of the data in their reporting. “This information is out there for them,” Polanco said, noting that these community engagements helped shape many of their recommendations. For instance, they asked community members for assumptions of what certain datasets might include, and their answers helped guide the team in deciding what information to include in data guides and dictionaries.
For Polanco, who changed careers after realizing her work with data and analysis in the corporate retail world overlapped well with librarianship, this project has been a natural progression of her professional path — and the same might be said for anyone in an information profession, whether or not they consider UX a part of their job.
“Whether you know it or not, you’re already a designer,” Aaron Schmidt wrote in his first UX column for Library Journal, where he explored the role of user experience design in libraries. Which is to say, librarians may have already done some of this without even thinking about it; design is inherent in what a librarian does. Ideally, the effect on the future of NYC Open Data, and projects like it, would be that more librarians become an integral part of those projects, and sooner.
“Librarians should really be involved in metadata,” Polanco told me, adding that perhaps only one person working in a city agency with NYC Open Data has a background in librarianship. “I think there’s a lot of opportunity, especially for information professionals, to work with data like this, which is very big data. This is the essence of information organization.”
In a city-sized organization like NYC government, should librarians play a larger role in organizing and presenting data? And, more broadly, what other opportunities are there for data like this in the future? For example, could it be useful (or possible) to create linked open data between NYC and Chicago?
We can wonder about all that, but it may take a while before the bureaucratic behemoth of NYC government considers these ideas. The Metadata for All project has now concluded, and the data dictionaries the team developed will be reviewed, along with the standards and other recommendations it outlined, so we’ll have to look for updates on how the city proceeds with them. Whatever happens, in New York City, open government data is here, and, as MODA’s Adrienne Schmoeker declared during the event, it “isn’t going away.” So as long as we in the information professions play a role, the first step is making the data as user-friendly as possible. But the second, equally important step is ensuring people use it.
Originally written for INFO-654: Information Technologies, Pratt School of Information, Fall 2018.