What I learned about #BigData this week | Careers in Federal Libraries

The Symposium on Digital Curation in the Era of Big Data: Career Opportunities and Educational Requirements (July 19, 2012) was hosted by the National Academies. In no particular order, here are the highlights and comments that grabbed my attention.

Loved David Weinberger’s comment: “The value of a scientific information commons is to have your nerds arguing with my nerds”.

NSF Office of Cyberinfrastructure sees data as a transforming agent

Will soon begin non-governmental awards and working groups across global boundaries
NSF stresses agile development, rough consensus to push forward quickly, and community involvement
What infrastructure is needed to move terabytes and petabytes quickly? How do we build and sustain that network?
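
To get a feel for the scale behind that infrastructure question, here is a rough back-of-the-envelope sketch of how long it takes to move a terabyte or a petabyte over a network link. The link speeds and the 80% usable-bandwidth figure are my own illustrative assumptions, not numbers given at the symposium, and the function names are made up for this example.

```python
# Back-of-the-envelope transfer times. The link speeds and the 80% efficiency
# figure are illustrative assumptions, not numbers from the symposium.

TB = 10**12  # bytes in a terabyte (decimal)
PB = 10**15  # bytes in a petabyte (decimal)


def transfer_seconds(size_bytes, link_gbps, efficiency=0.8):
    """Seconds needed to move size_bytes over a link rated at link_gbps gigabits/s."""
    usable_bits_per_second = link_gbps * 10**9 * efficiency
    return size_bytes * 8 / usable_bits_per_second


def transfer_days(size_bytes, link_gbps, efficiency=0.8):
    """Same estimate, expressed in days."""
    return transfer_seconds(size_bytes, link_gbps, efficiency) / 86400


if __name__ == "__main__":
    for label, size in [("1 TB", TB), ("1 PB", PB)]:
        for gbps in (1, 10, 100):
            print(f"{label} over {gbps} Gbps: {transfer_days(size, gbps):.2f} days")
```

Under these assumptions, a single petabyte over a 10 Gbps link takes roughly 11.6 days, which is one way to see why the network question came up at all.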

IMLS expects more proposals to educate MLIS students and archivists in the future; it is stressing grant-funded MLIS education that prepares graduates for handling digital content, managing its life cycle, and analyzing data sets

David Weinberger, Harvard University, author of “Too Big to Know” and “Everything is Miscellaneous”

Move to filtering data on the way out, not on the way in, as a search and retrieval strategy
Today we collaborate across namespaces
Data as cells can be modeled by modeling a domain until you hit a certain level of complexity
Integrating multiple complex models increases the error rates
“The value of a scientific information commons is to have your nerds arguing with my nerds”

Joshua Greenberg, Sloan Foundation

A lack of digital curation capacity at the producer level provides a shaky foundation for a big data future
What is a data scientist? Universities are scrambling to train up staff, but perhaps not in digital curation activities
Sloan Foundation is funding data wrangling efforts and new skills in analysis and computational research

Myron Gutmann, NSF

Need specialized training and education w/in the scientific community itself; training for methodologists
Must integrate digital curation into the scientific research process; libraries, archivists and scientists join hands!
Big data announcement from March drives toward more attention to curation, analysis, preservation in context
Committed to learning how scientific work is done, do as much training as they can by partnering w/ universities
How do we integrate communities of interest (COIs) into the scientific research community where the work is being done? What kinds of communities for data are out there? We need to build COIs around data in new and interesting ways
NSF policy requirement will try to tease out the findings of data management plans submitted since Jan 2011; one change will likely be in the NSF bio sketch, which will require a list of by-products of research (reports, data sets, videos)

Michael Stebbins, White House OSTP

Must be cautious of burdens on scientists; agencies are already queuing up policies on managing open data
Funds will naturally be shifted to solve big data problems
Forming private-public partnerships accelerates the research; nucleates activity
Administration worked hard to improve public access to technical data, technical publications and raw data sets
Data management plans are needed for agencies; having those plans reviewed by peers was a great idea
At a crossroads in assessing which concerns about burdens need to be addressed; there are deep concerns related to sharing data

Margarita Gregg, NOAA

The digital era includes digitizing and harmonizing data that exists in tangible format only.
NOAA understands they will not be able to preserve everything, all data, in perpetuity.
The knowledge and skills they need lie at the intersection of scientists, IT, and librarians
The requirement for the future is to find interdisciplinary-trained, hybrid workers
People need to be comfortable with manipulating, understanding, and extracting data, and then be able to translate it into useful products; they must also be able to discover which tools people really need and how to provide them so they are understandable to the end user.
Most pressing personnel needs are in data mining; systems architects; scientific stewardship
Workers need skills in digital rights management and intellectual property management

Anne Kenney, Director, Cornell University Libraries

62% of the library budget goes to electronic resources, mostly from just a few publishers; Big Science is a major driver for ARL libraries
Digital curation related issues prompted the eScience working group at ARL, which noted gaps in capabilities
Guide for research libraries published as a result of the NSF data management mandate; eScience Institute hosted by ARL
7 roles for librarians and archivists listed in work entitled “New Roles for New Times”
In the humanities, the focus is on digital learning and the creation of scholarly products rather than on research products
In eScience, the focus is more on harmonization of initially captured data; social networking in virtual communities
Lots of interest in embedded scientists in the library
Reskilling for Research — identifies 9 gaps in training (data mining, metadata creation, etc.) from librarian perspective
There may be a role for teaching libraries, just as there are teaching hospitals

Vicki Ferrini, Associate Research Scientist, Lamont-Doherty Earth Observatory, Columbia University

Works as a data scientist; a marine scientist trained in geoinformatics; liaises with the scientists to translate and apply data models, develop data discovery tools, build data compliance tools for NSF requirements, and build education materials
The scientific data continuum has changed (from only making data available as part of a published report); Columbia now uses data archivists for feedback between data producer and data consumer
Intersect data producers, data consumers, and data providers to find the data scientist; need domain knowledge, need acquisition skills, understanding of the data requires grounding in the science

Elizabeth Liddy, Dean, Syracuse University School of Information

Data scientist competencies: data analytics, data structure, data mining, ability to run information extraction on unstructured data, statistical analysis, recognizing risk, noise, and data quality, data and information visualization, understanding of infoviz tools, being able to design
Data archiving and preservation, how to select and provide access and storage, data stewardship, migration of data
Task force assembled at the iConference worked on these competencies

Nancy McGovern, Curation and Preservation Services, MIT Library

Amazon web conference in May: challenges include scale, complexity, speed
Start with data, then end with BIG data
Create a SWOT analysis, take your library skills in and compare to desired skills to define gaps
She helped with UNC Chapel Hill outcome matrix with categories for skill sets
Look for findings from project DIPR on dissemination packages research
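
The "compare your skills to the desired skills" step above can be sketched in a few lines. This is only a minimal illustration of the gap step, not a full SWOT analysis: the desired list borrows a handful of competencies named in these notes, while the current list and the function name `skill_gaps` are made up for the example.

```python
# Hypothetical skills inventory. The "desired" set borrows a few competencies
# mentioned in these notes; the "current" set is invented for illustration.

desired_skills = {
    "data mining",
    "metadata creation",
    "statistical analysis",
    "data visualization",
    "data stewardship",
    "digital rights management",
}

current_skills = {
    "metadata creation",
    "reference services",
    "data stewardship",
}


def skill_gaps(current, desired):
    """Return the desired skills that are missing from the current set, sorted."""
    return sorted(set(desired) - set(current))


if __name__ == "__main__":
    for skill in skill_gaps(current_skills, desired_skills):
        print("Gap:", skill)
```

Swapping in your own two lists gives a quick starting point for the weaknesses and opportunities columns.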
