Cleaning and Linking Data with OpenRefine
Before analysis comes the messy work of evaluating, cleaning, and transforming data. This hands-on workshop introduces a free power tool for getting the job done: OpenRefine. We will install OpenRefine, create a project, and get oriented to its many features for exploring and transforming tabular data.
Workshop Survey
After the workshop, please complete the assessment survey. This helps us make sure our workshops are always getting better.
Agenda:
- About (5 minutes)
  - What is OpenRefine?
  - Data Security
  - Data Types
  - Use Cases
- Messy (5 minutes)
  - What is “Messy Data”?
- Start (5 minutes)
  - Set Up OpenRefine
  - Start OpenRefine
  - Troubleshooting Installation
- Cleaning (90 minutes)
  - About the Data
  - Creating a Project
  - Navigating OpenRefine
  - Text Filters
  - Facets
  - Editing Data
  - Transforming Data
  - Clustering
  - Creating New Columns
  - Splitting Columns
  - Removing Duplicate Rows
  - Working with Different Data Types
  - Interacting with Rows
  - Exporting a Project or Data Sets
  - Undo / Redo
  - Bringing It All Together
  - Automating Workflows
  - Fetching Data from a URL
- Reconciliation (10 minutes)
  - What is Reconciliation?
  - Using OpenRefine’s Built-in Wikidata Service
  - Extending Data from Reconciled Data
  - Other Data Services
- Web Scraping (3 minutes)
  - What is Web Scraping?
  - Use Cases for OpenRefine for Web Scraping
- Resources (2 minutes)
  - Overview of Other Resources
Originally created: September 14, 2018. Updated: March 22, 2019, November 1, 2019, January 17, 2020. Developed for the ULS Digital Scholarship Workshop Series at the University of Pittsburgh Libraries.
content: cc-by-sa Michael Bolam 2019.
original content and site design: cc-by-sa evan will 2016.