OpenRefine

OpenRefine Logo
OpenRefine - A free, open source, powerful tool for working with messy data

OpenRefine - What is it?

Using OpenRefine

Metadata Cleaning Scenario

You have a CSV file of MARC fields extracted from your ILS (probably generated using MarcEdit). It represents a collection of children’s board books. A faculty member is interested in using the data in her research, but the MARC elements and subfields, and multi-valued cells aren’t going to work for her needs. You are hoping the build a reusable workflow for these materials

If You Close Your Browser Window

If you close your browser or tab, open a new browser window or tab and navigate to:

http://127.0.0.1:3333/ or http://localhost:3333/

Creating a New Project

You can create a new project with data from:

OpenRefine Home Screen
OpenRefine Home Screen

Inside OpenRefine

Text vs. Numbers

Similar to how Excel allows you to choose the format of the data inside a cell and defaults to “General”, OpenRefine lets you choose the format of the data as well but defaults to “Text.”

This can mess up your metadata clean up. So it’s important to make sure that the format the program is using to read your data actually matches your data type.

Split Columns

If data in a column needs to be split into multiple columns, and the parts are separated by a common separator (say a comma, or a space), you can use that separator to divide up the pieces into their own columns.

OpenRefine Split Tool
OpenRefine Split Tool

Undo/Redo

OpenRefine tracks all of the changes you make in your project, so it is easy to navigate back to where you want to fix your mistake and go from there.

WARNING: Once you go back and make another change from that point, you will lose everything you did after that point originally.

OpenRefine Undo/Redo Tool
OpenRefine Undo/Redo Tool

Trim Whitespace

What is whitespace?

Whitespace is very difficult to catch with the naked eye, especially in large datasets, but can impact how your system reads data.

Edit cells

There are many ways other than just trimming leading/trailing whitespace to edit cell values in OpenRefine.

OpenRefine Common Transformation List
OpenRefine Common Transformation List

Add Column Based on This Column

If you want to create a new column, drawing from data in another column use this feature.

You can use simple or complex language to create a new column. (Read: You can use GREL or Python script here!)

OpenRefine Add Column Based on this Column Tool
OpenRefine Add Column Based on this Column Tool

Faceting

Faceting groups together every identical instance of cell values in a column.

OpenRefine Facet Navigation
OpenRefine Facet Navigation

Split Multi-Valued Cells

For better faceting and clustering, sometimes its better to split multi-valued cells within the same column. In OpenRefine this makes new rows in each record. These can be recombined later, after the data is cleaned.

OpenRefine Split Multi-Value Cell Tool
OpenRefine Split Multi-Value Cell Tool

Clustering

Clustering uses natural language processing to group like terms together for you to edit.

It will help you catch mistakes like “nevada” vs. “Nevada” while also letting you decide preferred forms of terms (maybe you want to prefer the plural for subjects i.e. “teachers” over “teacher”).

OpenRefine Clustering Tools
OpenRefine Clustering Tools

Filtering

Filtering allows you to filter rows or records by keyword or regular expression searching a column.

Generating Scripts from OpenRefine

Let’s say you know you have to do the same changes on multiple projects over and over again.

OpenRefine allows you to extract a script of all of the changes you have made to the project.

When you create a new project, you can then apply that script and it will automatically run all of the changes on the new project.

OpenRefine Extract Operation History
OpenRefine Extract Operation History

Saving and Exporting

Exporting Data

When you are done with your project, and you have decided that you only need to extract the results of your cleanup project, you will export your data.

OpenRefine Export Functions
OpenRefine Export Functions

Importing and Exporting Projects

When would you export a project?

You can import .tar or .tar.gz project files at the home screen, where you created a new project.