20 Feb 2015
Finding Ocean Health In A Sea of Data
Ever wonder how the Ocean Health Index gets data for its global assessment of the entire ocean? There's more to it than you'd guess!
The world is swimming in data of all kinds, so how do you cut through to
get just the data you need? The secret is asking the right questions.
Question #1 for our team was, “What is ocean health?” The answer we developed: “A healthy ocean sustainably delivers a range
of benefits to people now and in the future.”
That answer spawned more questions: What benefits should we assess? What reference points (targets) could we use to indicate whether the flow of benefits is sustainable? What is the relative importance (weight) of the different benefits and components? What are the best methods to calculate scores?
Underlying everything was the question, ‘What data exist and how can we find them?’ Here we’ll discuss some of the challenges and surprises we encountered. Elsewhere you can read about how we selected benefits, chose reference points and developed methods for calculating scores.
Sea of Data
The prominent scientific journal Nature estimated that 10 million researchers worldwide devote about 26 billion hours per year to research and development, resulting in about 920,000 scientific articles, and that probably does not count the work of economists, social scientists and other data producers.
But all that data streaming into databases all over the world hasn’t yet made it easy to find the information you want. Woods Hole Oceanographic Institution scientist Peter Wiebe described data centers as ‘…kind of black holes. It’s very hard to figure out what’s in there and how to get it out.’
Marcia McNutt, editor of the influential Science magazine, recently highlighted the need to improve data ‘discoverability.’
Data usually must answer the basic questions that journalists ask when writing a story: Who, What, When, Where and Why, but in somewhat different ways. In this case ‘Who’ names the source of the data; ‘What’ is the measurement itself, along with narrative ‘meta-data’ providing additional information such as the units of measurement; ‘When’ is the time and date of the measurement; and ‘Where’ is the latitude, longitude, depth and other information on the location of the measurement or sample.
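To make the ‘Who, What, When, Where’ idea concrete, here is a minimal sketch of how a single data point and its metadata might be represented. The class name, field names and example values are our own illustrative assumptions, not the format of any particular database or of the Ocean Health Index.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Observation:
    """One data point plus the metadata needed to interpret it (hypothetical layout)."""
    source: str                       # Who: person or agency that produced the value
    variable: str                     # What: the quantity measured
    value: float                      # What: the measurement itself
    units: str                        # What: metadata describing the units
    timestamp: datetime               # When: time and date of the measurement
    latitude: float                   # Where: decimal degrees, north positive
    longitude: float                  # Where: decimal degrees, east positive
    depth_m: Optional[float] = None   # Where: depth in metres, if relevant
    purpose: Optional[str] = None     # Why: reason the data were collected

    def __post_init__(self):
        # Reject coordinates that cannot be a real location.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError("latitude must be between -90 and 90")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError("longitude must be between -180 and 180")


# Invented example record, for illustration only.
obs = Observation(
    source="example buoy array",
    variable="sea_surface_temperature",
    value=18.4,
    units="degC",
    timestamp=datetime(2015, 2, 20, 12, 0, tzinfo=timezone.utc),
    latitude=41.5,
    longitude=-70.7,
    depth_m=1.0,
    purpose="routine monitoring",
)
```

A record that is missing any of the first seven fields cannot be constructed at all, which mirrors the point that a measurement without its ‘Who, What, When and Where’ is hard to reuse.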
Then there’s ‘Why.’ There are dozens of reasons why data may be gathered ranging from personal interest to mandated collection, and also including availability of funding, geographical convenience, and many others. Since data may have been collected for a very different reason than why one wants to use it, it may be more or less detailed or useful than desired, but it will have to do.
Security agencies sift enormous streams of data to detect and prevent acts of terrorism, and businesses analyze the social media use and purchasing history of consumers to hone product development, advertising and sales. Such analyses can identify broad strategies and, if the analysts wish, things as specific as ‘the countries visited, movies downloaded and favorite colors of all left-handed men with Facebook accounts who surf.’
Data discovery and integration in the ocean sciences isn’t yet that sophisticated, partly because of variability in the type and qualities of data as well as the variety of things different people want to find out.
For example, global scale data for variables such as sea surface temperature, sea-ice extent, sea level rise or the extent of coastal mangrove forests now come primarily from satellites that cover the planet relatively seamlessly, all making measurements in the same way. Large scale arrays of buoy systems provide similar in-water coverage. The data are easy to get if you know where to look.
Data for other integrative topics, such as fisheries landings, mariculture production and economic indicators (jobs, wages, revenue), are gathered by agencies in different countries and then submitted to international or multi-national groups for aggregation. The data are accessible, but vary in quality and are often harder to interpret.
Still other data layers must be cobbled together using information contributed by many different investigators. Examples used in the Ocean Health Index include extent and condition of seagrass beds, population assessment and extinction risks for iconic species, and many aspects of biodiversity.
Some desired data layers may not exist at all and must be approximated, either by mathematical modeling or by using other directly or indirectly related data (proxy data). For example, there are no globally consistent data on the concentration of chemical pollutants in the ocean, so the Ocean Health Index bases its indicator on the amount of pesticides used by each country, as well as the distribution of commercial ship tracks and major ports and harbors. It mathematically models the flows from land into watersheds and rivers, then models the plumes of pollution diffusing from river mouths out into the ocean.
Similarly, data on pollution by pathogenic bacteria are lacking in most countries, so the Ocean Health Index uses proxy data from the World Health Organization and UNICEF on the percentage of the coastal population in each country that has access to adequate sanitary facilities.
As a final example, any data collected on the cultural, traditional and aesthetic importance of the ocean and coasts to a country’s citizens are not reported to a centralized international database, so those rather intangible benefits had to be represented by two proxy measures in the Sense of Place goal: the status of iconic species and the percentage of the coast and near-shore waters held in protective status.
Drawing on all these types of data and techniques, the Ocean Health Index assembled a list of more than 80 global data layers that are woven throughout its structure. Some are quantitative measures, such as the amount of fish caught or percentage of ocean protected; others are categorical, i.e. expressed not as measurements but as ranks or classes, as illustrated by IUCN Red List’s categories of extinction risk.
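As a rough illustration of how a categorical layer can sit alongside quantitative ones, the sketch below converts IUCN Red List categories into ordinal ranks and then into a 0 to 1 value. The equal spacing between categories and the function names are our own assumptions for illustration; they are not the weights used in the Ocean Health Index.

```python
# IUCN Red List categories ordered from lowest to highest extinction risk.
# LC = Least Concern, NT = Near Threatened, VU = Vulnerable, EN = Endangered,
# CR = Critically Endangered, EW = Extinct in the Wild, EX = Extinct.
IUCN_ORDER = ["LC", "NT", "VU", "EN", "CR", "EW", "EX"]


def risk_rank(category):
    """Ordinal rank of a category: 0 for LC up to 6 for EX."""
    return IUCN_ORDER.index(category)


def risk_fraction(category):
    """Rank rescaled to the range 0..1 (equal spacing is an assumption)."""
    return risk_rank(category) / (len(IUCN_ORDER) - 1)


def mean_risk(categories):
    """Average risk over the assessed species, skipping unranked codes such as DD."""
    ranked = [c for c in categories if c in IUCN_ORDER]
    if not ranked:
        raise ValueError("no ranked categories supplied")
    return sum(risk_fraction(c) for c in ranked) / len(ranked)
```

Categories without a place in the ordering, such as DD (Data Deficient), are simply skipped here; a real assessment would need an explicit decision about how to treat them.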
You will soon be able to read more about data discovery as part of a
manual to help researchers carry out regional studies of ocean health in their
own countries. The manual will be posted
at www.ohi-science.org. Details on all data and data sources used by
the Ocean Health Index for global assessments are shown in Table S23 here,
Table S9 here
and in descriptive sections of those documents.
The difficulties associated with data availability, discovery and
quality control will be apparent to readers of those documents.
The Ocean Health Index’s data challenges are symptomatic of the fact that learning about the ocean has seemed neither as urgent as national security threats nor as immediately financially rewarding as increased access to consumers. Growing awareness of the ocean’s importance is beginning to change that. The Google Ocean platform now visualizes data from many projects and expeditions, though it is not a data source. The Geographic Information Systems (GIS) company Esri, which provides the maps on the Ocean Health Index Web site, is producing an ocean basemap to incorporate many different marine data layers that will be useful for coastal zone spatial planning. Those data will be available by subscription.
A ‘big data’ company, Planet OS, has pioneered a single platform for planetary data transformation, access and on-demand visualization. One of their products, Marinexplore, already houses more than a trillion data points originating from 43,415 datastreams coming from 33 institutions, 41 projects and numerous ocean buoy systems. It is relatively easy to find selected data within Marinexplore, because the whole system is designed for that. One researcher’s discussion of how the system will benefit data management for sound in the ocean environment is here.
As these and other new systems for gathering, archiving, and managing data become available, data discovery for projects such as the Ocean Health Index will become less difficult. As regions at all scales agree upon the data and standard collection procedures needed for the Index and commit to gathering them on a regular schedule, the quality and usefulness of results will steadily improve.
Defining the Study Area
Mention of scores raises a final critical question: what is the ‘study area’ of an assessment, the spatial scale at which final Index scores are calculated?
One option is to use the major oceans, such as the North Atlantic, South Atlantic, Southern Ocean etc., as study areas. The United Nations World Ocean Assessment, now in development, uses those areas as an organizing principle. Some research and resource management organizations focus on particular oceans, but the oceans do not have administrative units for comprehensive research and data acquisition, let alone use of results for management. Therefore data for each ocean would have to be assembled from research results gathered independently by scientists and agencies working in many countries.
Another option could be to assess the oceans’ major ecosystems. Known as Large Marine Ecosystems (LMEs), each has its own structure and dynamics and many are being studied as units. All encompass the coastlines of several nations or states. None have administrative units with regulatory powers.
Further candidates might be the 232 marine ecoregions into which the world’s coastal and continental shelf areas have been divided. Some of the ecoregions include several nations or territories; in other cases a nation may include several ecoregions. Lack of administrative entities with regulatory power as well as transboundary issues weighed against using these areas for the global Ocean Health Index.
In the end, two considerations persuaded the Ocean Health Index team to
use the coastlines and marine waters of nations and their territories as study
areas. First was the increased
likelihood of finding data. Nations frequently maintain for their own purposes
databases of information collected by their scientists, economists and others.
Second, nations usually have administrative units able to set policies or enact and enforce regulations, actions necessary for maintaining or improving ocean health.
Choice of this study area raised the question: ‘How far seaward and inland should the study areas extend?’ Our seaward choice was 200 nautical miles, the width of the Exclusive Economic Zone (EEZ).
The EEZ concept was adopted as part of the U.N. Convention on the Law of the Sea (UNCLOS) in 1982. That convention defines a country’s territorial waters as a band extending 12 nautical miles (22.2 km) seaward from a baseline, usually the low-water mark, within which it retains exclusive sovereignty over the water, subsoil, seabed and airspace above. A country may also claim a contiguous zone extending a further 12 nautical miles seaward of its territorial waters, within which it may enforce its customs, fiscal, immigration and sanitary laws. The exclusive economic zone (EEZ) extends still further seaward, but not beyond 200 nautical miles (370.4 km) from the baseline. Within the EEZ, the coastal state has sovereign rights to explore, exploit, conserve and manage living and non-living natural resources of the seabed and waters, as well as to produce energy from the water, currents and winds. It can also establish and use artificial islands or other structures for economic or marine scientific purposes or to preserve the marine environment. However, within one country’s EEZ, other states still maintain the traditional high seas freedoms, such as freedom of navigation and overflight and the freedom to conduct military exercises. Additional information is shown here.
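The kilometer figures quoted for these zones follow directly from the international definition of the nautical mile as exactly 1.852 km. The short sketch below reproduces the conversion; the function and variable names are ours, chosen for illustration.

```python
KM_PER_NAUTICAL_MILE = 1.852  # exact, by international definition


def nm_to_km(nautical_miles):
    """Convert a distance in nautical miles to kilometers."""
    return nautical_miles * KM_PER_NAUTICAL_MILE


# Outer limits measured from the baseline, in nautical miles.
# The contiguous zone is the 12 nm of territorial waters plus a further 12 nm.
ZONE_LIMITS_NM = {
    "territorial waters": 12,
    "contiguous zone": 24,
    "exclusive economic zone": 200,
}

for zone, limit_nm in ZONE_LIMITS_NM.items():
    print(f"{zone}: {limit_nm} nm = {nm_to_km(limit_nm):.1f} km")
```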
Unfortunately, use of EEZs as study areas does not solve every problem. Even in the most information-rich EEZs, data gaps still exist, so methods were required to fill holes within existing data series or to estimate desired measurements using models or ‘proxy’ data.
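One simple way to fill holes inside an existing data series is linear interpolation between the nearest known values on either side. The sketch below shows that single technique under our own assumptions (a regularly spaced series with gaps marked as None); it is an illustration only and is not presented as the gap-filling method used by the Ocean Health Index.

```python
def fill_gaps_linear(series):
    """Return a copy of `series` with interior None values linearly interpolated.

    Leading and trailing gaps are left as None, because there is no known
    value on both sides to interpolate between.
    """
    filled = list(series)
    known = [i for i, value in enumerate(filled) if value is not None]
    # Walk through each pair of neighboring known points and fill between them.
    for left, right in zip(known, known[1:]):
        low, high = filled[left], filled[right]
        span = right - left
        for i in range(left + 1, right):
            filled[i] = low + (high - low) * (i - left) / span
    return filled


# Invented yearly values with two missing years in the middle.
example = [None, 10.0, None, None, 16.0, None]
print(fill_gaps_linear(example))
```

Gaps at the start or end of a series are deliberately left unfilled here, since extending a trend beyond the observed data is a modeling decision rather than simple interpolation.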
Moreover, measurement of the entire EEZ is not appropriate for all goals or data layers.
Look closely at the map above and you will see disputed EEZ boundaries and areas off the coasts of Argentina, Peru/Chile, Colombia/Nicaragua, Norway, Turkey, northern Japan/Russia, South Korea/Japan, Taiwan (Province of China) and throughout the South China Sea. Some of these disputes are moving toward resolution, others are not. The South China Sea disputes illustrate the challenges to resolution. The area of the Chinese EEZ aggregated by marineregions.org and used by the Ocean Health Index is 875,263.6 km², an area that includes waters out to 200 nm off the coasts of China and its two Special Administrative Regions, Hong Kong and Macau. However, China also claims an additional 3 million km² of the South China Sea, various portions of which are disputed by other countries in the region
including Vietnam, Philippines, South Korea, Japan, Malaysia, Brunei and
Such marine territorial disputes can only be solved by international agreements in accordance with UNCLOS and resolution is complicated and slow. Projects such as the Ocean Health Index can only use agreed boundaries. The only publicly available source for that information is the VLIZ database compiled by http://www.marineregions.org/ which uses methods shown here. VLIZ adds corrections during its regular database updates, so whenever countries resolve boundary disputes, their newly agreed EEZ boundaries will become part of the updated database. In the meantime, VLIZ maintains information on the disputed areas, as shown in the above map. Disputed areas are excluded from the scores calculated by the Ocean Health Index. If/when new EEZ boundaries are determined, scores can easily be recalculated.
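To show why recalculation after a boundary change is straightforward, here is a minimal sketch in which disputed areas are left out of a combined score and can be added back once a dispute is resolved. The area-weighted average, the region records and all numbers are hypothetical assumptions for illustration, not the Index’s actual calculation or data.

```python
def area_weighted_score(regions):
    """Area-weighted mean score over regions whose boundaries are agreed."""
    agreed = [r for r in regions if not r["disputed"]]
    total_area = sum(r["area_km2"] for r in agreed)
    if total_area == 0:
        raise ValueError("no agreed area to score")
    return sum(r["score"] * r["area_km2"] for r in agreed) / total_area


# Invented regions: two with agreed boundaries, one disputed.
regions = [
    {"name": "Region A", "area_km2": 100.0, "score": 60.0, "disputed": False},
    {"name": "Region B", "area_km2": 300.0, "score": 80.0, "disputed": False},
    {"name": "Region C", "area_km2": 50.0, "score": 20.0, "disputed": True},
]

before = area_weighted_score(regions)

# If the dispute over Region C were resolved, flip the flag and recalculate.
regions[2]["disputed"] = False
after = area_weighted_score(regions)

print(f"excluding disputed area: {before:.1f}")
print(f"after resolution: {after:.1f}")
```

Because the disputed status is just a flag on each record, an updated boundary database changes the inputs without requiring any change to the calculation itself.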
Disputed regions create interpretational and practical problems for the disputing countries. Though the Ocean Health Index has no other option, posting a score for an area bounded in a way that a country does not accept can be politically uncomfortable. Such a country might wish to carry out its own study using the Ocean Health Index framework but self-defining its study area boundaries. The country would have or could obtain data for the VLIZ-defined portion of that area, but might have difficulty obtaining similar information for any portion lying within the VLIZ-defined boundaries currently assigned to other countries.
Take Home Message
These problems notwithstanding, as long as research provides a steady flow of data points, each one describing ‘Who, What, When, Where and Why,’ the Ocean Health Index can accommodate any future alterations of EEZ boundaries or other methodological changes. Improved systems
for archiving, managing, discovering and displaying those data will help this
and other projects point the way toward healthier oceans and healthier, more
The information contained in this article applies to all of the goals and methods included in the Ocean Health Index. Thanks to Julia Stewart Lowndes and Courtney Scarborough for helpful editorial comments.