Finding Ocean Health In A Sea of Data

Ever wonder how the Ocean Health Index gets data for its global assessment of the entire ocean? There's more to it than you'd guess!

Illustration of the Global Ocean Observing System. Credit: Artwork by Glynn Gorick produced for UNESCO Intergovernmental Oceanographic Commission (IOC)-Global Ocean Observing System (GOOS).

The world is swimming in data of all kinds, so how do you cut through to get just the data you need? The secret is asking the right questions.

Basic Questions

Question #1 for our team was, “What is ocean health?”  The answer we developed:  “A healthy ocean sustainably delivers a range of benefits to people now and in the future.”

That answer spawned more questions: What benefits should we assess? What reference points (targets) could we use to indicate whether the flow of benefits is sustainable? What is the relative importance (weight) of the different benefits and components? What are the best methods to calculate scores?

Underlying everything was the question, ‘What data exist and how can we find them?’ Here we’ll discuss some of the challenges and surprises we encountered. Elsewhere you can read about how we selected benefits, chose reference points and developed methods for calculating scores.

Sea of Data

The prominent scientific journal Nature estimated that 10 million researchers worldwide devote about 26 billion hours per year to research and development, resulting in about 920,000 scientific articles, and that probably does not count the work of economists, social scientists and other data producers.

Science in 2015. Credit: Nature. RightsLink Copyright Clearance Center License 3558241406342 

But all that data streaming into databases all over the world hasn’t yet made it easy to find the information you want. Woods Hole Oceanographic Institution scientist Peter Wiebe described data centers as ‘…kind of black holes. It’s very hard to figure out what’s in there and how to get it out.’

Marcia McNutt, editor of the influential Science magazine, recently highlighted the need to improve data ‘discoverability.’

Data usually must answer the basic questions that journalists ask when writing a story: Who, What, When, Where and Why, but in somewhat different ways. In this case ‘Who’ names the source of the data; ‘What’ is the measurement itself, along with narrative ‘metadata’ providing additional information such as the units of measurement; ‘When’ is the time and date of the measurement; and ‘Where’ is the latitude, longitude, depth and other information on the location of the measurement or sample.

Then there’s ‘Why.’ There are dozens of reasons why data may be gathered ranging from personal interest to mandated collection, and also including availability of funding, geographical convenience, and many others. Since data may have been collected for a very different reason than why one wants to use it, it may be more or less detailed or useful than desired, but it will have to do.
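To make the five questions concrete, here is a minimal Python sketch of how a single data point might be stored. It is only an illustration: the class name, field names and example values are hypothetical and are not taken from the Ocean Health Index or any particular database.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Observation:
    """One data point described by Who, What, When, Where and Why (hypothetical fields)."""
    source: str                    # Who: institution or project that produced the data
    variable: str                  # What: the quantity measured
    value: float                   # What: the measurement itself
    units: str                     # What (metadata): units of measurement
    timestamp: datetime            # When: time and date of the measurement
    latitude: float                # Where: degrees north
    longitude: float               # Where: degrees east
    depth_m: float = 0.0           # Where: depth below the surface, in metres
    purpose: Optional[str] = None  # Why: reason for collection, often unrecorded

    def is_complete(self) -> bool:
        """True when Who, What and Where are filled in with plausible values."""
        return (
            bool(self.source)
            and bool(self.variable)
            and bool(self.units)
            and -90.0 <= self.latitude <= 90.0
            and -180.0 <= self.longitude <= 180.0
        )


# A made-up example record; the source name and numbers are invented.
example = Observation(
    source="Example Buoy Network",
    variable="sea_surface_temperature",
    value=18.4,
    units="degC",
    timestamp=datetime(2015, 1, 15, 12, 0, tzinfo=timezone.utc),
    latitude=41.5,
    longitude=-70.7,
)
print(example.is_complete())
```

Note that ‘Why’ is the only optional field in this sketch, reflecting the point that the reason for collection is often unknown to a later user of the data.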

Security agencies sift enormous streams of data trying to detect and prevent acts of terrorism, and businesses analyze the social media use and purchasing history of consumers to hone product development, advertising and sales. They can identify strategies and, if they wish, things as specific as ‘the countries visited, movies downloaded and favorite colors of all left-handed men with Facebook accounts who surf.’

Data discovery and integration in the ocean sciences aren’t yet that sophisticated, partly because of variability in the type and quality of data as well as the variety of things different people want to find out.

For example, global scale data for variables such as sea surface temperature, sea-ice extent, sea level rise or the extent of coastal mangrove forests now come primarily from satellites that cover the planet  relatively seamlessly, all making measurements in the same way. Large scale arrays of buoy systems provide similar in-water coverage. The data are easy to get if you know where to look.  

Map of devices reporting ocean data to Planet OS and its Marinexplore project.  Image by Kristian Paljasma, Planet OS, used with permission.

Data for other integrative topics, such as fisheries landings, mariculture production, economic indicators (jobs, wages, revenue) and others are gathered by agencies in different countries then submitted to international or multi-national groups for aggregation.  The data are accessible, but vary in quality and are often harder to interpret.

Still other data layers must be cobbled together using information contributed by many different investigators.  Examples used in the Ocean Health Index include extent and condition of seagrass beds, population assessment and extinction risks for iconic species, and many aspects of biodiversity.  

Some desired data layers may not exist at all and must be approximated, either by mathematical modeling or by using other directly or indirectly related data (proxy data). For example, there are no globally consistent data on the concentration of chemical pollutants in the ocean, so the Ocean Health Index bases its indicator on the amount of pesticides used by each country, as well as the distribution of commercial ship tracks and major ports and harbors. It mathematically models the flows from land into watersheds and rivers, then models the plumes of pollution diffusing from river mouths out into the ocean.

Similarly, data on pollution by pathogenic bacteria are lacking in most countries, so the Ocean Health Index uses proxy data from the World Health Organization and UNICEF on the percentage of the coastal population in each country that has access to adequate sanitary facilities.

As a final example, any data collected on the cultural, traditional and aesthetic importance of the ocean and coasts to a country’s citizens are not reported to a centralized international database, so those rather intangible benefits had to be represented by two proxy measures in the Sense of Place goal: the status of iconic species and the percentage of the coast and near-shore waters held in protective status.  

Drawing on all these types of data and techniques, the Ocean Health Index assembled a list of more than 80 global data layers that are woven throughout its structure. Some are quantitative measures, such as the amount of fish caught or percentage of ocean protected; others are categorical, i.e. expressed not as measurements but as ranks or classes, as illustrated by IUCN Red List’s categories of extinction risk.

Categories of extinction risk for the IUCN Red List. Each species (or population) assessed is classed into one of the categories shown. Visualization credit: Encyclopaedia Britannica online
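The difference between a categorical layer and a quantitative one can be sketched in a few lines of Python. The example below encodes the IUCN Red List threat categories as an ordered type, so that species can be ranked by risk even though the categories are not measurements. The species names are placeholders, the integer values only express order (not a score or weight used by the Ocean Health Index), and the non-ranked categories Data Deficient and Not Evaluated are left out.

```python
from enum import IntEnum
from typing import Dict, List


class RedListCategory(IntEnum):
    """IUCN Red List categories, ordered from lowest to highest extinction risk."""
    LEAST_CONCERN = 0
    NEAR_THREATENED = 1
    VULNERABLE = 2
    ENDANGERED = 3
    CRITICALLY_ENDANGERED = 4
    EXTINCT_IN_THE_WILD = 5
    EXTINCT = 6


def most_at_risk(assessments: Dict[str, RedListCategory]) -> List[str]:
    """Return species names sorted from highest to lowest risk category."""
    return sorted(assessments, key=lambda name: assessments[name], reverse=True)


# Placeholder species and categories, invented for illustration only.
assessments = {
    "species_a": RedListCategory.VULNERABLE,
    "species_b": RedListCategory.CRITICALLY_ENDANGERED,
    "species_c": RedListCategory.LEAST_CONCERN,
}
print(most_at_risk(assessments))
```

Because the categories are ranks, comparisons such as "more threatened than" are meaningful, but arithmetic on the integer values (averaging, for instance) would require an explicit weighting scheme that this sketch does not assume.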

You will soon be able to read more about data discovery as part of a manual to help researchers carry out regional studies of ocean health in their own countries.  The manual will be posted at www.ohi-science.org.  Details on all data and data sources used by the Ocean Health Index for global assessments are shown in Table S23 here, Table S9 here and in descriptive sections of those documents.  The difficulties associated with data availability, discovery and quality control will be apparent to readers of those documents. 

The Ocean Health Index’s data challenges are symptomatic of the fact that learning about the ocean has seemed neither as urgent as national security threats nor as immediately financially rewarding as increased access to consumers. Growing awareness of the ocean’s importance is beginning to change that. The Google Ocean platform now visualizes data from many projects and expeditions, though it is not a data source. The Geographic Information Systems (GIS) company Esri, which provides the maps on the Ocean Health Index Web site, is producing an ocean basemap to incorporate many different marine data layers that will be useful for coastal zone spatial planning. Those data will be available by subscription.

A ‘big data’ company, Planet OS, has pioneered a single platform for planetary data transformation, access and on-demand visualization. One of its products, Marinexplore, already houses more than a trillion data points originating from 43,415 datastreams coming from 33 institutions, 41 projects and numerous ocean buoy systems. It is relatively easy to find selected data within Marinexplore, because the whole system is designed for that. One researcher’s discussion of how the system will benefit data management for sound in the ocean environment is here.

As these and other new systems for gathering, archiving, and managing data become available, data discovery for projects such as the Ocean Health Index will become less difficult. As regions at all scales agree upon the data and standard collection procedures needed for the Index and commit to gathering them on a regular schedule, the quality and usefulness of results will steadily improve.

Defining the Study Area

Mention of scores raises a final critical question: what is the ‘study area’ of an assessment, the spatial scale at which final Index scores are reported?

One option is to use the major oceans, such as the North Atlantic, South Atlantic, Southern Ocean etc., as study areas. The United Nations World Ocean Assessment, now in development, uses those areas as an organizing principle. Some research and resource management organizations focus on particular oceans, but the oceans do not have administrative units for comprehensive research and data acquisition, let alone use of results for management. Therefore data for each ocean would have to be assembled from research results gathered independently by scientists and agencies working in many countries. 

Another option could be to assess the oceans’ major ecosystems. Known as Large Marine Ecosystems (LMEs), each has its own structure and dynamics and many are being studied as units. All encompass the coastlines of several nations or states.  None have administrative units with regulatory powers.

Further candidates might be the 232 marine ecoregions into which the world’s coastal and continental shelf areas have been divided. Some of the ecoregions include several nations or territories; in other cases a nation may include several ecoregions. Lack of administrative entities with regulatory power as well as transboundary issues weighed against using these areas for the global Ocean Health Index.

Final biogeographic framework showing ecoregions, numbered as listed in Box 1 in Spalding et al. (2007).  Credit: BioScience by American Institute of Biological Sciences reproduced with permission via Copyright Clearance Center. 

In the end, two considerations persuaded the Ocean Health Index team to use the coastlines and marine waters of nations and their territories as study areas. First was the increased likelihood of finding data. Nations frequently maintain for their own purposes databases of information collected by their scientists, economists and others. Second, nations usually have administrative units able to set policies or enact and enforce regulations, actions necessary for maintaining or improving ocean conditions.

Choice of this study area raised the question: ‘How far seaward and inland should the study areas extend?’ Our seaward choice was 200 nautical miles, the width of the Exclusive Economic Zone (EEZ).

The EEZ concept was adopted as part of the U.N. Convention on the Law of the Sea (UNCLOS) in 1982. That convention defines a country’s territorial waters as a band extending 12 nautical miles (22.2 km) off the coast from some baseline, usually the low-water mark, within which it retains exclusive sovereignty of the water, subsoil, seabed and airspace above. A country may also claim a contiguous zone extending a further 12 nautical miles seaward of its territorial waters (24 nautical miles from the baseline), within which it may enforce its customs, fiscal, immigration and sanitary laws. The exclusive economic zone (EEZ) extends still further seaward, but not beyond 200 nautical miles (370.4 km) from the baseline. Within the EEZ, the coastal state has sovereign rights to explore, exploit, conserve and manage living and non-living natural resources of the seabed and waters, as well as to produce energy from the water, currents and winds. It can also establish and use artificial islands or other structures for economic or marine scientific purposes or to preserve the marine environment. However, within one country’s EEZ, other states still maintain the traditional high seas freedoms, such as freedom of navigation and overflight and the freedom to conduct military exercises. Additional information is shown here.
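The distances above can be checked with a short Python sketch. It converts nautical miles to kilometres (one nautical mile is exactly 1.852 km) and labels a point by its distance from the baseline. This is a deliberate simplification for illustration: the function and constant names are hypothetical, the contiguous zone is assumed to end 24 nautical miles from the baseline, and real boundaries depend on how baselines are drawn and on neighboring countries’ claims.

```python
KM_PER_NAUTICAL_MILE = 1.852  # exact, by international definition

# Outer limits measured from the baseline, in nautical miles.
TERRITORIAL_LIMIT_NM = 12.0
CONTIGUOUS_LIMIT_NM = 24.0   # assumed: territorial limit plus a further 12 nm
EEZ_LIMIT_NM = 200.0


def nm_to_km(nautical_miles: float) -> float:
    """Convert nautical miles to kilometres."""
    return nautical_miles * KM_PER_NAUTICAL_MILE


def maritime_zone(distance_from_baseline_nm: float) -> str:
    """Name the innermost zone containing a point at the given distance."""
    if distance_from_baseline_nm < 0:
        raise ValueError("distance must be non-negative")
    if distance_from_baseline_nm <= TERRITORIAL_LIMIT_NM:
        return "territorial waters"
    if distance_from_baseline_nm <= CONTIGUOUS_LIMIT_NM:
        return "contiguous zone"
    if distance_from_baseline_nm <= EEZ_LIMIT_NM:
        return "exclusive economic zone"
    return "high seas"


print(round(nm_to_km(TERRITORIAL_LIMIT_NM), 1))  # territorial limit in km
print(round(nm_to_km(EEZ_LIMIT_NM), 1))          # EEZ limit in km
print(maritime_zone(150.0))
```

The zones overlap legally (the contiguous zone lies within the band where EEZ rights also apply), so the function simply reports the innermost label for a given distance.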

Unfortunately, use of EEZs as study areas does not solve every problem. Even in the most information-rich EEZs, data gaps still exist, so methods were required to fill holes within existing data series or to estimate desired measurements using models or ‘proxy’ data.

Moreover, measurement of the entire EEZ is not appropriate for all goals or data layers.  

Exclusive Economic Zones (EEZs) of the world, along with disputed areas (shown in red). Credit:  Prof. Jean-Paul Rodrigue, Dept. of Global Studies & Geography, Hofstra University. Data source: VLIZ. Third party use of this illustration requires permission of Prof. Rodrigue.

Look closely at the map above and you will see disputed EEZ boundaries and areas off the coasts of Argentina, Peru/Chile, Colombia/Nicaragua, Norway, Turkey, northern Japan/Russia, South Korea/Japan, Taiwan (Province of China) and throughout the South China Sea. Some of these disputes are moving toward resolution, others are not. The South China Sea disputes illustrate the challenges to resolution. The area of the Chinese EEZ aggregated by marineregions.org and used by the Ocean Health Index is 875,263.6 km², an area that includes waters out to 200 nautical miles off the coasts of China and its two Special Administrative Regions, Hong Kong and Macau. However, China also claims an additional 3 million km² of the South China Sea, various portions of which are disputed by other countries in the region including Vietnam, Philippines, South Korea, Japan, Malaysia, Brunei and Singapore.

Such marine territorial disputes can only be solved by international agreements in accordance with UNCLOS and resolution is complicated and slow. Projects such as the Ocean Health Index can only use agreed boundaries. The only publicly available source for that information is the VLIZ database compiled by http://www.marineregions.org/ which uses methods shown here. VLIZ adds corrections during its regular database updates, so whenever countries resolve boundary disputes, their newly agreed EEZ boundaries will become part of the updated database. In the meantime, VLIZ maintains information on the disputed areas, as shown in the above map.  Disputed areas are excluded from the scores calculated by the Ocean Health Index. If/when new EEZ boundaries are determined, scores can easily be recalculated. 

Disputed regions create interpretational and practical problems for the disputing countries. Though the Ocean Health Index has no other option, posting a score for an area bounded in a way that a country does not accept can be politically uncomfortable. Such a country might wish to carry out its own study using the Ocean Health Index framework but self-defining its study area boundaries. The country would have or could obtain data for the VLIZ-defined portion of that area, but might have difficulty obtaining similar information for any portion lying within the VLIZ-defined boundaries currently assigned to other countries.

Take Home Message

These problems notwithstanding, as long as research provides a steady flow of data points, each one describing ‘Who, What, When, Where and Why,’ the Ocean Health Index can accommodate any future alterations of EEZ boundaries or other methodological changes. Improved systems for archiving, managing, discovering and displaying those data will help this and other projects point the way toward healthier oceans and healthier, more sustainable societies.


The information contained in this article applies to all of the goals and methods included in the Ocean Health Index. Thanks to Julia Stewart Lowndes and Courtney Scarborough for helpful editorial comments.    
