Data Is Like Water: The Data Ecosystem

This is an article I’ve been wanting to write and an idea I’ve wanted to share for a long time. I believe that important things and real insights are more wonderful when they are shared. Just like we must share the natural resources and water that we have on this Earth, so too should we share our insights, our knowledge, our information, our data, and more.

Data is like water. Data flows through the data cycle and the data ecosystem, just as water flows through the water cycle and our Earth ecosystem.

I believe that water is the unsung metaphor that those of us working in the “art of data” have been using for some time. I also believe that nature and technology are interwoven so tightly as to be essentially the same. This article makes the case that we intuitively (and perhaps not always consciously) see and use this family of metaphors for the two concepts interchangeably!

Now is the time to call out and celebrate how we use something so natural to define something so mechanical/technological, and explore why.

Ecosystem is a beautiful word, in my opinion. An ecosystem is “a biological community of interacting organisms and their physical environment” [OED]. In general use today, it also means “a complex network or interconnected system” [OED]. When I close my eyes and think of what an “ecosystem” is (and this may be clichéd), I imagine a beautiful forest, the ocean, the sky, the trees and plants, animals, our cities, our world, the Earth (and us!). The daydream I see in my mind, at least, is a magical world where everything works together, even when it doesn’t seem like it is all working together. Perhaps a bit naive, I’d like for that daydream to have clean air, natural beauty, efficient technology and people living together mostly in harmony, exploring and experiencing life and the world, and generally just getting along.

You might call that a utopian vision, but I assure you, that vision I see is not a perfect one - as it should be, because to me, nature is not about perfection, and ecosystems do not have to be working perfectly to be working together - it’s all about balance. Of course, it’s really neat when ecosystems do work together. Well-oiled ecosystems (like well-oiled machines), working beautifully to support productive, healthy activity and life, are really the best part of that dream.

If you are reading this around the year 2014, you might remember something you learned in elementary science education - the water cycle. For those of you who haven’t heard about it, or need a little refresher, here’s that lovely diagram, courtesy of our friends at the United States Geological Survey (USGS):

Source: USGS

“Earth’s water is always in movement, and the natural water cycle, also known as the hydrologic cycle, describes the continuous movement of water on, above, and below the surface of the Earth. Water is always changing states between liquid, vapor, and ice, with these processes happening in the blink of an eye and over millions of years.” [USGS]

Earth’s data is always in movement, and the natural data cycle describes the continuous movement of data throughout information systems on (and off) the Earth. Data is always changing forms and sizes, from unstructured to semi-structured to fully structured, from small data to big data and back again, from one type of data to another, with these processes happening in the blink of an eye (“real-time”) and over millions of years (“historical/archive”).
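To make one of those changes of form concrete, here is a minimal sketch in Python of a single piece of data moving from unstructured text, to a semi-structured record, to a fully structured object, and back out as text again. Everything in it is an assumption made up for illustration: the log line, the sensor name, the field names and the `Reading` type do not come from any real system.

```python
import json
from dataclasses import dataclass
from datetime import datetime

# Hypothetical unstructured input: a free-text log line (invented for this sketch).
RAW_LINE = "2014-06-21 14:03:07 sensor=river-7 level=2.45m"


@dataclass(frozen=True)
class Reading:
    """Fully structured form: fixed fields with fixed types."""
    taken_at: datetime
    sensor: str
    level_m: float


def to_semi_structured(line: str) -> dict:
    """Unstructured text -> semi-structured key/value record (all strings)."""
    date, time, *pairs = line.split()
    record = dict(pair.split("=", 1) for pair in pairs)
    record["timestamp"] = f"{date} {time}"
    return record


def to_structured(record: dict) -> Reading:
    """Semi-structured record -> typed, structured object."""
    return Reading(
        taken_at=datetime.strptime(record["timestamp"], "%Y-%m-%d %H:%M:%S"),
        sensor=record["sensor"],
        level_m=float(record["level"].rstrip("m")),
    )


def back_to_text(reading: Reading) -> str:
    """Structured object -> JSON text, ready to flow on to another system."""
    return json.dumps(
        {
            "taken_at": reading.taken_at.isoformat(),
            "sensor": reading.sensor,
            "level_m": reading.level_m,
        }
    )


record = to_semi_structured(RAW_LINE)
reading = to_structured(record)
print(back_to_text(reading))
```

The same reading exists in three forms here, much like water as vapor, liquid and ice; which form is "right" depends only on where in the cycle you happen to be standing.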

It’s not hard to see that the metaphor fits further, and extends even into our technology platforms and processes:

Ultimately, doesn’t your data move through a series of oceans and streams, clouds, rivers, ice and snow, and the latest popular metaphor - lakes (e.g. “the data lake”)? See the original article about data lakes by James Dixon (CTO of Pentaho).

Data that moves from place to place, actively used by humans and other systems and driven by the heat and energy of human activity and organizational/business engines, is perhaps your “hot data”. Maybe that data gets frozen away as “cold data” in your data warehouses and historical archives. No matter what phase change your data is going through, we’ve got a metaphor for it.
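As a rough illustration of the hot and cold idea, here is a small Python sketch that sorts datasets into a “hot” or “cold” tier based on how recently each one was touched. The 30-day threshold and the dataset names are assumptions invented for this example; real systems draw that line in many different ways.

```python
from datetime import datetime, timedelta

# Assumption for this sketch: data untouched for more than 30 days is "cold".
COLD_AFTER = timedelta(days=30)


def temperature(last_accessed: datetime, now: datetime) -> str:
    """Return "hot" for recently used data, "cold" for data left to freeze."""
    return "cold" if now - last_accessed > COLD_AFTER else "hot"


def split_by_temperature(last_access_by_name: dict, now: datetime) -> dict:
    """Group dataset names into hot and cold tiers, in alphabetical order."""
    tiers = {"hot": [], "cold": []}
    for name, last_accessed in sorted(last_access_by_name.items()):
        tiers[temperature(last_accessed, now)].append(name)
    return tiers


# Hypothetical datasets and last-access times.
now = datetime(2014, 6, 21)
datasets = {
    "orders_this_week": datetime(2014, 6, 20),
    "sales_2009": datetime(2010, 1, 5),
}
print(split_by_temperature(datasets, now))
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the sketch repeatable: the same inputs always land in the same tier.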

Data is a resource. Some would say, data is an asset. For all the beauty that I speak of when I talk about data, data can also be a weapon. Just like the atom can be used for energy or nuclear weapons, so too can data.

I need not go far to recall a time when we were so data-driven that we lost some of our humanity in the process. Here I recall Frederick Winslow Taylor and scientific management; his time studies were an early form of metric-driven management for organizations. “Past performance is not an indicator of future results” (pardon the pun), but “those who do not learn from history are doomed to repeat it”.

As a Libra, I find myself always looking for balance in things, and being data-driven demands balance with being a humanist. Critics of scientific management (and Taylor) often state that being overly obsessed with efficiency and performance (especially when based on metrics) can overshadow “less quantifiable social benefits”, and there is a risk to society when humanity and social values are left behind [MINTZ].

This line of thinking could easily be the topic of another essay altogether. What is relevant to this essay, though, and important in the present day, is balance, much like the balance needed to form an ecosystem.

It’s really interesting to me that we use nature to define the machine; ultimately, isn’t the machine really natural, even though we so often think of machinery as some sort of industrial, unnatural thing? Humans are of nature, therefore, what humans create is of nature as well, and that includes all the technological marvels that we have created, invented and enjoyed throughout our history.

Since it is humans and human systems/technologies that generate data, that data is ultimately of nature and natural as well. Even if the data is generated by a machine, that machine is of human origin. This completes the metaphor, and explains why we use it - data itself is natural, human in origin, and data has many properties just like water in nature.

I like to think about the big picture, and it’s often when I’m doing that, that I have insights like the one I am sharing with you today. It’s something people do not spend enough time doing. It’s easy enough to understand why people don’t have (or spend) the time thinking big, with so many things going on in life. It’s easy to put the blinders on and forget to step back and look with awe on what you are doing, and what you are a part of. Isn’t that at the core of the “big data movement”, for example? Perhaps we believe that “big data” equals being able to understand the whole picture, every detail, even if often in summary?

Stepping back to see the big picture is sometimes how you see what you’ve been missing. It feels good to see yourself as a part (even if just a small one) of the bigger picture, the cycle, and the whole ecosystem - Earth, water and data.

Special thanks go to Mike Franko, a colleague who encouraged me to publish this essay - I’m proud to be a part of this data ecosystem with you!

Citations

[OED] “Definition of Ecosystem in English.” Ecosystem: Definition of Ecosystem in Oxford Dictionary (American English) (US). Web. 21 June 2014.

[MINTZ] Mintzberg, Henry (ed.) (1989). Mintzberg on Management. New York, New York: The Free Press. ISBN 978-1-4165-7319-7.

[USGS] “The Water Cycle.” U.S. Geological Survey (USGS) Water Science School. Web. 21 June 2014.


