New media is obsessed with the potential to divine “signals” from Big Data. There is an emerging belief that within “Big Data” (an often ill-defined, haphazardly constructed, unstructured collection of information) lies a virtual treasure trove of signals, whose significance may rest in revealed preferences, network alliances, predictive power, geospatial epiphanies, or other potentially marketable ephemera, and which merely need to be revealed by dispersing the surrounding cloud of noise. Increasingly, it is taken for granted that these signals are so strong that merely recoding the data with simple rules and classification criteria will evaporate the fog. Is this real, an illusion, or a guise?
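To make the “simple rules and classification criteria” concrete, here is a minimal, hypothetical sketch of what rule-based signal extraction can look like: raw text snippets are mapped to event labels by keyword matching. The rule table and function names are invented for illustration; this is the flavor of such recoding, not any project’s actual method.

```python
# Hypothetical keyword rules mapping labels to trigger words.
# Everything here is illustrative, not a real event-coding scheme.
RULES = {
    "protest": ["protest", "demonstration", "rally"],
    "conflict": ["attack", "clash", "bombardment"],
    "diplomacy": ["summit", "treaty", "negotiation"],
}

def code_event(text: str) -> str:
    """Assign the first label whose keyword appears in the text;
    anything unmatched is dismissed as 'noise'."""
    lowered = text.lower()
    for label, keywords in RULES.items():
        if any(kw in lowered for kw in keywords):
            return label
    return "noise"

print(code_event("Thousands joined a rally in the capital"))  # protest
print(code_event("Leaders met at a regional summit"))         # diplomacy
print(code_event("Quarterly earnings beat expectations"))     # noise
```

The brittleness is the point: whether such rules recover a real signal or merely manufacture one depends entirely on choices (the keyword lists, the label taxonomy, the order of matching) that are usually invisible to the consumer of the resulting “signal.”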
The algorithms and heuristics that underlie the signal creation are almost always secrets. Consider GDELT, for example: the project has been described as a mash-up of pre-existing tools extracting open source data in an automated fashion. If so, why isn’t the code openly available? And why doesn’t this kind of question come up when considering the ramifications of Google searches or Facebook graph searches? Perhaps because there is a certain sanctity given to the concepts of proprietary information, trade secrets, and business intelligence…despite their staggering potential for use in disinformatics.