Mine, All Mine:
New Breakthrough in Data Mining
by Sir Thomas More

When you deal with companies, you aren't just a customer; you're also a mass of information with many 'dimensions' within a computer database.

Researchers have devised a simpler, faster method of 'data mining' -- a way to extract and analyse massive amounts of this data.

"Whether you like it or not, Google, Facebook, Walmart and the government are building profiles of you, and these consist of hundreds of attributes describing you," said Professor Suresh Venkatasubramanian of the University of Utah.

"If you line them up for each person, you have a line of hundreds of numbers that paint a picture of a person -- who they are, what their interests are, who their friends are and so forth."

"These strings of hundreds of attributes are called high-dimensional data because each attribute is called one dimension."

"Data mining is about digging up interesting information from this high-dimensional data."

A group of data-mining methods dubbed 'multidimensional scaling' (MDS), first developed in the 1930s, has been used ever since to make data analysis simpler by reducing the dimensionality of the data.

Professor Venkatasubramanian described MDS as 'probably one of the most important tools in data mining', one that 'is used by countless researchers everywhere.'
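To make the idea of reducing dimensionality concrete, here is a minimal sketch of the textbook 'classical' form of MDS, written with only the Python standard library. It is not the new method reported in this article. The profile numbers, the function names and the use of power iteration to find the main axes are all assumptions made for illustration, and the sketch assumes a symmetric matrix of ordinary Euclidean distances.

```python
import math
import random


def distance_matrix(points):
    """Pairwise Euclidean distances between attribute vectors."""
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q))) for q in points]
            for p in points]


def classical_mds(dist, k, iters=500, seed=0):
    """Place n items in k dimensions from a symmetric n x n distance matrix."""
    n = len(dist)
    # Square the distances, then double-centre them: B = -1/2 * J * D^2 * J.
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        # Power iteration: repeated multiplication converges on the
        # dominant eigenvector of the (deflated) matrix.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(b[i][j] * v[j] for j in range(n))
                     for i in range(n))
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][axis] = v[i] * scale
        # Deflate: remove the axis just found so the next pass finds a new one.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords


# Hypothetical profiles: five people, four made-up numeric attributes each.
# Attributes 3 and 4 are combinations of the first two, so two dimensions
# are enough to keep every pairwise distance.
profiles = [
    [0.0, 0.0, 0.0, 0.0],
    [3.0, 0.0, 3.0, 3.0],
    [0.0, 4.0, 4.0, -4.0],
    [3.0, 4.0, 7.0, -1.0],
    [1.0, 2.0, 3.0, -1.0],
]
embedded = classical_mds(distance_matrix(profiles), k=2)
```

Each person now has two numbers instead of four, and the distances between people are unchanged. With real profiles the fit is usually only approximate, which is the trade-off dimensionality reduction accepts.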

But Professor Venkatasubramanian and colleagues have now devised a new method of multidimensional scaling that is faster and simpler, can be used for a wider range of problems, and can handle more data.

"Data mining means finding patterns, relationships and correlations in high-dimensional data," Professor Venkatasubramanian said.

"You literally are digging through the data to find little veins of information."

"The challenge of data mining is dealing with the dimensionality of the data and the volume of it."

"So one expression common in the data mining community is 'the curse of dimensionality.'"

"The curse of dimensionality is the observed phenomenon that as you throw in more attributes to describe individuals, the data mining tasks you wish to perform become exponentially more difficult."

"We are now at the point where the dimensionality and size of the data is a big problem."

"It makes things computationally very difficult to find these patterns we want to find."
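One simple way to see the curse of dimensionality described above is to count how many 'boxes' are needed to cover every combination of attribute values. The sketch below assumes, purely for illustration, that each attribute is split into ten ranges; the function names and the figure of a million profiles are hypothetical.

```python
def cells_needed(num_attributes, bins_per_attribute=10):
    """Number of grid cells when every attribute is split into equal bins."""
    return bins_per_attribute ** num_attributes


def max_occupied_fraction(num_records, num_attributes, bins_per_attribute=10):
    """Upper bound on the share of cells that can contain at least one record."""
    cells = cells_needed(num_attributes, bins_per_attribute)
    return min(1.0, num_records / cells)


# Each extra attribute multiplies the number of cells by ten.
for d in (1, 2, 3, 10, 20):
    print(d, cells_needed(d), max_occupied_fraction(1_000_000, d))
```

Under these assumptions, a million profiles could fill every cell at three attributes, but at twenty attributes they could touch only a vanishing share of them, so almost all of the space is empty and patterns become much harder to find.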

The new method can handle large amounts of data because 'rather than trying to analyse the entire set of data as a whole, we analyse it incrementally, sort of person by person,' said Professor Venkatasubramanian.

That speeds up data mining 'because you don't need to have all the data in front of you before you start reducing its dimensionality.'
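The article does not describe the researchers' algorithm, so the following is not their method. As a loose illustration of shrinking records one at a time without holding the whole data set, here is a sketch of random projection, a different and much simpler technique. The function names, sizes and sample records are assumptions.

```python
import math
import random


def make_projection(in_dim, out_dim, seed=0):
    """Build a fixed random matrix that maps in_dim attributes to out_dim."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def reduce_stream(records, projection):
    """Yield a lower-dimensional version of each record as it arrives."""
    for record in records:
        yield [sum(w * x for w, x in zip(row, record)) for row in projection]


# Hypothetical stream of profiles with five attributes each, reduced to two.
projection = make_projection(in_dim=5, out_dim=2, seed=1)
incoming = iter([[1.0, 0.0, 2.0, 0.5, 3.0], [0.0, 1.0, 1.0, 4.0, 2.0]])
for reduced in reduce_stream(incoming, projection):
    print(reduced)
```

Because each record is handled on its own, the loop can start before the rest of the data exists, which is the property the quote above points to.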

Professor Venkatasubramanian acknowledged that there are privacy concerns around data mining, but also highlighted the potential benefits to consumers.

"The issue of privacy in data mining is like any set of potentially negative consequences of scientific advances," he said.

"If you target advertising based on what people need, it becomes useful."

"The better the advertising gets, the more it becomes useful information and not advertising."

"And the way we are being inundated with all forms of information in today's world, whether we like it or not we have no choice but to allow machines and automated systems to sift through all this to make sense of the deluge of information passing our eyes every day."

Posted in: Science by bubblejam at 08:28 AM
