Data mining

Data mining (DM), also called Knowledge Discovery in Databases (KDD) or Knowledge Discovery and Data Mining, is the process of automatically searching large volumes of data for patterns using techniques such as classification, association rule mining, and clustering. It is a complex topic with links to several core fields, chiefly computer science, and it builds on established computational techniques from statistics, information retrieval, machine learning, and pattern recognition. The data available on the Internet is constantly growing and changing, and personal information increasingly fills in the gaps. Searching this information and putting it to use is a form of data mining.
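
As a rough illustration of one of the techniques named above, association rule mining, the sketch below counts how often pairs of items appear together in a handful of shopping baskets and reports rules such as "bread -> milk" with their support (share of all baskets containing both items) and confidence (share of baskets with the first item that also contain the second). The baskets, the function name pair_rules, and the two thresholds are assumptions made up for this example; real tools work on far larger data and on item sets bigger than pairs.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(baskets)
    # How many baskets contain each single item.
    item_counts = Counter(item for basket in baskets for item in basket)
    # How many baskets contain each unordered pair of items.
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return sorted(rules)

for antecedent, consequent, support, confidence in pair_rules(transactions):
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

With these invented baskets, the rule "bread -> milk" passes both thresholds while "bread -> beer" does not, because only half of the baskets containing bread also contain beer.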


Privacy Concerns

Privacy is an issue that concerns our entire society. Many businesses were practicing data mining before it even had a name. As drives have become physically smaller while their storage capacity has grown, organizations have been able to collect large amounts of data in the form of minutiae about individual users. These minutiae are harmless by themselves, but when other organizations compile the data collected about an individual user or a group of users, the information can be used to discriminate against certain populations. The purpose of data mining is to find patterns in data that might not otherwise be usable as information. Since data mining is based on the extraction of unknown patterns from a database, "data mining does not know, cannot know, at the outset, what personal data will be of value or what relationships will emerge. Therefore, identifying a primary purpose at the beginning of the process, and then restricting one's use of the data to that purpose are the antithesis of a data mining exercise." (Ontario Information and Privacy Commissioner Ann Cavoukian, in the report "Data Mining: Staking a Claim on Your Privacy")
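
The way harmless minutiae become revealing once they are compiled can be sketched in a few lines. In the example below, one hypothetical organization holds purchases with no names attached, and another holds a name directory; joining the two on shared fields attaches a name to each purchase. Every record, field name, and the function link_records is invented for illustration, and the matching rule (exact match on ZIP code and birth year) is a simplifying assumption.

```python
# Hypothetical records held by two unrelated organizations; all values invented.
retailer_records = [
    {"zip": "12345", "birth_year": 1980, "purchase": "prenatal vitamins"},
    {"zip": "67890", "birth_year": 1975, "purchase": "garden hose"},
]
public_directory = [
    {"name": "Person A", "zip": "12345", "birth_year": 1980},
    {"name": "Person B", "zip": "67890", "birth_year": 1975},
    {"name": "Person C", "zip": "67890", "birth_year": 1990},
]

def link_records(left, right, keys=("zip", "birth_year")):
    """Join two record lists on shared fields, keeping only unambiguous matches."""
    # Index the right-hand records by the values of the shared fields.
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    linked = []
    for row in left:
        matches = index.get(tuple(row[k] for k in keys), [])
        if len(matches) == 1:  # exactly one candidate: the record is re-identified
            linked.append({**matches[0], **row})
    return linked

for profile in link_records(retailer_records, public_directory):
    print(profile)
```

Neither list says much on its own, but the combined profiles pair a name with a purchase, which is the kind of compiled information the paragraph above warns about.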


Mr. Hulse ran some small-time "data mining" on Mr. Kondor as an example, to see what kind of information he could discover. He found the Kondor household's mortgage, his address, his parents' income, and his phone number, along with several other pieces of confidential information that he would prefer not to have posted in this wiki. Mr. Hulse is a kind gentleman with no intention of harming Mr. Kondor, but what if a terrorist or an enemy got hold of this information? What would the implications be? Who would be hurt? How hard can it be to find a Social Security number, or to steal an identity?

The table below is another example of the kinds of information companies can discover about you just from your purchase of one of their products.
[Table image: Data_Mine_Table.GIF]

Notable uses of data mining

Data mining is most often seen in the retailing and marketing world. It allows companies to retrieve data that can be used to generate more profit. A great deal of useful information can be analyzed and put to use, answering questions such as:

  • Who is buying our product and who should we focus our advertising on?
  • In what region are we selling the most?
  • Are our customers happy?
  • What are they buying?
  • How much of it are they buying?
  • Who are our competitors in the market?
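
Several of the questions above (where sales are highest, what is being bought, and how much) come down to grouping sales records and summing quantities, which is usually a first step before any deeper mining. The sketch below shows that step on a tiny, invented set of records; the regions, products, quantities, and the helper totals_by are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical sales records as (region, product, quantity); invented values.
sales = [
    ("North", "widget", 10),
    ("South", "widget", 4),
    ("North", "gadget", 3),
    ("South", "gadget", 12),
    ("North", "widget", 7),
]

def totals_by(records, field_index):
    """Sum the quantity column, grouped by the chosen field (0=region, 1=product)."""
    totals = defaultdict(int)
    for record in records:
        totals[record[field_index]] += record[2]
    return dict(totals)

by_region = totals_by(sales, 0)    # In what region are we selling the most?
by_product = totals_by(sales, 1)   # What are they buying, and how much?
best_region = max(by_region, key=by_region.get)

print("Units by region:", by_region)
print("Units by product:", by_product)
print("Top region:", best_region)
```

Questions about customer satisfaction or competitors need other data sources and are not covered by this kind of tally.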

Wal-Mart

Wal-Mart's use of data mining has been very prominent in recent years. The company continuously tracks transactions in nearly 3,000 stores and transmits this data to a 7.5-terabyte data warehouse. There, employees review the information to see which products are selling well and respond accordingly. This allows Wal-Mart to manage its inventory optimally and find new sales opportunities. The collected data is shared with suppliers, who then know where their products are selling best, to the benefit of both parties.

Impact on Wal-Mart

It has been estimated that the use of RFID tags could save retailers about $69 billion worldwide. Nearly three hundred suppliers currently ship products to Wal-Mart, and the products are received at five distribution centers. Wal-Mart scans more than 3 million tagged items every week, and it expected to be scanning goods from more than 600 suppliers by 2007.
