
Data mining involves the use of sophisticated data analysis tools
to discover previously unknown, valid patterns and relationships in large data
sets. These tools can include statistical models, mathematical algorithms, and
machine learning methods (algorithms that improve their performance
automatically through experience, such as neural networks or decision trees).
Consequently, data mining consists of more than collecting and managing data;
it also includes analysis and prediction.

Data mining can be performed on data represented in quantitative,
textual, or multimedia forms. Data mining applications can use a variety of
parameters to examine the data. They include association (patterns where one
event is connected to another event, such as purchasing a pen and purchasing
paper), sequence or path analysis (patterns where one event leads to another
event, such as the birth of a child and purchasing diapers), classification
(identification of new patterns, such as coincidences between duct tape
purchases and plastic sheeting purchases), clustering (finding and visually
documenting groups of previously unknown facts, such as
geographic location and brand preferences), and forecasting (discovering
patterns from which one can make reasonable predictions regarding future
activities, such as the prediction that people who join an athletic club may
take exercise classes).
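The association parameter above can be made concrete with a small sketch. The baskets below are invented for illustration, and the helper names `support` and `confidence` are assumptions of this example rather than anything defined in the text. Support is the share of baskets containing all the items, and confidence estimates how often the second item appears given the first.

```python
# Hypothetical transactions: each set is one customer's basket.
transactions = [
    {"pen", "paper"},
    {"pen", "paper", "ink"},
    {"pen"},
    {"paper", "stapler"},
    {"pen", "paper"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated share of baskets with the antecedent that also hold the consequent."""
    both = support(set(antecedent) | set(consequent), baskets)
    return both / support(antecedent, baskets)


pen_paper_support = support({"pen", "paper"}, transactions)
pen_to_paper_confidence = confidence({"pen"}, {"paper"}, transactions)
print(pen_paper_support, pen_to_paper_confidence)
```

With these made-up baskets, pen and paper appear together in three of five baskets, and roughly three quarters of the baskets with a pen also contain paper.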

As an application, compared to other data analysis applications,
such as structured queries (used in many commercial databases) or statistical
analysis software, data mining represents a difference of kind rather than
degree. Many simpler analytical tools use a verification-based approach,
where the user develops a hypothesis and then tests the data to prove or
disprove it. For example, a user might hypothesize that a customer
who buys a hammer will also buy a box of nails. The effectiveness of this
approach can be limited by the creativity of the user in developing various
hypotheses, as well as by the structure of the software being used. In contrast,
data mining uses a discovery approach, in which algorithms can be used to
examine several multidimensional data relationships simultaneously, identifying
those that are unique or frequently represented. For example, a hardware store
may compare its customers’ tool purchases with home ownership, type of
automobile driven, age, occupation, income, and/or distance between residence
and the store. Because of these complex capabilities, two precursors are
important for a successful data mining exercise: a clear formulation of the
problem to be solved, and access to the relevant data.
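The contrast between the verification and discovery approaches can be sketched in a few lines. The baskets and the function names `verify` and `discover` are hypothetical, and pair counting stands in here for the far richer algorithms a real data mining tool would use.

```python
from collections import Counter
from itertools import combinations

# Hypothetical hardware-store baskets, invented for illustration.
baskets = [
    {"hammer", "nails"},
    {"hammer", "nails", "tape"},
    {"tape", "sheeting"},
    {"tape", "sheeting", "nails"},
    {"hammer", "nails"},
    {"paint"},
]


def verify(antecedent, consequent, baskets):
    """Verification: test one user-supplied hypothesis 'antecedent -> consequent'."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


def discover(baskets, min_count=2):
    """Discovery: count every co-occurring pair and keep the frequent ones."""
    pair_counts = Counter()
    for basket in baskets:
        pair_counts.update(combinations(sorted(basket), 2))
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}


hammer_nails = verify("hammer", "nails", baskets)
frequent_pairs = discover(baskets)
print(hammer_nails)
print(frequent_pairs)
```

`verify` only answers the question the user thought to ask, whereas `discover` also surfaces the tape and sheeting pairing that nobody hypothesized in advance.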

In a decision support
system, data is stored in the form of a cube, and the cube is used to represent
the measure of interest. A data cube may be two-dimensional, three-dimensional,
or higher-dimensional. Each dimension represents an attribute of the data, and
the cells in the data cube hold the measure of interest.

In 2011, Han, Jung [265] described how aspect-oriented programming (AOP) is
well suited to cluster computing software through the use of simple, intuitive,
and reusable aspects. Through qualitative and performance evaluations, they
showed that AOP significantly improves code readability and modularity, and
that AOP-based software has the same performance and scalability as similar
software developed without AOP. Guabtni, Ranjan [266] were concerned
with data provisioning services (information search, retrieval, storage, etc.)
dealing with a large and heterogeneous information repository. Increasingly,
this class of services is being hosted and delivered through Cloud
infrastructures. Awang [268] proposed an algorithm and analytical model based
on an asynchronous approach to improve the response time, throughput,
reliability, and availability of a web server cluster. High reliability in
this model is provided by imposing a neighbor logical structure on data copies:
data from one server is replicated to its neighboring server, and vice versa,
in the face of failures.
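The data cube described at the start of this passage, with dimensions as attributes and cells holding the measure of interest, can be sketched as a dictionary keyed by one value per dimension. The sales rows, the dimension names, and the functions `build_cube` and `roll_up` are assumptions made for this illustration only.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, quarter, units sold).
facts = [
    ("hammer", "north", "Q1", 10),
    ("hammer", "south", "Q1", 4),
    ("nails", "north", "Q1", 25),
    ("nails", "north", "Q2", 30),
    ("hammer", "north", "Q2", 7),
]

DIMENSIONS = ("product", "region", "quarter")


def build_cube(rows):
    """Build a 3-dimensional cube: each cell is addressed by one value per
    dimension and holds the measure (units sold)."""
    cube = defaultdict(int)
    for product, region, quarter, units in rows:
        cube[(product, region, quarter)] += units
    return dict(cube)


def roll_up(cube, keep):
    """Sum the measure over every dimension that is not named in keep."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for cell, units in cube.items():
        totals[tuple(cell[i] for i in positions)] += units
    return dict(totals)


cube = build_cube(facts)
by_product = roll_up(cube, ["product"])
by_product_quarter = roll_up(cube, ["product", "quarter"])
print(by_product)
print(by_product_quarter)
```

Rolling up over fewer dimensions gives the lower-dimensional views mentioned above: keeping two dimensions yields a 2-dimensional cube, keeping one yields a simple total per attribute value.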

 

Senguttuvan, Krishna [283] reviewed
five of the most representative off-line clustering techniques: K-means
clustering, Fuzzy C-means clustering, Mountain clustering, Subtractive
clustering, and Extended Shadow Clustering. The techniques were implemented and
tested against a medical problem of heart disease diagnosis, and the
performance and accuracy of the techniques were presented and compared. Soni,
Ganatra [284] provided a
categorization of some well-known clustering algorithms; they also describe the
clustering process and give an overview of the different clustering methods.
Bahmani, Moseley [289] proposed an initialization algorithm for k-means that
obtains a nearly optimal solution after a logarithmic number of passes,
and then showed that in practice a constant number of passes suffices.
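For readers unfamiliar with k-means, the following is a minimal sketch of the standard Lloyd iteration with plain random initialization. It is not the initialization algorithm of the cited work, and the sample points, the function names, and the default parameters are all assumptions chosen for illustration.

```python
import random


def nearest_center(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: (point[0] - centers[i][0]) ** 2 + (point[1] - centers[i][1]) ** 2,
    )


def assign(points, centers):
    """Assignment step: group points by their nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        clusters[nearest_center(p, centers)].append(p)
    return clusters


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm on 2-D points; initial centers are k random points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iterations):
        clusters = assign(points, centers)
        # Update step: move each center to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, k=2)
print(sorted(centers))
```

The quality of the result depends on the initial centers, which is why initialization schemes such as the one cited above are an active topic of study.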

The
Data Mining Extensions (DMX) [2] language is an SQL-like language for coding
data-mining models on the Microsoft platform; because it is code-oriented, it
is difficult to gain an understanding of the data-mining domain from it. Data
mining is a highly complex task that requires great effort in preprocessing
the data under analysis, e.g., data exploration, cleansing, and integration
[9]. Reference [10] provides an entire framework to carry out data mining but,
once again, it sits at a very low level of abstraction, since it is
code-oriented and does not facilitate understanding of the domain problem.
Research papers [11] and [12] provide a modeling framework to define
data-mining techniques at a high level of abstraction by using UML. However,
these UML-based models are mainly used as documentation. Parsaye [13] examined
the relationship between OLAP and data mining, proposed an architecture
integrating the two, and discussed the need for different levels of
aggregation for data mining.
