**file mining** rather than

**database mining** – because for model building we often supply denormalized flat data.

In **classification learning**, the learning scheme is presented with a set of classified examples from which it is expected to learn a way of **classifying unseen examples.**

**Classification learning** is sometimes called **supervised** learning.

In **association learning**, any **association among features** is sought, not just ones that predict a particular class value.

**Association rules** – some rules imply others. To reduce the number of rules that are produced, in cases where several rules are related it makes sense to present only the strongest one to the user.

When there is no specified class, clustering is used to group items that seem to fall naturally together.

In **clustering**, **groups of examples that belong together** are sought.

In **numeric prediction**, the **outcome** to be predicted is not a discrete class but a **numeric quantity.**

**Input**

The input consists of **instances**. These instances are the things that are to be classified, associated, or clustered.

Each **instance** that provides the input to machine learning is characterized by its values on a fixed, predefined set of **features** or **attributes**. The **value** of an **attribute** for a particular **instance** is a measurement of the quantity to which the **attribute** refers.

Attributes measure **quantities** that are **numeric** and ones that are **nominal**. Numeric attributes, sometimes called **continuous** attributes, measure numbers, either real- or integer-valued. **Nominal** attributes are sometimes called **categorical**, **enumerated**, or **discrete** – for example family type, eye color, or sex.

This kind of **flattening** is technically called **denormalization**.

**Output**

**Decision tree**

In **instance-based** classification, each new instance is compared with existing ones using a distance metric, and the closest existing instance is used to assign the class to the new one. This is called the

**nearest-neighbor** classification method. Sometimes more than one nearest neighbor is used, and the majority class of the closest k neighbors (or the distance-weighted average if the class is numeric) is assigned to the new instance. This is termed the

**k-nearest-neighbor** method.
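As a minimal sketch of the k-nearest-neighbor method just described (the function name, the toy training set, and Euclidean distance as the metric are illustrative assumptions):

```python
# k-nearest-neighbor sketch: the majority class of the k closest
# training instances, under Euclidean distance, is assigned to the query.
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+

def knn_classify(train, query, k=3):
    """train: list of (feature_tuple, label) pairs; query: a feature tuple."""
    nearest = sorted(train, key=lambda t: dist(t[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_classify(train, (1, 1)))  # "a": all three closest neighbors are "a"
```

With k=1 this reduces to plain nearest-neighbor classification.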

Learning **attribute weights** from the training set is a key problem in instance-based learning.

**Methods**

entropy(2/5, 3/5) = −2/5 · log(2/5) − 3/5 · log(3/5) = 0.971 bits

entropy(4/4, 0/4) = −1 · log(1) − 0 · log(0) = 0 bits

entropy(3/5, 2/5) = −3/5 · log(3/5) − 2/5 · log(2/5) = 0.971 bits

info([2,3], [4,0], [3,2]) = 5/14 · 0.971 + 4/14 · 0 + 5/14 · 0.971 = 0.693 bits

(Logarithms are base 2, and the 0 · log(0) term is taken to be 0.)
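As a sketch, the entropy computation above can be reproduced in Python; the weighted sum is the expected information for a three-way split with class distributions [2,3], [4,0], [3,2] over 14 instances:

```python
from math import log2

def entropy(*probs):
    # Shannon entropy in bits; the 0 * log(0) term is taken as 0
    return -sum(p * log2(p) for p in probs if p > 0)

e1 = entropy(2/5, 3/5)   # ≈ 0.971 bits
e2 = entropy(4/4)        # 0 bits
e3 = entropy(3/5, 2/5)   # ≈ 0.971 bits

# expected information for the three-way split
info = 5/14 * e1 + 4/14 * e2 + 5/14 * e3
print(round(info, 3))    # ≈ 0.693 bits
```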

**MINING ASSOCIATION RULES**

**Outlook=Sunny, Temperature=hot**

**Outlook= Sunny, Temperature=hot, Humidity=high**

The four-item set (**Outlook=Sunny, Temperature=hot, Humidity=high, Play=no**) is found twice.

**Generating the rules**

From the item set **humidity = normal, windy = false, play = yes** we can generate, among others:

If humidity = normal and windy = false then play = yes 4/4 (every matching pair gives play = yes; the pair humidity = normal and windy = false occurs four times, so the **coverage** is 4 and the **accuracy** is 4/4 = 1, i.e. 100%)

If humidity = normal and play = yes then windy = false 4/6 (in 4 of the 6 cases where humidity = normal and play = yes hold, windy = false is also true; **accuracy** 4/6 ≈ 67%)

If windy = false and play = yes then humidity = normal 4/6

If humidity = normal then windy = false and play = yes 4/7

If windy = false then humidity = normal and play = yes 4/8

If play = yes then humidity = normal and windy = false 4/9

If – then humidity = normal and windy = false and play = yes 4/14 – here, for example, the combination humidity = normal and windy = false and play = yes is true in only 4 of the 14 instances.

If we now require a minimum coverage of 2 and a minimum accuracy of 100%, we get 58 rules. Some of them are shown in the table below … Weka solution to generate rules.
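The coverage/accuracy counts for a rule can be checked directly against the data. A minimal sketch, assuming the standard 14-instance weather data reduced to its (humidity, windy, play) columns (the function name and predicate encoding are illustrative):

```python
# Counting how often a rule's antecedent matches, and how often its
# consequent also holds, over the 14-instance weather data.
data = [
    ("high", False, "no"),  ("high", True, "no"),   ("high", False, "yes"),
    ("high", False, "yes"), ("normal", False, "yes"), ("normal", True, "no"),
    ("normal", True, "yes"), ("high", False, "no"),  ("normal", False, "yes"),
    ("normal", False, "yes"), ("normal", True, "yes"), ("high", True, "yes"),
    ("normal", False, "yes"), ("high", True, "no"),
]

def rule_stats(data, antecedent, consequent):
    """antecedent/consequent are predicates on a row; returns (matches, correct)."""
    matches = [row for row in data if antecedent(row)]
    correct = [row for row in matches if consequent(row)]
    return len(matches), len(correct)

# If humidity = normal and windy = false then play = yes
m, c = rule_stats(data, lambda r: r[0] == "normal" and r[1] is False,
                  lambda r: r[2] == "yes")
print(c, "/", m)    # 4 / 4  -> accuracy 100%

# If humidity = normal and play = yes then windy = false
m2, c2 = rule_stats(data, lambda r: r[0] == "normal" and r[2] == "yes",
                    lambda r: r[1] is False)
print(c2, "/", m2)  # 4 / 6  -> accuracy ~67%
```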

**Numeric Prediction: Linear Regression (supervised)**

When the data are numeric scalars, **linear regression** is one of the methods to consider.

x = w0 + w1·a1 + … + wk·ak, where w0 is the **bias** and w0, w1, …, wk are weights (calculated from the training data).

This formula does not give the actual class value but the predicted one, so the error between the real and the predicted class values must be compared. Various online linear regression tools exist (http://www.wessa.net/slr.wasp). One example: the points X, Y = (1,1), (1,2), (2,1), (2,3), (2,4), (4,3), (4,4), which can also be fitted with Wolfram Alpha.
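A minimal sketch of fitting the one-attribute case by ordinary least squares, using the example points above (the closed-form slope/intercept formulas are the standard ones; variable names are illustrative):

```python
# Ordinary least squares for y = w0 + w1*x on the example points.
xs = [1, 1, 2, 2, 2, 4, 4]
ys = [1, 2, 1, 3, 4, 3, 4]
n = len(xs)

sx, sy = sum(xs), sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))
sxx = sum(x * x for x in xs)

w1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope (weight)
w0 = (sy - w1 * sx) / n                         # bias

print(f"y = {w0:.3f} + {w1:.3f} * x")
```

The residuals between the predicted and actual y values are then what the fit minimizes (in the squared sense).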

**Clustering (unsupervised)**

**k-means**
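A minimal sketch of the k-means procedure (1-D points for brevity; the function name, initial centers, and fixed iteration count are illustrative assumptions, not from the source):

```python
# k-means sketch: alternate between assigning each point to its nearest
# center and moving each center to the mean of its assigned points.
def kmeans(points, centers, iters=10):
    for _ in range(iters):
        # assignment step
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[i].append(p)
        # update step (keep an empty cluster's center where it was)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

print(kmeans([1.0, 1.5, 2.0, 8.0, 9.0, 10.0], [1.0, 9.0]))  # [1.5, 9.0]
```

In practice the result depends on the initial choice of centers, which is why k-means is often restarted several times.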

**Support vector machines** select a small number of critical boundary instances called support vectors from each class and build a **linear discriminant** function that separates them as widely as possible. This instance-based approach transcends the limitations of linear boundaries by making it practical to include extra **nonlinear** terms in the function, making it possible to form **quadratic**, **cubic**, and **higher-order decision boundaries**.

The function (x • y)^n, which computes the dot product of two vectors x and y and raises the result to the power n, is called a **polynomial kernel**.

Other kernel functions can be used instead to implement different **nonlinear** mappings. Two that are often suggested are the **radial basis function (RBF)** kernel and the **sigmoid kernel**. Which one produces the best results depends on the application, although the differences are rarely large in practice. It is interesting to note that a support vector machine with the RBF kernel is simply a type of neural network called an RBF network (which we describe later), and one with the sigmoid kernel implements another type of neural network, a multilayer perceptron with one hidden layer.

Mathematically, any function K(x, y) is a kernel function if it can be written as K(x, y) = Φ(x) • Φ(y), where Φ is a function that maps an instance into a (potentially high-dimensional) feature space. In other words, the kernel function represents a dot product in the feature space created by Φ.
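This identity can be checked concretely. A sketch for the polynomial kernel with n = 2 on 2-D inputs, where the explicit map Φ(x) = (x1², √2·x1·x2, x2²) is a standard textbook choice (the function names here are illustrative):

```python
# Verify K(x, y) = (x . y)^2 equals the dot product in the mapped space.
from math import sqrt

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def poly_kernel(x, y, n=2):
    return dot(x, y) ** n

def phi(x):
    # explicit feature map for the degree-2 polynomial kernel in 2-D
    x1, x2 = x
    return (x1 * x1, sqrt(2) * x1 * x2, x2 * x2)

x, y = (1.0, 2.0), (3.0, 4.0)
print(poly_kernel(x, y), dot(phi(x), phi(y)))  # both ≈ (1*3 + 2*4)^2 = 121
```

The point of the kernel trick is that poly_kernel never needs to build the higher-dimensional vectors that phi produces.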

(Figure: a single k-means example picture.)