Saturday, February 4, 2017

Notes: Machine Learning

30 minutes of continuous information is hard to absorb... So, to summarize "A Friendly Introduction to Machine Learning", here are the definitions:
Feel free to read these as you watch; they appear in the same order as they are explained in the video and may help clarify what is being represented visually.

  • Machine Learning: The ability of a computer to learn like a human, from experience. For computers, however, this experience is substituted by data that they can analyze, store, and compare. 
    • Linear Regression: Comparison of data on a coordinate plane; Data points are plotted on a graph and the line that best fits them is derived, so that it has the least amount of error. The error is measured by the distances of all the points from the line: The greater the sum of the distances, the greater the error; This method also works with higher-degree curves such as parabolas and other polynomials. 
      • Gradient Descent: An algorithm used to minimize functions; An iterative procedure that repeatedly adjusts a function's parameters to decrease the error until it reaches its minimum value; Uses calculus!
    • Naive Bayes: Providing solutions based on probability; Characteristics that make a desired output probable are evaluated, and the inputs exhibiting the most of these characteristics are the first to be considered.
    • Decision Trees: Comparison of data based on a table; Using multiple features to split the data repeatedly until it is narrowed down to individual users/outputs. (You can think of it as a series of if statements that keep splitting the [remaining] data into two categories, one feature or characteristic at a time.)
    • Logistic Regression: Comparison of data on a coordinate grid; Looking for previous trends to divide the data. Like Linear Regression, a line is used, except here it divides the points rather than fitting them, and the error is measured by the number of points that the line wrongly classifies, based on a given condition (e.g., pass or fail).
      • Gradient Descent is, again, used to minimize the error.
    • Neural Networks: Comparison of data on a coordinate grid; Using multiple lines to split the data based on multiple conditions. Like an AND statement, the data that fits all the conditions is our output; A combination of multiple logistic regression graphs.
    • Support Vector Machines: Finding the line that maximizes its distance from the boundary points, through linear optimization; Separating the data into two sections and splitting the space evenly between them. (Think of this kind of like finding the middle of the middle.)
      • Kernel Trick: Helps Support Vector Machines; Finding a function that splits the data based on certain set mathematical properties and on similarity between the data points; As a mapping function, it is often visualized in 3D: The additional dimension provides the room needed to split the data accordingly. 
    • Clustering: Grouping data points based on proximity until a distance limit is met; Essentially, this means grouping data points with similar values together to classify the data. By adjusting the allowed distance between values, one can control how many groups form or how large they are. (Hierarchical or K-means.)
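To make the Linear Regression and Gradient Descent bullets concrete, here is a minimal Python sketch (not from the video; the data points and learning rate are made up for illustration). It fits a line y = m*x + b by repeatedly nudging m and b against the gradient of the squared error:

```python
def fit_line(xs, ys, lr=0.01, steps=5000):
    """Fit y = m*x + b by gradient descent on the mean squared error."""
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to m and b
        grad_m = sum(2 * (m * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (m * x + b - y) for x, y in zip(xs, ys)) / n
        # Step "downhill" to decrease the error
        m -= lr * grad_m
        b -= lr * grad_b
    return m, b

# Points that lie exactly on y = 2x + 1, so the fit should recover m≈2, b≈1
m, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

Each iteration is one step of the "continuous procedure" the bullet describes: the calculus (the derivatives) tells us which direction reduces the error.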
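The Naive Bayes bullet can be sketched with a toy spam filter (the documents and words here are invented, not from the video): each label is scored by how probable the input's characteristics are under that label, and the best-scoring label wins.

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (list_of_words, label). Returns word counts per label."""
    counts, labels = {}, Counter()
    for words, label in docs:
        labels[label] += 1
        counts.setdefault(label, Counter()).update(words)
    return counts, labels

def classify(words, counts, labels):
    total_docs = sum(labels.values())
    vocab = {w for c in counts.values() for w in c}
    best, best_score = None, float("-inf")
    for label in labels:
        # log P(label) + sum of log P(word | label)
        score = math.log(labels[label] / total_docs)
        total_words = sum(counts[label].values())
        for w in words:
            # Add-one smoothing so an unseen word doesn't zero out the score
            score += math.log((counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

docs = [
    (["cheap", "pills"], "spam"),
    (["cheap", "watches"], "spam"),
    (["meeting", "tomorrow"], "ham"),
]
counts, labels = train(docs)
result = classify(["cheap"], counts, labels)
```

"cheap" appears in most spam examples, so an input containing it is scored as spam first, exactly the "most of these characteristics" idea.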
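The "series of if statements" view of Decision Trees can be written out literally. The features and splits below are made up for illustration:

```python
def recommend(user):
    """A decision tree is just nested if-statements, each splitting the
    remaining data on one feature."""
    if user["age"] < 20:          # first split: age
        return "game app"
    elif user["occupation"] == "worker":   # second split: occupation
        return "work app"
    else:
        return "news app"

choice = recommend({"age": 15, "occupation": "student"})
```

A learning algorithm's job is to pick which feature to split on at each level; once trained, the tree really is just this cascade of comparisons.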
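For the Logistic Regression bullet, here is a small sketch (invented points and line) of how the error is counted: classify each point by which side of a line it falls on, then count the misclassifications.

```python
def classify(point, w):
    """Side of the line w[0] + w[1]*x + w[2]*y = 0 the point falls on."""
    x, y = point
    return 1 if w[0] + w[1] * x + w[2] * y >= 0 else 0

def num_errors(points, labels, w):
    """The error: how many points the line wrongly classifies."""
    return sum(classify(p, w) != l for p, l in zip(points, labels))

points = [(1, 1), (2, 2), (4, 5), (5, 6)]
labels = [0, 0, 1, 1]        # e.g. fail / pass
w = (-7, 1, 1)               # the line x + y = 7 separates them perfectly
errors = num_errors(points, labels, w)
```

Gradient descent then adjusts w to drive this error count down (in practice a smooth version of it, so derivatives exist).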
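The Neural Networks bullet ("multiple lines, combined like an AND") can be shown with a two-layer toy network; the two lines below are arbitrary examples:

```python
def step(z):
    """Threshold activation: fires (1) when z is non-negative."""
    return 1 if z >= 0 else 0

def line1(x, y):
    return step(x - 2)   # condition 1: right of the line x = 2

def line2(x, y):
    return step(y - 3)   # condition 2: above the line y = 3

def network(x, y):
    # Second layer ANDs the two first-layer outputs:
    # fires only when both conditions hold (1 + 1 - 1.5 >= 0)
    return step(line1(x, y) + line2(x, y) - 1.5)
```

Each first-layer unit is one logistic-regression-style split; the output unit combines them, which is exactly the "combination of multiple logistic regression graphs".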
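The Support Vector Machines bullet is about which separating line is best. This sketch (made-up points and candidate lines) computes the margin, the distance from a line to its closest point, and shows that the line sitting midway between the groups has the larger margin:

```python
def margin(points, w, b):
    """Smallest distance from any point (x, y) to the line w[0]*x + w[1]*y + b = 0."""
    norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
    return min(abs(w[0] * x + w[1] * y + b) / norm for x, y in points)

points = [(0, 0), (1, 0), (4, 4), (5, 4)]   # two groups of points
# Two candidate separators: x + y = 3 (hugs one group) vs x + y = 4.5 (midway)
m1 = margin(points, (1, 1), -3)
m2 = margin(points, (1, 1), -4.5)
```

An SVM searches for the line maximizing this margin, which is the "middle of the middle" intuition from the bullet.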
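The Kernel Trick bullet can be illustrated with the classic 1-D case (these specific points are my own example): no single threshold separates the classes on the line, but mapping each point x to (x, x²) adds a dimension in which a flat cut works.

```python
def lift(x):
    """Map a 1-D point into 2-D: the extra coordinate is x squared."""
    return (x, x * x)

negatives = [-1, 1]    # class 0, sandwiched between...
positives = [-3, 3]    # ...class 1, so no 1-D threshold separates them

# After lifting, the horizontal cut x^2 = 4 separates the classes cleanly
neg_below = all(lift(x)[1] < 4 for x in negatives)
pos_above = all(lift(x)[1] > 4 for x in positives)
```

This is the "additional dimension" from the bullet: the mapping does not move the data arbitrarily, it exposes a property (here, distance from the origin) along which the split becomes easy.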
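Finally, the Clustering bullet, K-means variant, in miniature (1-D points and starting centers chosen for illustration): repeatedly assign each point to its nearest center, then move each center to the mean of its group.

```python
def kmeans_1d(points, centers, iterations=10):
    """K-means on 1-D points; `centers` is the list of starting centers."""
    for _ in range(iterations):
        # Step 1: group each point with its nearest center (proximity)
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # Step 2: move each center to the mean of its group
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return sorted(centers)

# Two obvious clumps: {1, 2, 3} and {10, 11, 12}
centers = kmeans_1d([1, 2, 3, 10, 11, 12], [1, 11])
```

The number of starting centers plays the role of the "limit" in the bullet: it controls how many groups the data is split into.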
