[Formal Methods] Part B: Linear Regression

Tags: formal methods, USTC School of Software course materials

Linear Regression

In statistics, linear regression is a linear approach to modelling the relationship between a scalar response and one or more explanatory variables (also known as the dependent and independent variables).

In recent years, linear regression has played an important role in artificial intelligence and machine learning. The linear regression algorithm is one of the fundamental supervised machine-learning algorithms, owing to its relative simplicity and well-known properties. Interested readers can refer to materials on deep learning; for instance, Andrew Ng's notes (up to page 7) give a good introduction to linear regression from a deep learning point of view.

However, since this is not a deep learning course, we will approach the problem from the mathematical point of view. We start by studying a concrete example, given the following data (in machine-learning terminology, these are called the training data):

xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0]

Our goal is to produce a linear function:

y = k*x + b

such that it fits the above data as closely as possible, where the variables k and b are unknown. By "as close as possible" we mean the least-squares criterion, that is, we want to minimize the following expression:

min \sum_{i=1}^{N} (ys[i] - (k*xs[i] + b))^2        (1)

where N is the number of data points (the length of xs or ys).

The next step is to solve equation (1) to compute the values of the variables k and b. We will use Z3 for this task, since Z3 also supports some non-linear constraint solving.
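Before reaching for a solver, it is worth noting that simple linear regression also has a well-known closed-form solution (the one-variable normal equations), which is handy for sanity-checking the solver's answer. A minimal sketch in plain Python (the function name closed_form_lr is ours, not part of the assignment):

def closed_form_lr(xs, ys):
    # one-variable normal equations:
    #   k = (N*Sxy - Sx*Sy) / (N*Sxx - Sx*Sx),  b = (Sy - k*Sx) / N
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    k = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - k * sx) / n
    return k, b

# For the data above this yields k = 2.0 and b = -1.0, i.e. y = 2*x - 1,
# which is exactly the line the Z3-based solution below should find.
print(closed_form_lr([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 5.0, 7.0]))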

Exercise 18: Read the code in the Python file linear_regression.py. You will need to install the matplotlib package to run this code, which you can do via pip:
pip install matplotlib

Alternatively, you can install it through PyCharm's preferences, as we did in the software setup for assignment 1. After setting up the environment, complete the lr_training() method to perform linear regression using Z3.

import matplotlib.pyplot as plt
from z3 import *

from linear_regression_ml import sklearn_lr


class Todo(Exception):
    def __init__(self, msg):
        self.msg = msg

    def __str__(self):
        return self.msg

    def __repr__(self):
        return self.__str__()


################################################
# Linear Regression (from the SMT point of view)

# In statistics, linear regression is a linear approach to modelling
# the relationship between a scalar response and one or more explanatory
# variables (also known as dependent and independent variables).
# The case of one explanatory variable is called simple linear regression;
# for more than one, the process is called multiple linear regression.
# This term is distinct from multivariate linear regression, where multiple
# correlated dependent variables are predicted, rather than a single scalar variable.

# In recent years, linear regression has played an important role in the
# field of artificial intelligence and machine learning. The linear
# regression algorithm is one of the fundamental supervised machine-learning
# algorithms due to its relative simplicity and well-known properties.
# Interested readers can refer to the materials on deep learning,
# for instance, Andrew Ng gives a good introduction to linear regression
# from a deep learning point of view.

# However, as this is not a deep learning course, we'll concentrate
# on the mathematical aspects, and you should learn the background
# knowledge on linear regression by yourself.

# We start by studying one concrete example, given the following data
# (in machine learning terminology, these are called the training data):
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0]

# our goal is to produce a linear function:
#   y = k*x + b
# such that it fits the above data as close as possible, where
# the variables "k" and "b" are unknown variables.
# By "as close as possible", we use a least square method, that is, we
# want to minimize the following expression:
#   min \sum_i (ys[i] - (k*xs[i]+b))^2    (1)

# Now the next step is to solve the equation (1) to calculate the values
# for the variables "k" and "b".
# The popular approach used extensively in deep learning is the
# gradient descent algorithm; if you're interested in this algorithm,
# here is a good introduction from Andrew Ng (up to page 7):
#   https://see.stanford.edu/materials/aimlcs229/cs229-notes1.pdf

# In the following, we'll discuss how to solve this problem using
# SMT technique from this course.

# Both "draw_points()" and "draw_line()" are drawing utility functions to
# draw points and straight line.
# You don't need to understand these code, and you can skip
# these two functions safely. If you are really interested,
# please refer to the manuals of matplotlib library.


# Input: xs and ys are the given data for the coordinates
# Output: draw these points [xs, ys], no explicit return values.
def draw_points(xs, ys):
    plt.scatter(xs, ys, marker='x', color='red', s=40, label='Data')
    plt.legend(loc='best')
    plt.xlim(0, 8)  # set the plotting range
    plt.ylim(0, 8)
    plt.savefig("./points.png")
    plt.show()


# Input: a group of coordinates [xs, ys]
#        k and b are coefficients
# Output: draw the coordinates [xs, ys], draw the line y=k*x+b
#       no explicit return values
def draw_line(k, b, xs, ys):
    new_ys = [(k*xs[i]+b) for i in range(len(xs))]
    plt.scatter(xs, ys, marker='x', color='red', s=40, label='Data')
    plt.plot(xs, new_ys)
    plt.legend(loc='best')
    plt.xlim(0, 8)  # set the plotting range
    plt.ylim(0, 8)
    plt.savefig("./line.png")
    plt.show()


# Arguments: xs, ys, the given data for these coordinates
# Return:
#   1. the solver checking result "res";
#   2. the k, if any;
#   3. the b, if any.
def lr_training(xs, ys):
    # create two integer coefficients
    k, b = Ints('k b')
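    # NOTE: integer coefficients happen to suffice for this data set, since it
    # lies exactly on y = 2*x - 1; for general data, Reals('k b') would be the
    # appropriate sort (and the as_long() calls below would need to change
    # accordingly, e.g. to as_fraction()).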

    # exercise 18: Use a least squares method
    # (https://en.wikipedia.org/wiki/Least_squares)
    # to generate the target expression which will be minimized
    # Your code here:
    # raise Todo("exercise 18: please fill in the missing code.")
    exps = [(y - k*x - b) * (y - k*x - b) for x, y in zip(xs, ys)]
    # print(exps)
    # double check the expression is right,
    # it should output:
    #
    # 0 +
    # (1 - k*1 - b)*(1 - k*1 - b) +
    # (3 - k*2 - b)*(3 - k*2 - b) +
    # (5 - k*3 - b)*(5 - k*3 - b) +
    # (7 - k*4 - b)*(7 - k*4 - b)
    #
    print("the target expression is: ")
    print(sum(exps))

    # create a solver
    solver = Optimize()

    # add some constraints to the solver; these bound the feasible values
    solver.add([k < 100, k > 0, b > -10, b < 10])

    # tell the solver which expression to check
    solver.minimize(sum(exps))

    # kick the solver to perform checking
    res = solver.check()

    # return the result, if any
    if res == sat:
        model = solver.model()
        kv = model[k]
        bv = model[b]
        return res, kv.as_long(), bv.as_long()
    else:
        return res, None, None


if __name__ == '__main__':
    draw_points(xs, ys)
    res, k, b = lr_training(xs, ys)
    if res == sat:
        print(f"the linear function is:\n y = {k}*x {'+' if b >= 0 else '-'} {abs(b)}")
        draw_line(k, b, xs, ys)
    else:
        print('\033[91m Training failed! \033[0m')

    k, b = sklearn_lr(xs, ys)
    print(f"the linear function is:\n y = {k}*x {'+' if b >= 0 else '-'} {abs(b)}")

    # exercise 19: Compare the machine learning approach and the LP approach
    # by trying some different training data. Do the two algorithms produce the same
    # results? What conclusion you can draw from the result?
    # Your code here:


Output: for the data above, the solver returns sat with k = 2 and b = -1, i.e., the fitted line is y = 2*x - 1; the script saves the plots as points.png and line.png.

The popular approach used extensively in deep learning is the gradient descent algorithm; Andrew Ng's notes linked above contain a good introduction, if you are interested. In most cases, though, you don't need to reinvent the wheel: Python has many effective machine-learning libraries, such as scikit-learn, and you can use them directly to accomplish the task.
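The sklearn_lr() imported from linear_regression_ml.py is not shown here, but a minimal version of such a function might look like the following sketch (the actual implementation shipped with the assignment may differ):

import numpy as np
from sklearn.linear_model import LinearRegression

def sklearn_lr_sketch(xs, ys):
    # scikit-learn expects a 2-D feature matrix, so reshape xs into a column
    X = np.array(xs).reshape(-1, 1)
    model = LinearRegression().fit(X, np.array(ys))
    # coef_[0] is the slope k, intercept_ is the offset b
    return model.coef_[0], model.intercept_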

Exercise 19: In the Python file linear_regression_ml.py, we provide a linear regression implementation based on the machine-learning library scikit-learn. You don't need to write any code, but you do need to install the numpy and scikit-learn packages via pip:
pip install numpy
pip install scikit-learn
Alternatively, you can install them through PyCharm's preferences, as we did in the software setup for assignment 1. After setting up the environment, compare the machine-learning approach with the LP approach (the lr_training() you implemented in Exercise 18) by trying some different training data. Do the two algorithms produce the same results? What conclusions can you draw from the results?
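One possible way to run the comparison, sketched below (not the official answer; xs2 and ys2 are made-up noisy data):

# made-up data that does not lie exactly on a line with integer coefficients
xs2 = [1.0, 2.0, 3.0, 4.0]
ys2 = [1.1, 2.9, 5.2, 6.8]

res2, k2, b2 = lr_training(xs2, ys2)   # the Int-based Z3 solution from above
print("Z3 (Int coefficients):", res2, k2, b2)
print("scikit-learn:", sklearn_lr(xs2, ys2))

# Expected observation: scikit-learn returns real coefficients close to the
# best-fit line, while the Int-based Z3 model can only pick the best integer
# pair (k, b), so on such data the two results will generally differ; declaring
# Reals('k b') in lr_training() makes them comparable (though Z3 can be slow,
# or even answer unknown, on non-linear real objectives).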

I still have some questions about this exercise; I will update it next week. = =

#USTC School of Software formal methods course notes (hbj) - comments and private messages are welcome
