Pandas Study Notes 2 --- Data Types: Series and DataFrame

阿新 • Published: 2018-11-07

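Pandas data types

Series (one-dimensional data structure)

DataFrame

Series --- a labelled one-dimensional array

Common ways to initialize one:

an iterable
a NumPy array
a dict
a scalar

I. Series

1. Creating a Series

Imports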

import pandas as pd
import numpy as np

s = pd.Series([1, 2, 3])

type(s)

pandas.core.series.Series

0    1
1    2
2    3
dtype: int64

Creating a Series from an iterable

 pd.Series(range(5))

0    0
1    1
2    2
3    3
4    4
dtype: int64

Creating a Series from a NumPy array

t = np.random.randint(5, 15, size=(8))
pd.Series(t)

0    11
1     9
2     6
3     7
4    11
5    12
6     6
7    14
dtype: int64

Creating a Series from a scalar (the value is repeated for every label in index)

pd.Series(100, index=['a', 5, b'sd'])

a        100
5        100
b'sd'    100
dtype: int64

Creating a Series from a dict (the keys become the index)

pd.Series({100:165, 'asdf':961})

100     165
asdf    961
dtype: int64

2. Series attributes

2.1 Index

Getting the index

s = pd.Series([7,8,9], index=[1,2,3])
s.index

Int64Index([1, 2, 3], dtype='int64')

s.index = ['a', 'b', 'c']

a    7
b    8
c    9
dtype: int64

You can also build an Index object by hand (its length must match the number of values)

index = pd.Index(['aaaa', 'bbbb', 'cccc'])
pd.Series([7,8,9], index=index)

aaaa    7
bbbb    8
cccc    9
dtype: int64

2.2 Values

Returning the data

s.values

array([7, 8, 9])

a    7
b    8
c    9
dtype: int64

2.3 Size

s.size

s.dtype

dtype('int64')

2.4 Other

A Series can be given a name

index = pd.Index(['a', 'b', 'c'], name = 'Index名字')

s = pd.Series([1,2,3], index=[1,2,3], name='"Series名字"')
s

1    1
2    2
3    3
Name: "Series名字", dtype: int64

The index can also carry a name attribute

s.index = index
s

Index名字
a    1
b    2
c    3
Name: "Series名字", dtype: int64

head and tail (default n=5)

s.head(2)

Index名字
a    1
b    2
Name: "Series名字", dtype: int64

s.tail(100)

Index名字
a    1
b    2
c    3
Name: "Series名字", dtype: int64

test_np = np.random.randint(0, 15, size = 10)
test_np

array([3, 5, 9, 6, 1, 8, 9, 9, 2, 1])

test_pd = pd.Series(test_np)
test_pd

0     6
1    11
2     4
3     3
4     4
5     9
6     4
7     7
8    11
9     5
dtype: int64

test_np[5]

test_pd[5]

test_np[5] == test_pd[5]

True

3. Series arithmetic

test_pd

0     7
1    12
2     5
3     4
4     5
5    10
6     5
7     8
8    12
9     6
dtype: int64

test_pd + 1

0     8
1    13
2     6
3     5
4     6
5    11
6     6
7     9
8    13
9     7
dtype: int64

test_pd + test_pd

0    14
1    24
2    10
3     8
4    10
5    20
6    10
7    16
8    24
9    12
dtype: int64

Series arithmetic aligns on the index; a label missing from either side gives NaN (not a number) in the result

s1 = pd.Series([1,2,3], index=[1,2,3])
s2 = pd.Series([1,2,3], index=[2,3,4])
s1 + s2

1    NaN
2    3.0
3    5.0
4    NaN
dtype: float64

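With the method form (add), mismatched labels can be handled: a value missing on one side is replaced by fill_value before the operation is carried out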

s1.add(s2, fill_value=100000)

1    100001.0
2         3.0
3         5.0
4    100003.0
dtype: float64

Special floating-point values and testing for missing values

s = pd.Series([1, 2, 3, float('NaN'), np.nan])

s.isnull()

0    False
1    False
2    False
3     True
4     True
dtype: bool

NumPy and pandas treat missing values differently in computations:

NumPy propagates NaN, so the result is NaN
pandas skips NaN by default

t = np.array([1, 2, 3, float('NaN'), np.nan])
t.sum()

nan

s.sum()

6.0
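Either library can be made to behave like the other. A minimal sketch follows; the variable names are illustrative. np.nansum is NumPy's NaN-aware sum, and skipna=False asks pandas to propagate NaN.

```python
import numpy as np
import pandas as pd

values = [1, 2, 3, float('nan'), np.nan]
t = np.array(values)
s = pd.Series(values)

numpy_sum = t.sum()                  # NaN propagates through the sum
numpy_nansum = np.nansum(t)          # NaN-aware variant: NaN is treated as zero
pandas_sum = s.sum()                 # skipna=True is the default
pandas_strict = s.sum(skipna=False)  # opt in to NumPy-style propagation

print(numpy_sum, numpy_nansum, pandas_sum, pandas_strict)
```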

4. Extracting elements

Extract an element with a single index
Extract elements with a list of labels or a boolean array (recommended)

a = np.array([1, 2, 3])
b = pd.Series(a, index = [0,1,2])

index1 = [0, 1, 2]
index2 = [False, True, True]

b[index1]

0    1
1    2
2    3
dtype: int64

b[index2]

1    2
2    3
dtype: int64

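Note:

Elements can be accessed by label or by position.
The labels given at creation time form the label index. If those labels are integers, they replace the default positional index for [] lookups (plain positional access no longer applies).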
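The note about integer labels is easiest to see when the labels do not start at 0. This is a small sketch in which the Series and the variable names are made up for illustration.

```python
import pandas as pd

# Integer labels that differ from the positions 0, 1, 2.
s = pd.Series([10, 20, 30], index=[2, 3, 4])

by_label = s[2]           # [] looks up the label 2, not position 2
by_label_loc = s.loc[2]   # explicit label lookup, same element
by_position = s.iloc[2]   # explicit position lookup, the third element

# There is no label 0, so a plain [] lookup fails instead of
# falling back to the first position.
try:
    s[0]
    zero_lookup = 'found'
except KeyError:
    zero_lookup = 'KeyError'

print(by_label, by_label_loc, by_position, zero_lookup)
```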

5. Label indexing (loc) and positional indexing (iloc) --- avoiding index confusion

b.loc[0]

b.iloc[1]

# test_np = np.random.randint(0, 15, size = 10)
test_np = np.arange(15)
test_pd = pd.Series(test_np)

test_pd.loc[4:8].values # label-based slice: both endpoints are included

array([4, 5, 6, 7, 8])

test_pd.iloc[4:8].values # position-based slice: the end is excluded

array([4, 5, 6, 7])

test_np[4:8] # NumPy slices also exclude the end

array([4, 5, 6, 7])

test_np[4:80]

array([ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
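With the default 0..n-1 index, labels and positions coincide, so the difference is easy to miss. A sketch with string labels makes it visible; the data and names are illustrative only.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

label_slice = s.loc['b':'c']   # both endpoints included: rows b and c
position_slice = s.iloc[1:2]   # end excluded: only row b
oversized = s.iloc[1:100]      # out-of-range slice ends are clipped, as in NumPy

print(label_slice)
print(position_slice)
print(oversized)
```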

6. Working with values

getting a value
modifying a value
adding an index-value pair
deleting an index-value pair

s = pd.Series([1, 2, 3, float('NaN'), np.nan])
s.loc['a'] = 'a'
s

0      1
1      2
2      3
3    NaN
4    NaN
a      a
dtype: object

s.drop('a') # returns a new object with the label removed; s itself is unchanged

0      1
1      2
2      3
3    NaN
4    NaN
dtype: object

0      1
1      2
2      3
3    NaN
4    NaN
a      a
dtype: object

s.drop(['a', 3], inplace=True) # several labels can be dropped at once; every inplace parameter defaults to False, which returns a new object
s

0      1
1      2
2      3
4    NaN
dtype: object

9. Other

unique --- removes duplicates without sorting
value_counts --- counts occurrences

s = pd.Series([1, 10, -2, -5, 20, 10, -5])
s.unique()

array([ 1, 10, -2, -5, 20])

s.value_counts(ascending=True)

 1     1
-2     1
 20    1
 10    2
-5     2
dtype: int64

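II. DataFrame

1. Creating a DataFrame

A tabular data type, normally used in two dimensions, with both row labels and column labels. A two-dimensional DataFrame can be created from:

a two-dimensional structure (nested lists, an ndarray, another DataFrame, etc.)
a dict whose keys are column labels and whose values are one-dimensional data structures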

df1 = pd.DataFrame([[11, 21, 31], [99, 88, 77]])
df2 = pd.DataFrame([[11, 21, 31, 41], [99, 88, 77, 66]])

df1

	0	1	2
0	11	21	31
1	99	88	77

print(df1)

    0   1   2
0  11  21  31
1  99  88  77

IPython's built-in display() function renders several objects from one cell, each nicely formatted

display(df1)
display(df2)

	0	1	2
0	11	21	31
1	99	88	77

	0	1	2	3
0	11	21	31	41
1	99	88	77	66

A DataFrame is organised by columns, so when it is built from a dict each key becomes a column label

di = {
    "名字":['a', 'b', 'c', 'd'],
    '年齡':[32, 23, 45, 76],
    '班級':8,
    '成績':np.random.randint(0,100,4)
}
df = pd.DataFrame(di)
df

	名字	年齡	成績	班級
0	a	32	31	8
1	b	23	46	8
2	c	45	95	8
3	d	76	67	8
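Two details of the dict constructor are worth a quick check: a scalar value is repeated for every row, and the columns and index arguments fix the column order and the row labels. The sketch below uses made-up English column names and row labels that do not appear in the original.

```python
import pandas as pd

data = {
    'name': ['a', 'b', 'c'],
    'age': [32, 23, 45],
    'class_id': 8,   # scalar: repeated for every row
}

# columns= picks the column order explicitly; index= sets the row labels.
df = pd.DataFrame(data, columns=['name', 'class_id', 'age'],
                  index=['r1', 'r2', 'r3'])

print(df)
```

Passing columns explicitly also removes any dependence on how a given pandas version orders dict keys.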

index holds the row labels; columns holds the column labels

# df.index = ['張三', '李四', '王五', '李六']
df.columns = ["學生名字", '學生年齡', '學生成績', '學生班級']
df

	學生名字	學生年齡	學生成績	學生班級
0	a	32	31	8
1	b	23	46	8
2	c	45	95	8
3	d	76	67	8

2. Extracting data: sampling

Taking rows from the head and the tail

df.head(n=2)

	名字	年齡	成績	班級
0	a	32	51	8
1	b	23	53	8

df.tail(n=2)

	名字	年齡	成績	班級
2	c	45	63	8
3	d	76	79	8

Random sampling

df.sample(n=2, frac=None, replace=False, weights=None, random_state=None, axis=None) # samples without replacement by default; n defaults to 1 when omitted

	名字	年齡	成績	班級
0	a	32	51	8
3	d	76	79	8

df.sample(n=10, replace=True, random_state=456) # random_state is the random seed; replace=True samples with replacement

	名字	年齡	成績	班級
3	d	76	79	8
1	b	23	53	8
3	d	76	79	8
1	b	23	53	8
2	c	45	63	8
3	d	76	79	8
0	a	32	51	8
2	c	45	63	8
3	d	76	79	8
2	c	45	63	8

df.sample(n=10, replace=True, random_state=456) # same random_state as before, so the same rows come back

	名字	年齡	成績	班級
3	d	76	79	8
1	b	23	53	8
3	d	76	79	8
1	b	23	53	8
2	c	45	63	8
3	d	76	79	8
0	a	32	51	8
2	c	45	63	8
3	d	76	79	8
2	c	45	63	8
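The behaviour shown above can be verified programmatically. This is a minimal sketch on a small made-up frame; the column names and the variables first, second, half and oversample are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'b', 'c', 'd'], 'age': [32, 23, 45, 76]})

# The same seed gives the same sample, row for row.
first = df.sample(n=10, replace=True, random_state=456)
second = df.sample(n=10, replace=True, random_state=456)

# frac samples a fraction of the rows instead of a fixed count.
half = df.sample(frac=0.5, random_state=0)

# Without replacement, asking for more rows than exist is an error.
try:
    df.sample(n=10)
    oversample = 'ok'
except ValueError:
    oversample = 'ValueError'

print(first.equals(second), len(half), oversample)
```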

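3. DataFrame attributes

index --- row index
columns --- column index
values --- the data as a two-dimensional ndarray
shape --- shape
ndim --- number of dimensions
dtypes --- data types (an ndarray has one dtype shared by all of its elements, whereas a DataFrame has one dtype per column)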

df.index

RangeIndex(start=0, stop=4, step=1)

df.columns

Index(['名字', '年齡', '成績', '班級'], dtype='object')

df.values

array([['a', 32, 51, 8],
       ['b', 23, 53, 8],
       ['c', 45, 63, 8],
       ['d', 76, 79, 8]], dtype=object)

df.shape

(4, 4)

df.ndim

dtypes returns a Series

df.dtypes

名字    object
年齡     int64
成績     int64
班級     int64
dtype: object

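4. Row and column operations

index and columns can be used to pick out specific data
index and columns can each be given a name
Plain square brackets select columns; loc and iloc select rows (as Series)

4.1 Getting rows and columns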

di = {
    "名字":['a', 'b', 'c', 'd'],
    '年齡':[32, 23, 45, 76],
    '班級':8,
    '成績':np.random.randint(0,100,4)
}
df = pd.DataFrame(di)
df

	名字	年齡	成績	班級
0	a	32	68	8
1	b	23	33	8
2	c	45	35	8
3	d	76	11	8

Getting a single value

df.loc[0, '名字']

'a'

Getting one column

df[ '名字']

0    a
1    b
2    c
3    d
Name: 名字, dtype: object

df[[...]] selects several columns

df[['名字', '年齡']]

	名字	年齡
0	a	32
1	b	23
2	c	45
3	d	76

Getting one row; each row comes back as a Series

df.loc[1]

名字     b
年齡    23
成績    33
班級     8
Name: 1, dtype: object

Selecting several rows

df.loc[[1,2,3]]

	名字	年齡	成績	班級
1	b	23	33	8
2	c	45	35	8
3	d	76	11	8

4.2 Adding rows and columns

4.2.1 Adding a column

df['@'] = [1,2,3,4]
df

	名字	年齡	成績	班級	@
0	a	32	68	8	1
1	b	23	33	8	2
2	c	45	35	8	3
3	d	76	11	8	4

4.2.2 Getting a column: the name of the resulting Series is the DataFrame column label

df['@']

0    1
1    2
2    3
3    4
Name: @, dtype: int64

df.index

RangeIndex(start=0, stop=4, step=1)

df.index.name = '行索引名'
df.columns.name = '列索引名'
df

列索引名	名字	年齡	成績	班級	@
行索引名
0	a	32	68	8	1
1	b	23	33	8	2
2	c	45	35	8	3
3	d	76	11	8	4

Adding a column of sums

df1 = pd.DataFrame({
    '蘋果':[1,2,3],
    '香蕉':[4,5,6],
    '葡萄':[7,8,9],
})
df1['總和'] = df1['蘋果'] + df1['香蕉'] + df1['葡萄']
df1

	蘋果	葡萄	香蕉	總和
0	1	7	4	12
1	2	8	5	15
2	3	9	6	18

4.2.3 Deleting a column

df.pop('@')

行索引名
0    1
1    2
2    3
3    4
Name: @, dtype: int64

4.2.4 Dropping and getting rows

df.drop([1,2], axis='index', inplace=False) # returns a new object; df itself is not modified

列索引名	學生名字	學生年齡	學生成績	學生班級
行索引名
0	a	32	31	8
3	d	76	67	8

df.loc[[2, 3]] # label-based access is the recommended way to fetch rows

列索引名	學生名字	學生年齡	學生成績	學生班級
行索引名
2	c	45	95	8
3	d	76	67	8

df.iloc[[2, 3]] # position-based access; not recommended

列索引名	學生名字	學生年齡	學生成績	學生班級
行索引名
2	c	45	95	8
3	d	76	67	8

4.2.5 Adding a row: the Series being added must have a name (it becomes the row label)

di = {
    "名字":['a', 'b', 'c', 'd'],
    '年齡':[32, 23, 45, 76],
    '班級':8,
    '成績':np.random.randint(0,100,4)
}
df = pd.DataFrame(di)
row = pd.Series([ 's', 45, 65, 8], name='new', index=['名字', '年齡', '成績', '班級'])
df.append(row)  # note: DataFrame.append was removed in pandas 2.0; on newer versions use pd.concat instead

	名字	年齡	成績	班級
0	a	32	87	8
1	b	23	74	8
2	c	45	36	8
3	d	76	13	8
new	s	45	65	8

di = {
    "名字":['a', 'b', 'c', 'd'],
    '年齡':[32, 23, 45, 76],
    '班級':8,
    '成績':np.random.randint(0,100,4)
}
dff = pd.DataFrame(di)
row = pd.Series([ 's', 45, 65, 8], name='new', index=['名字', '年齡', '成績', '班級'])
dff.append(row, ignore_index=True)

	名字	年齡	成績	班級
0	a	32	6	8
1	b	23	8	8
2	c	45	93	8
3	d	76	73	8
4	s	45	65	8
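DataFrame.append was removed in pandas 2.0, so on newer versions the same two results can be obtained with pd.concat. The following is a sketch under that assumption, using a small made-up frame with English column names. to_frame().T turns the named Series into a one-row DataFrame whose row label is the Series name.

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'b'], 'age': [32, 23]})
row = pd.Series({'name': 's', 'age': 45}, name='new')

# Keep the Series name as the new row label.
with_label = pd.concat([df, row.to_frame().T])

# Or discard all row labels and renumber from 0, like ignore_index=True above.
renumbered = pd.concat([df, row.to_frame().T], ignore_index=True)

print(with_label)
print(renumbered)
```

One caveat: the one-row frame built from a mixed Series has object dtype, so numeric columns may need an explicit astype afterwards.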

When adding many rows, prefer concat: it performs better.

pd.concat((df, dff), axis=0, ignore_index=False)

	名字	年齡	成績	班級
0	a	32	51	8
1	b	23	53	8
2	c	45	63	8
3	d	76	79	8
0	a	32	6	8
1	b	23	8	8
2	c	45	93	8
3	d	76	73	8

result = pd.concat((df, dff), axis=1, ignore_index=False)
result

	名字	年齡	成績	班級	名字	年齡	成績	班級
0	a	32	51	8	a	32	6	8
1	b	23	53	8	b	23	8	8
2	c	45	63	8	c	45	93	8
3	d	76	79	8	d	76	73	8
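In practice the performance advice usually means collecting the pieces in a list and calling concat once at the end. This is a minimal sketch with made-up data; pieces, stacked and keyed are illustrative names.

```python
import pandas as pd

# Build the pieces first; nothing is copied into a big frame yet.
pieces = []
for i in range(3):
    pieces.append(pd.DataFrame({'batch': [i, i],
                                'value': [i * 10, i * 10 + 1]}))

# One concat at the end; ignore_index=True renumbers the rows 0..5.
stacked = pd.concat(pieces, axis=0, ignore_index=True)

# keys= labels each piece instead, giving a two-level row index.
keyed = pd.concat(pieces, axis=0, keys=['x', 'y', 'z'])

print(stacked)
print(keyed.loc['y'])
```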

4.2.6 Deleting columns and rows

result.drop(['名字', '班級'], axis=1)

	年齡	成績	年齡	成績
0	32	51	32	6
1	23	53	23	8
2	45	63	45	93
3	76	79	76	73

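4.2.7 Mixed operations

You can select rows first and then columns, or columns first and then rows.

drop can delete both rows and columns;
df[label] operates on columns; it accepts column labels only, not positions;
df.loc[...] and df.iloc[...] operate on rows;
df[slice] is not recommended: it operates on rows and accepts both positional and label slices, which conflicts with the second rule and is hard to remember;
df[[list]] is ambiguous too: a list of labels selects columns, while a boolean list selects rows.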
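The rules above can be put side by side in one place. The sketch uses a small made-up frame with English column names and string row labels, and pairs each ambiguous df[...] form with an explicit loc or iloc equivalent.

```python
import pandas as pd

df = pd.DataFrame({'apple': [1, 2, 3], 'banana': [4, 5, 6]},
                  index=['x', 'y', 'z'])

one_column = df['apple']                # single label: a column (Series)
two_columns = df[['apple', 'banana']]   # list of labels: columns (DataFrame)
sliced_rows = df[0:2]                   # slice: rows, by position
masked_rows = df[[True, False, True]]   # boolean list: rows

# Explicit equivalents that do not depend on what is inside the brackets.
explicit_rows = df.iloc[0:2]
explicit_mask = df.loc[[True, False, True]]
explicit_cell = df.loc['x', 'apple']

print(sliced_rows)
print(masked_rows)
```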

df = pd.DataFrame({
    '蘋果':[1,2,3],
    '香蕉':[4,5,6],
    '葡萄':[7,8,9],
})
df['總和'] = df['蘋果'] + df['香蕉'] + df['葡萄']
df

	蘋果	葡萄	香蕉	總和
0	1	7	4	12
1	2	8	5	15
2	3	9	6	18

df['蘋果'].loc[[0]]

0    1
Name: 蘋果, dtype: int64

df[['蘋果', '葡萄']].loc[[0,2]]

	蘋果	葡萄
0	1	7
2	3	9

Selecting rows with a slice

df.iloc[0:1]

	蘋果	葡萄	香蕉	總和
0	1	7	4	12

df.iloc[0]

蘋果     1
葡萄     7
香蕉     4
總和    12
Name: 0, dtype: int64

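4.2.8 How labels turn into the name attribute

When you take out a column, the resulting Series is named after the column label and indexed by the row labels.
When you take out a row, the resulting Series is named after the row label and indexed by the column labels.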

df = pd.DataFrame({
    '蘋果':[1,2,3],
    '香蕉':[4,5,6],
    '葡萄':[7,8,9],
})
df['總和'] = df['蘋果'] + df['香蕉'] + df['葡萄']
df

	蘋果	葡萄	香蕉	總和
0	1	7	4	12
1	2	8	5	15
2	3	9	6	18

df.loc[0]

蘋果     1
葡萄     7
香蕉     4
總和    12
Name: 0, dtype: int64

df[[True, False, False]]

	蘋果	葡萄	香蕉	總和
0	1	7	4	12

df['蘋果']

0    1
1    2
2    3
Name: 蘋果, dtype: int64

5. Computation

df1 = pd.DataFrame(np.arange(24).reshape(4,6))
df2 = pd.DataFrame(np.arange(100, 124).reshape(4,6))

Transpose

df1.T

	0	1	2	3
0	0	6	12	18
1	1	7	13	19
2	2	8	14	20
3	3	9	15	21
4	4	10	16	22
5	5	11	17	23

Addition

df1 + df2

	0	1	2	3	4	5
0	100	102	104	106	108	110
1	112	114	116	118	120	122
2	124	126	128	130	132	134
3	136	138	140	142	144	146

When the labels do not line up, addition produces NaN

df2.index = [0, 1, 3, 4]
df2.columns = [0, 1, 2, 3, 4, 6]
df1 + df2

	0	1	2	3	4	5	6
0	100.0	102.0	104.0	106.0	108.0	NaN	NaN
1	112.0	114.0	116.0	118.0	120.0	NaN	NaN
2	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	130.0	132.0	134.0	136.0	138.0	NaN	NaN
4	NaN	NaN	NaN	NaN	NaN	NaN	NaN

df1.add(df2, fill_value=0)

	0	1	2	3	4	5	6
0	100.0	102.0	104.0	106.0	108.0	5.0	105.0
1	112.0	114.0	116.0	118.0	120.0	11.0	111.0
2	12.0	13.0	14.0	15.0	16.0	17.0	NaN
3	130.0	132.0	134.0	136.0	138.0	23.0	117.0
4	118.0	119.0	120.0	121.0	122.0	NaN	123.0

DataFrame and Series addition

--- works along either rows or columns

s = pd.Series([100, 200, 300, 400, 500], index = np.arange(5))

df1

	0	1	2	3	4	5
0	0	1	2	3	4	5
1	6	7	8	9	10	11
2	12	13	14	15	16	17
3	18	19	20	21	22	23

By default the Series index is aligned with the column labels

df1 + s

	0	1	2	3	4	5
0	100.0	201.0	302.0	403.0	504.0	NaN
1	106.0	207.0	308.0	409.0	510.0	NaN
2	112.0	213.0	314.0	415.0	516.0	NaN
3	118.0	219.0	320.0	421.0	522.0	NaN

Aligning on the row labels is also possible

df1.add(s,  axis='index')

	0	1	2	3	4	5
0	100.0	101.0	102.0	103.0	104.0	105.0
1	206.0	207.0	208.0	209.0	210.0	211.0
2	312.0	313.0	314.0	315.0	316.0	317.0
3	418.0	419.0	420.0	421.0	422.0	423.0
4	NaN	NaN	NaN	NaN	NaN	NaN
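A common use of the two alignment directions is centering data. This is a sketch on a small made-up frame, with illustrative variable names. The plain operator aligns a Series with the column labels, while the method form with axis='index' aligns it with the row labels.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(2, 3), columns=['a', 'b', 'c'])

col_means = df.mean(axis='index')     # one value per column, indexed by a, b, c
row_means = df.mean(axis='columns')   # one value per row, indexed by 0, 1

# Default alignment: the Series index is matched against the column labels.
centered_cols = df - col_means

# axis='index': the Series index is matched against the row labels.
centered_rows = df.sub(row_means, axis='index')

print(centered_cols)
print(centered_rows)
```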

7. Sorting

sorting by index
sorting by value

df = pd.DataFrame(np.arange(24).reshape(4,6), index=[5, 6, 2, 4], columns=[6,1,7,3, 4,2])
df

	6	1	7	3	4	2
5	0	1	2	3	4	5
6	6	7	8	9	10	11
2	12	13	14	15	16	17
4	18	19	20	21	22	23

df.sort_index(axis=1, ascending=False) # sorts the column labels in descending order

	7	6	4	3	2	1
5	2	0	4	3	5	1
6	8	6	10	9	11	7
2	14	12	16	15	17	13
4	20	18	22	21	23	19

df.sort_index(axis=0, ascending=False)

	6	1	7	3	4	2
6	6	7	8	9	10	11
5	0	1	2	3	4	5
4	18	19	20	21	22	23
2	12	13	14	15	16	17

df.sort_values(5, axis=1, ascending=False, inplace=False) # reorders the columns by the values in the row labelled 5, descending (easy to confuse: 5 is a row label here)

	2	4	3	7	1	6
5	5	4	3	2	1	0
6	11	10	9	8	7	6
2	17	16	15	14	13	12
4	23	22	21	20	19	18
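The more common case is sorting rows by the values of one or more columns, which is the default axis=0. This is a brief sketch on made-up data; the column names and variables are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd'],
    'age': [32, 23, 45, 76],
    'score': [51, 53, 51, 79],
})

# Sort rows by one column, largest first.
by_age = df.sort_values(by='age', ascending=False)

# Sort by score ascending, breaking ties by age descending.
by_score_then_age = df.sort_values(by=['score', 'age'],
                                   ascending=[True, False])

print(by_age)
print(by_score_then_age)
```

Both calls return new objects and keep the original row labels, so the labels appear reordered in the output.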

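8. Statistical methods

mean / sum / count / median
max / min
cumsum / cumprod
argmax / argmin (index of the extreme value; old style, not recommended)
idxmax / idxmin (index label of the extreme value; recommended)
var / std (variance, standard deviation)
corr / cov (correlation coefficient, covariance)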

df = pd.DataFrame(np.arange(24).reshape(4,6))
df

	0	1	2	3	4	5
0	0	1	2	3	4	5
1	6	7	8	9	10	11
2	12	13	14	15	16	17
3	18	19	20	21	22	23

df.mean(axis='columns')

0     2.5
1     8.5
2    14.5
3    20.5
dtype: float64

df.idxmax()

0    3
1    3
2    3
3    3
4    3
5    3
dtype: int64

df.var()

0    60.0
1    60.0
2    60.0
3    60.0
4    60.0
5    60.0
dtype: float64

df.std()

0    7.745967
1    7.745967
2    7.745967
3    7.745967
4    7.745967
5    7.745967
dtype: float64

df.corr()

	0	1	2	3	4	5
0	1.0	1.0	1.0	1.0	1.0	1.0
1	1.0	1.0	1.0	1.0	1.0	1.0
2	1.0	1.0	1.0	1.0	1.0	1.0
3	1.0	1.0	1.0	1.0	1.0	1.0
4	1.0	1.0	1.0	1.0	1.0	1.0
5	1.0	1.0	1.0	1.0	1.0	1.0

cov(X,Y) = E( [X - E(X)] [Y - E(Y)] )

df.cov()

	0	1	2	3	4	5
0	60.0	60.0	60.0	60.0	60.0	60.0
1	60.0	60.0	60.0	60.0	60.0	60.0
2	60.0	60.0	60.0	60.0	60.0	60.0
3	60.0	60.0	60.0	60.0	60.0	60.0
4	60.0	60.0	60.0	60.0	60.0	60.0
5	60.0	60.0	60.0	60.0	60.0	60.0
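The value 60.0 can be reproduced by hand, which also shows one detail the formula above leaves implicit. The expectation form divides by N, whereas pandas reports the sample covariance, which divides by N - 1. This is a small sketch for columns 0 and 1 of the same frame; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape(4, 6))
x, y = df[0], df[1]

n = len(x)
# Product of the deviations from each column's mean.
dev = (x - x.mean()) * (y - y.mean())

population_cov = dev.sum() / n      # the E[...] formula: divide by N
sample_cov = dev.sum() / (n - 1)    # sample covariance: divide by N - 1

pandas_cov = df.cov().loc[0, 1]     # pandas uses the N - 1 form

print(population_cov, sample_cov, pandas_cov)
```

Every column here is the same arithmetic sequence shifted by a constant, which is why all covariances are equal and all correlations are 1.0.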

import matplotlib.pyplot as plt
%matplotlib inline

img = np.ones((500,500,3))
plt.imshow(img[:,:,0], cmap="gray")

<matplotlib.image.AxesImage at 0x7fd51a5e18d0>
