
[Original] Pandas Essentials: Basic Data Operations and Index Alignment (Part 2)

2021-11-17 Category: python Views (3188)

1. Data operations

"""
# 4. Operating on data in pandas
Because pandas is designed to work with NumPy,
any NumPy ufunc can be applied to pandas Series and DataFrame objects.
"""

import pandas as pd
import numpy as np

rng = np.random.RandomState(42)
ser = pd.Series(rng.randint(0, 10, 4))
print(ser)

df = pd.DataFrame(rng.randint(0, 10, (3, 4)),
                  columns=['A', 'B', 'C', 'D'])
print(df)
print(np.max(ser))  # largest value in the Series

# A slightly more involved calculation: the result keeps the row and column labels
print(np.sin(df * np.pi / 4))
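
The point of the code above is that a ufunc returns a pandas object with its labels intact. A minimal sketch of that behavior follows; the values and the labels `w`..`z`, `r1`, `r2` are made up for illustration and do not come from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical data with explicit labels so that label preservation is visible.
ser = pd.Series([6, 3, 7, 4], index=['w', 'x', 'y', 'z'])
exp_ser = np.exp(ser)  # still a Series, same index as ser
print(exp_ser)

df = pd.DataFrame({'A': [0, 1], 'B': [2, 3]}, index=['r1', 'r2'])
sin_df = np.sin(df * np.pi / 2)  # still a DataFrame, same rows and columns as df
print(sin_df)
```

Only the values change; the index and the column labels are carried over to the result.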

2. Index alignment

'''
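# 5. Index alignment
For binary operations on two Series or two DataFrame objects, pandas aligns the indices while performing the operation. This is very convenient when working with incomplete data.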
'''

# Two Series whose indexes only partly overlap
area = pd.Series({'Alaska': 1723337, 'Texas': 695662,
                  'California': 423967}, name='area')
population = pd.Series({'California': 38332521, 'Texas': 26448193,
                        'New York': 19651127}, name='population')

print(population / area)

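# Union and intersection of the two row indexes
# (the | and & operators on an Index no longer perform set operations in pandas 2.x)
print(area.index.union(population.index))
print(area.index.intersection(population.index))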

# Adding two Series whose indexes do not line up
A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])
print(A + B)

# If NaN is not the desired result for labels present in only one Series,
# pass fill_value so the missing entry is treated as 0 before adding.
print(A.add(B, fill_value=0))

df1 = pd.DataFrame(rng.randint(0, 20, (2, 2)),
                   columns=list('AB'))

print(df1)
print('------------------')

df2 = pd.DataFrame(rng.randint(0, 10, (3, 3)),
                   columns=list('BAC'))

print(df2)
print('------------------')
print(df1+df2)

'''
Note that the indices are aligned correctly regardless of their order in the two objects,
and the indices in the result are sorted.
'''

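# A better way to handle the NaN values
print(df1.stack())  # stack: pivot the column labels into a second (inner) level of the row index
fill = df1.stack().mean()
print(fill)
print(df1.add(df2, fill_value=fill))  # entries missing from df1 are filled with the mean of df1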

'''
The basic arithmetic operators can be applied directly to a DataFrame; each has equivalent methods:

+ add()
- sub(), subtract()
* mul(), multiply()
/ truediv(), div(), divide()
// floordiv()
% mod()
** pow()

'''
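
As a small illustration of the operator table above, here is a sketch showing that `+` and `add()` give the same result, and what `fill_value` changes. It reuses the two Series `A` and `B` defined earlier in this section; the variable names `plain`, `method`, and `filled` are introduced here only for the example.

```python
import pandas as pd

A = pd.Series([2, 4, 6], index=[0, 1, 2])
B = pd.Series([1, 3, 5], index=[1, 2, 3])

plain = A + B                    # labels 0 and 3 exist in only one Series, so they become NaN
method = A.add(B)                # method form of the same operation
filled = A.add(B, fill_value=0)  # a value missing on one side is treated as 0

print(plain)
print(filled)
```

The method form is mainly useful because it accepts extra arguments such as `fill_value` and `axis`, which the bare operator cannot take. A label missing from both operands would still produce NaN.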

# Subtraction on a two-dimensional NumPy array (broadcast row by row)
A = rng.randint(10, size=(3, 4))
# print(A)
# print(A - A[0])

# The same operation on a DataFrame
df = pd.DataFrame(A, columns=list('QRST'), index=['a', 'b', 'c'])
print(df)

print('--------------------')

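# Select the value in column Q, row a: 3
print(df['Q']['a'])

# df['a'] looks for a column named 'a', not a row, so it raises KeyError.
# print(df['a'])  # KeyError

# Select the whole row a by label with .loc;
# stacking and indexing the outer level gives the same data.
print(df.loc['a'])
print(df.stack()['a'])

print(df - df.iloc[0])  # default axis=1: the Series is matched to the column labels and subtracted from every row

print(df['Q'])  # select column Q
print(df.iloc[0])  # select the first row by position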

# To operate column-wise instead,
# use the method form mentioned above and pass the axis keyword:
print(df.subtract(df['R'], axis=0))

# First row, every second column (Q and S); columns missing from halfrow become NaN in the result.
halfrow = df.iloc[0, ::2]
print(halfrow)
print(df - halfrow)
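
To make the row-wise and column-wise cases easy to compare, the sketch below repeats them on a fixed table instead of random data. The numbers are chosen for illustration (an assumption, not output taken from the original post), and the names `row_wise`, `col_wise`, and `partial` are introduced only for this example.

```python
import pandas as pd

# Fixed illustrative values so the results are reproducible.
df = pd.DataFrame([[3, 8, 2, 4],
                   [2, 6, 4, 8],
                   [6, 1, 3, 8]],
                  columns=list('QRST'), index=['a', 'b', 'c'])

# Default (axis=1): the Series labels Q..T are matched to the columns,
# and the first row is subtracted from every row.
row_wise = df - df.iloc[0]

# axis=0: the Series labels a..c are matched to the row index,
# and column R is subtracted from every column.
col_wise = df.sub(df['R'], axis=0)

# A Series covering only Q and S: the unmatched columns R and T become NaN.
partial = df - df.iloc[0, ::2]

print(row_wise)
print(col_wise)
print(partial)
```

In every case pandas aligns on labels first and only then does the arithmetic, so the order of rows or columns in the operands does not matter.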
