10_python_libraries.ipynb
- Lecture 10: Python Libraries
- Outline
- Libraries and Modules
- Libraries: random
- Generating random numbers
- Libraries: NumPy
- Libraries: Pandas
- Creating a DataFrame:
- Accessing Elements of a DataFrame
- Modifying Elements of a DataFrame
- Convenient `DataFrame` Functionalities to Manipulate Data
- Libraries: matplotlib
- The Basic Syntax
- Plot Attributes
- Saving the Plots
- Animations and 3D-plots
- Summary: Looking Back
Introduction to Programming for Business Analytics
Murwan Siddig, Stefan Pilot
Lecture 10: Python Libraries¶
Outline¶
- Libraries and Modules:
  - Sources for Finding Packages for your Use-case.
  - `random`.
  - `numpy` (Arrays and Matrix Operations).
  - `pandas` (DataFrames).
  - `matplotlib` (Plotting and Data Visualization).
Libraries and Modules¶
In the following section, we discuss some essential Python built-in modules (e.g., `random`) and external libraries (e.g., `numpy`, `pandas`, and `matplotlib`).
Libraries: random¶
- We typically shorten the imported name to `rd` for better readability of the code using `random`.
Generating random numbers¶
- The most basic random value, a `float` drawn uniformly from the interval [0, 1), can be generated using the function `random` as follows.
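A minimal sketch of this call, assuming the module is imported under the alias `rd`; the variable name `x` is only illustrative.

```python
import random as rd  # alias used throughout this lecture

# random() returns a float in the half-open interval [0.0, 1.0)
x = rd.random()
print(x)
```

The printed value changes from run to run because no seed is fixed.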
- To generate a random `int` between `0` and `n`, we use the function `randrange` as follows.
- In general, the function `randrange` returns a randomly selected element from `range(start, stop, step)`, so `stop` itself is never returned.
  - For instance, the following example shows how we can use the `randrange` function to return an even integer between `0` and `100`.
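The two uses of `randrange` might look like the sketch below. The bound `n = 10` and the variable names are assumptions chosen for illustration; the stop value `101` is used so that `100` remains a possible outcome.

```python
import random as rd

n = 10  # illustrative upper bound

# randrange(n) picks from range(n), i.e. 0, 1, ..., n - 1 (n itself is excluded)
k = rd.randrange(n)

# randrange(start, stop, step) picks from range(start, stop, step);
# stop=101 keeps 100 in the candidate set, step=2 keeps only even values
even = rd.randrange(0, 101, 2)

print(k, even)
```

If `n` itself should be a possible result, `rd.randrange(n + 1)` or `rd.randint(0, n)` would be the usual choices.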
- To generate a random `float` within a specific range, we use the function `uniform` as follows.
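A short sketch of `uniform`, with bounds `low` and `high` picked arbitrarily for illustration.

```python
import random as rd

low, high = 2.5, 7.5  # illustrative bounds

# uniform(a, b) returns a float between a and b
value = rd.uniform(low, high)
print(value)
```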
- We can generate a random list of elements without replacement using the function `sample` as follows.
- We can generate a random list of elements with replacement using the function `choices` as follows.
- Note that the functions `sample` and `choices` do not require the values to be numerical.
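The difference between the two functions can be sketched as follows. The lists `numbers` and `colors` are made-up data; the second one shows that non-numerical values work as well.

```python
import random as rd

numbers = [1, 2, 3, 4, 5, 6]
colors = ["red", "green", "blue"]

# sample: every position is drawn at most once, so k cannot exceed len(numbers)
without_replacement = rd.sample(numbers, k=3)

# choices: each draw is independent, so items may repeat and k may exceed len(colors)
with_replacement = rd.choices(colors, k=5)

print(without_replacement)
print(with_replacement)
```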
- To generate a random permutation of the elements in a list, we use the function `shuffle`, which reorders the list in place, as follows.
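A small sketch of `shuffle` on a made-up list `deck`. The copy `original` is kept only to show that the same elements remain after shuffling.

```python
import random as rd

deck = ["A", "B", "C", "D", "E"]
original = deck.copy()

# shuffle reorders the list in place and returns None
result = rd.shuffle(deck)

print(deck)    # same elements, random order
print(result)  # None
```

Because the list is modified in place, writing `deck = rd.shuffle(deck)` would replace the list with `None`.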
Libraries: NumPy¶
Consider the following two matrices,
Although we can interpret Python's lists as one-dimensional arrays, unlike Julia, Python has no built-in support for (multi-dimensional) arrays and matrix operations.
Note that, in Python, applying the binary operators `+` and `*` on lists is possible in some cases. However, they do not perform linear algebraic operations.
- Applying the `+` operator is possible when both operands are lists.
  - The result is a concatenated list as shown in the following examples.
- Applying the `*` operator is possible when one of the operands is a list and the other is a non-negative integer.
  - The result is a new list that repeats the elements a given number of times as shown in the following examples.
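The behaviour of both operators on lists can be checked with a sketch like this one. The lists `a` and `b` are illustrative, and the final comprehension is added only as a contrast to show what element-wise addition would give.

```python
a = [1, 2, 3]
b = [4, 5, 6]

# + concatenates the two lists
concatenated = a + b
print(concatenated)  # [1, 2, 3, 4, 5, 6]

# * repeats the list the given number of times
repeated = a * 2
print(repeated)  # [1, 2, 3, 1, 2, 3]

# For contrast: element-wise addition has to be spelled out explicitly
elementwise_sum = [x + y for x, y in zip(a, b)]
print(elementwise_sum)  # [5, 7, 9]
```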
The NumPy (`numpy`) library provides efficient ways of creating arrays and manipulating numerical data inside them, including matrix operations.
- Let's first install NumPy using the following command.
- To access the `numpy` library and its functions, we import it as follows.
- We typically shorten the imported name to `np` for better readability of the code using NumPy.
  - This is a widely adopted convention that you should follow so that anyone working with your code can easily understand it.
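The installation and import steps might look like the sketch below. The matrices `A` and `B` are hypothetical stand-ins chosen for illustration, not necessarily the ones used in the lecture.

```python
# NumPy is typically installed from the command line (or a notebook cell) with:
#   pip install numpy
import numpy as np

# Hypothetical 2x2 matrices, built from nested lists
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A.shape)  # (2, 2)
print(B.ndim)   # 2
```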
Addition: To calculate the sum of the two matrices, we use the `+` operator as follows.
Multiplication: To calculate the product of the two matrices, we use the `matmul` function as follows.
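A sketch of both operations, again on hypothetical matrices `A` and `B` that are assumptions made for this example.

```python
import numpy as np

# Hypothetical matrices for illustration
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Matrix addition: entries at the same position are added
total = A + B
print(total.tolist())  # [[6, 8], [10, 12]]

# Matrix multiplication: rows of A times columns of B
product = np.matmul(A, B)
print(product.tolist())  # [[19, 22], [43, 50]]
```

For two-dimensional arrays, `A @ B` gives the same result as `np.matmul(A, B)`.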
Element-wise operations: Note that, similarly to the `+` operator, binary operators such as `*`, `/`, and `**` perform element-wise operations.
- Moreover, NumPy makes it possible to carry out operations between an array and a single number.
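The element-wise behaviour can be sketched as follows. The matrices are hypothetical and were chosen so that the divisions come out exact.

```python
import numpy as np

# Hypothetical matrices for illustration
A = np.array([[1, 2], [3, 4]])
B = np.array([[2, 4], [6, 8]])

# Each operator acts on entries at matching positions
print((A * B).tolist())   # [[2, 8], [18, 32]]
print((B / A).tolist())   # [[2.0, 2.0], [2.0, 2.0]]
print((A ** 2).tolist())  # [[1, 4], [9, 16]]

# A single number is applied to every entry of the array
shifted = A + 10
scaled = 2 * A
print(shifted.tolist())   # [[11, 12], [13, 14]]
print(scaled.tolist())    # [[2, 4], [6, 8]]
```

Note that `A * B` is therefore not the matrix product; that remains the job of `np.matmul`.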
Libraries: Pandas¶
- Note that we also shorten `pandas` as `pd` for better readability.
Creating a DataFrame:¶
- From other Python data structures.
- We can create a `DataFrame` from other Python data structures all at once as follows.
- We can start with an empty `DataFrame` and add elements to it column-by-column or row-by-row.
- We can create a `DataFrame` by converting other data types such as `dict` and `numpy.array`.
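The three approaches could be sketched as below. All column names and values are invented for illustration, and adding a row with `df.loc[len(df)]` is one common idiom among several.

```python
import numpy as np
import pandas as pd

# All at once, from a dict that maps column names to lists (made-up data)
df = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "age": [31, 45, 27]})

# Row-by-row: assigning to a new index label appends a row
df.loc[len(df)] = ["Dee", 52]

# Starting empty and adding columns one at a time
empty = pd.DataFrame()
empty["x"] = [1, 2, 3]
empty["y"] = [10.0, 20.0, 30.0]

# Converting a NumPy array; the column labels are supplied separately
arr = np.array([[1, 2], [3, 4], [5, 6]])
from_array = pd.DataFrame(arr, columns=["a", "b"])

print(df)
print(empty)
print(from_array)
```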
- From permanent storage files such as CSV files.
- For this, we are going to need a library for working with CSV files.
  - Python has a built-in module for working with CSV files, called `csv`. Therefore, we do not need to install it. However, we need to import it in any code that uses it as follows.
- To load the data into a `DataFrame`, we use the `read_csv` function, which is provided by pandas rather than by the `csv` module, to read the `german.csv` file as follows.
- Conversely, we can save a `DataFrame` as a `.csv` file using the function `to_csv`.
  - To demonstrate, we write the contents of `"german.csv"`, now assigned to the variable `df`, to a copy under a different name, `"output.csv"`.
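A self-contained sketch of the read and write round trip. Since the lecture's `german.csv` is not available here, the example first writes a tiny stand-in file with made-up columns into a temporary directory; only `read_csv` and `to_csv` correspond to the steps described above.

```python
import os
import tempfile

import pandas as pd

# Made-up stand-in for the lecture's data file
data = pd.DataFrame({"id": [1, 2, 3], "amount": [250, 1200, 640]})

with tempfile.TemporaryDirectory() as tmp:
    source_path = os.path.join(tmp, "german.csv")
    output_path = os.path.join(tmp, "output.csv")
    data.to_csv(source_path, index=False)

    # read_csv parses the file into a DataFrame
    df = pd.read_csv(source_path)

    # to_csv writes the DataFrame back out; index=False omits the row labels
    df.to_csv(output_path, index=False)

    # Reading the copy back confirms that both files hold the same table
    copy = pd.read_csv(output_path)

print(copy.equals(df))  # True
```

Without `index=False`, `to_csv` also writes the row labels as an extra first column, which then shows up as an unnamed column when the file is read again.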
Accessing Elements of a DataFrame¶
Similarly to Julia, a `DataFrame` can be accessed via its columns, rows, and cells.
- Accessing via columns.
- Accessing via rows.
- Accessing via cell location.
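The three kinds of access might be sketched as follows, using a small made-up table with row labels `r1` to `r3`. The choice of `loc` (label-based) and `iloc` (position-based) is an assumption about which accessors the lecture uses.

```python
import pandas as pd

# Made-up data with explicit row labels
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "age": [31, 45, 27]},
    index=["r1", "r2", "r3"],
)

# Columns
ages = df["age"]              # one column, returned as a Series
subset = df[["name", "age"]]  # several columns, returned as a DataFrame

# Rows
row_by_label = df.loc["r2"]   # selected by index label
row_by_position = df.iloc[0]  # selected by integer position

# Cells
cell_by_label = df.loc["r3", "name"]  # 'Cy'
cell_by_position = df.iloc[1, 1]      # 45

print(cell_by_label, cell_by_position)
```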
Modifying Elements of a DataFrame¶
- We can mutate the elements of an entire column as follows.
- We can rename one or multiple columns as follows.
- We can insert a new column at a particular position as follows.
- We can remove rows that contain missing values as follows.
- We can remove rows based on a conditional expression as follows.
- We can sort the rows of a DataFrame as follows.
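Each of the modifications listed above can be sketched in one line. The data, the column names, and the threshold of 40 are invented for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy", "Dee"], "age": [31.0, 45.0, np.nan, 27.0]}
)

df["age"] = df["age"] + 1                              # mutate an entire column
df = df.rename(columns={"name": "first_name"})         # rename a column
df.insert(1, "city", ["Ulm", "Bonn", "Kiel", "Jena"])  # new column at position 1
df = df.dropna()                                       # remove rows with missing values
df = df[df["age"] < 40]                                # remove rows where age is 40 or more
df = df.sort_values("age")                             # sort rows by a column

print(df)
```

Apart from `insert`, which changes the DataFrame in place, these calls return a new DataFrame, which is why the result is assigned back to `df`.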
Libraries: matplotlib¶
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
- Let's first install and import Matplotlib.
- For the purpose of this lecture, we are going to use a submodule of Matplotlib called `pyplot` as follows.
- Note that, analogously to shortening NumPy as `np` and Pandas as `pd`, we shorten `pyplot` as `plt` for better readability.
The Basic Syntax¶
- Scatter plots:
- Bar chart:
- Line graphs:
- Multiple lines:
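The four basic plot types could be sketched as below, with made-up data in `x`, `y`, and `z`. Each `plt.figure()` call starts a fresh figure so that the plots do not overlap.

```python
import matplotlib.pyplot as plt

# Made-up data
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 5, 3]
z = [5, 3, 4, 2, 1]

plt.figure()
scatter = plt.scatter(x, y)  # scatter plot

plt.figure()
bars = plt.bar(x, y)         # bar chart

plt.figure()
lines = plt.plot(x, y)       # line graph

plt.figure()
plt.plot(x, y)               # multiple lines: repeated plot calls
plt.plot(x, z)               # draw into the same axes
n_lines = len(plt.gca().get_lines())
print(n_lines)  # 2
```

In a notebook the figures are usually rendered automatically at the end of the cell; in a script, a final `plt.show()` displays them.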
Plot Attributes¶
- We can change the title font size as follows.
- By default, the width and height of the plot are 6.4 and 4.8 inches, respectively. We can change the plot size as follows.
2. Series attributes:
- We can add a label to the plot as follows.
- We can specify each series color as follows.
- We can change the line width as follows.
- We can change the line style as follows.
- We can annotate the plot as follows.
3. Subplot attributes:
- By default, Matplotlib chooses the `"best"` position to place the legend. But if we want, we can specify the legend location as follows.
- We can create a single plot that combines multiple subplots as follows.
4. Axis attributes:
- We can label the axes of the plot as follows.
- We can determine the level of granularity in each axis by specifying the x-ticks and y-ticks as follows.
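The attributes discussed above can be combined in a single sketch. The data, labels, colors, and sizes are arbitrary choices for illustration, and the `pyplot` function interface is assumed here to match the `plt` alias used in the lecture.

```python
import matplotlib.pyplot as plt

# Made-up data
x = [0, 1, 2, 3, 4]
y1 = [0, 1, 4, 9, 16]
y2 = [0, 2, 4, 6, 8]

fig = plt.figure(figsize=(10, 4))  # plot size: width and height in inches

plt.subplot(1, 2, 1)  # 1 row, 2 columns, first subplot

# Series attributes: label, color, line width, line style
plt.plot(x, y1, label="quadratic", color="blue", linewidth=2.5, linestyle="--")
plt.plot(x, y2, label="linear", color="orange", linewidth=1.0, linestyle=":")
plt.annotate("largest value", xy=(4, 16), xytext=(1.5, 14),
             arrowprops={"arrowstyle": "->"})

plt.title("Two series", fontsize=16)  # title font size
plt.legend(loc="upper left")          # explicit legend location

# Axis attributes: labels and tick positions
plt.xlabel("x")
plt.ylabel("y")
plt.xticks([0, 2, 4])
plt.yticks([0, 8, 16])

plt.subplot(1, 2, 2)  # second subplot
plt.bar(x, y2, color="gray")
plt.title("Bar chart", fontsize=12)
```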
Saving the Plots¶
- We can save our plots as follows.
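A minimal sketch of saving a figure with `savefig`. The file name `my_plot.png`, the temporary directory, and the `dpi` value are assumptions made for this example.

```python
import os
import tempfile

import matplotlib.pyplot as plt

plt.figure()
plt.plot([1, 2, 3], [2, 4, 8])
plt.title("Saved plot")

# Hypothetical output location; any writable path works
out_path = os.path.join(tempfile.gettempdir(), "my_plot.png")

# The file extension selects the format; dpi sets the resolution of the image
plt.savefig(out_path, dpi=150)

saved = os.path.exists(out_path)
print(saved)  # True
```

In a script it is usually safest to call `savefig` before `plt.show()`, since the figure may no longer be available once the window has been closed.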
Animations and 3D-plots¶
Exercise.
Summary: Looking Back¶
Looking Back:
- `random` is a built-in Python module for generating (pseudo) random numbers from various distributions.
- `numpy` is a library that provides efficient ways of creating arrays and manipulating numerical data inside them, including matrix operations.
- `csv` is a built-in Python module for working with CSV files.
- `Pandas` provides easy-to-use data structures and data analysis tools in Python, including DataFrames.
- `Matplotlib` is a comprehensive library for creating static, animated, and interactive visualizations in Python.