Python语法速查：数据处理

由 neevop 十二月 10, 2022

JSON

Text file format for storing collections of strings and numbers.

import json
<str>    = json.dumps(<object>)     # Converts object to JSON string.
<object> = json.loads(<str>)        # Converts JSON string to object.

Read Object from JSON File

def read_json_file(filename):
    with open(filename, encoding='utf-8') as file:
        return json.load(file)

Write Object to JSON File

def write_to_json_file(filename, an_object):
    with open(filename, 'w', encoding='utf-8') as file:
        json.dump(an_object, file, ensure_ascii=False, indent=2)

Pickle

Binary file format for storing Python objects.

import pickle
<bytes>  = pickle.dumps(<object>)   # Converts object to bytes object.
<object> = pickle.loads(<bytes>)    # Converts bytes object to object.

Read Object from File

def read_pickle_file(filename):
    with open(filename, 'rb') as file:
        return pickle.load(file)

Write Object to File

def write_to_pickle_file(filename, an_object):
    with open(filename, 'wb') as file:
        pickle.dump(an_object, file)

CSV

Text file format for storing spreadsheets.

import csv

Read

<reader> = csv.reader(<file>)       # Also: `dialect='excel', delimiter=','`.
<list>   = next(<reader>)           # Returns next row as a list of strings.
<list>   = list(<reader>)           # Returns a list of remaining rows.

File must be opened with a 'newline=""' argument, or newlines embedded inside quoted fields will not be interpreted correctly!
To print the spreadsheet to the console use Tabulate library.
For XML and binary Excel files (xlsx, xlsm and xlsb) use Pandas library.

Write

<writer> = csv.writer(<file>)       # Also: `dialect='excel', delimiter=','`.
<writer>.writerow(<collection>)     # Encodes objects using `str(<el>)`.
<writer>.writerows(<coll_of_coll>)  # Appends multiple rows.

File must be opened with a 'newline=""' argument, or ‘\r’ will be added in front of every ‘\n’ on platforms that use ‘\r\n’ line endings!

Parameters

'dialect' - Master parameter that sets the default values. String or a Dialect object.
'delimiter' - A one-character string used to separate fields.
'quotechar' - Character for quoting fields that contain special characters.
'doublequote' - Whether quotechars inside fields are/get doubled or escaped.
'skipinitialspace' - Is space character at the start of the field stripped by the reader.
'lineterminator' - How writer terminates rows. Reader is hardcoded to ‘\n’, ‘\r’, ‘\r\n’.
'quoting' - 0: As necessary, 1: All, 2: All but numbers which are read as floats, 3: None.
'escapechar' - Character for escaping quotechars if doublequote is False.

Dialects

+------------------+--------------+--------------+--------------+
|                  |     excel    |   excel-tab  |     unix     |
+------------------+--------------+--------------+--------------+
| delimiter        |       ','    |      '\t'    |       ','    |
| quotechar        |       '"'    |       '"'    |       '"'    |
| doublequote      |      True    |      True    |      True    |
| skipinitialspace |     False    |     False    |     False    |
| lineterminator   |    '\r\n'    |    '\r\n'    |      '\n'    |
| quoting          |         0    |         0    |         1    |
| escapechar       |      None    |      None    |      None    |
+------------------+--------------+--------------+--------------+

Read Rows from CSV File

def read_csv_file(filename, dialect='excel'):
    with open(filename, encoding='utf-8', newline='') as file:
        return list(csv.reader(file, dialect))

Write Rows to CSV File

def write_to_csv_file(filename, rows, dialect='excel'):
    with open(filename, 'w', encoding='utf-8', newline='') as file:
        writer = csv.writer(file, dialect)
        writer.writerows(rows)

SQLite

A server-less database engine that stores each database into a separate file.

import sqlite3
<conn> = sqlite3.connect(<path>)                # Opens existing or new file. Also ':memory:'.
<conn>.close()                                  # Closes the connection.

Read

<cursor> = <conn>.execute('<query>')            # Can raise a subclass of sqlite3.Error.
<tuple>  = <cursor>.fetchone()                  # Returns next row. Also next(<cursor>).
<list>   = <cursor>.fetchall()                  # Returns remaining rows. Also list(<cursor>).

Write

<conn>.execute('<query>')                       # Can raise a subclass of sqlite3.Error.
<conn>.commit()                                 # Saves all changes since the last commit.
<conn>.rollback()                               # Discards all changes since the last commit.

Or:

with <conn>:                                    # Exits the block with commit() or rollback(),
    <conn>.execute('<query>')                   # depending on whether any exception occurred.

Placeholders

<conn>.execute('<query>', <list/tuple>)         # Replaces '?'s in query with values.
<conn>.execute('<query>', <dict/namedtuple>)    # Replaces ':<key>'s with values.
<conn>.executemany('<query>', <coll_of_above>)  # Runs execute() multiple times.

Passed values can be of type str, int, float, bytes, None, bool, datetime.date or datetime.datetime.
Bools will be stored and returned as ints and dates as ISO formatted strings.

Example

Values are not actually saved in this example because 'conn.commit()' is omitted!

>>> conn = sqlite3.connect('test.db')
>>> conn.execute('CREATE TABLE person (person_id INTEGER PRIMARY KEY, name, height)')
>>> conn.execute('INSERT INTO person VALUES (NULL, ?, ?)', ('Jean-Luc', 187)).lastrowid
1
>>> conn.execute('SELECT * FROM person').fetchall()
[(1, 'Jean-Luc', 187)]

SqlAlchemy

# $ pip3 install sqlalchemy
from sqlalchemy import create_engine, text
<engine> = create_engine('<url>').connect()     # Url: 'dialect://user:password@host/dbname'.
<conn>   = <engine>.connect()                   # Creates a connection. Also <conn>.close().
<cursor> = <conn>.execute(text('<query>'), …)   # Replaces ':<key>'s with keyword arguments.
with <conn>.begin(): ...                        # Exits the block with commit or rollback.

+------------+--------------+-----------+-----------------------------------+
| Dialects   | pip3 install | import    | Dependencies                      |
+------------+--------------+-----------+-----------------------------------+
| mysql      | mysqlclient  | MySQLdb   | www.pypi.org/project/mysqlclient  |
| postgresql | psycopg2     | psycopg2  | www.psycopg.org/docs/install.html |
| mssql      | pyodbc       | pyodbc    | apt install g++ unixodbc-dev      |
| oracle     | cx_oracle    | cx_Oracle | Oracle Instant Client             |
+------------+--------------+-----------+-----------------------------------+

Bytes

Bytes object is an immutable sequence of single bytes. Mutable version is called bytearray.

<bytes> = b'<str>'                          # Only accepts ASCII characters and \x00-\xff.
<int>   = <bytes>[<index>]                  # Returns an int in range from 0 to 255.
<bytes> = <bytes>[<slice>]                  # Returns bytes even if it has only one element.
<bytes> = <bytes>.join(<coll_of_bytes>)     # Joins elements using bytes as a separator.

Encode

<bytes> = bytes(<coll_of_ints>)             # Ints must be in range from 0 to 255.
<bytes> = bytes(<str>, 'utf-8')             # Or: <str>.encode('utf-8')
<bytes> = <int>.to_bytes(n_bytes, …)        # `byteorder='little/big', signed=False`.
<bytes> = bytes.fromhex('<hex>')            # Hex pairs can be separated by whitespaces.

Decode

<list>  = list(<bytes>)                     # Returns ints in range from 0 to 255.
<str>   = str(<bytes>, 'utf-8')             # Or: <bytes>.decode('utf-8')
<int>   = int.from_bytes(<bytes>, …)        # `byteorder='little/big', signed=False`.
'<hex>' = <bytes>.hex()                     # Returns hex pairs. Accepts `sep=<str>`.

Read Bytes from File

def read_bytes(filename):
    with open(filename, 'rb') as file:
        return file.read()

Write Bytes to File

def write_bytes(filename, bytes_obj):
    with open(filename, 'wb') as file:
        file.write(bytes_obj)

Struct

Module that performs conversions between a sequence of numbers and a bytes object.
System’s type sizes, byte order, and alignment rules are used by default.

from struct import pack, unpack
<bytes> = pack('<format>', <el_1> [, ...])  # Packages arguments into bytes object.
<tuple> = unpack('<format>', <bytes>)       # Use iter_unpack() for iterator of tuples.

>>> pack('>hhl', 1, 2, 3)
b'\x00\x01\x00\x02\x00\x00\x00\x03'
>>> unpack('>hhl', b'\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)

Format

For standard type sizes and manual alignment (padding) start format string with:

'=' - System’s byte order (usually little-endian).
'<' - Little-endian.
'>' - Big-endian (also '!').

Besides numbers, pack() and unpack() also support bytes objects as part of the sequence:

'c' - A bytes object with a single element. For pad byte use 'x'.
'<n>s' - A bytes object with n elements.

Integer types. Use a capital letter for unsigned type. Minimum and standard sizes are in brackets:

'b' - char (1/1)
'h' - short (2/2)
'i' - int (2/4)
'l' - long (4/4)
'q' - long long (8/8)

Floating point types:

'f' - float (4/4)
'd' - double (8/8)

Array

List that can only hold numbers of a predefined type. Available types and their minimum sizes in bytes are listed above. Sizes and byte order are always determined by the system.

from array import array
<array> = array('<typecode>', <collection>)    # Array from collection of numbers.
<array> = array('<typecode>', <bytes>)         # Array from bytes object.
<array> = array('<typecode>', <array>)         # Treats array as a sequence of numbers.
<bytes> = bytes(<array>)                       # Or: <array>.tobytes()
<file>.write(<array>)                          # Writes array to the binary file.

Memory View

A sequence object that points to the memory of another object.
Each element can reference a single or multiple consecutive bytes, depending on format.
Order and number of elements can be changed with slicing.
Casting only works between char and other types and uses system’s sizes.
Byte order is always determined by the system.

<mview> = memoryview(<bytes/bytearray/array>)  # Immutable if bytes, else mutable.
<real>  = <mview>[<index>]                     # Returns an int or a float.
<mview> = <mview>[<slice>]                     # Mview with rearranged elements.
<mview> = <mview>.cast('<typecode>')           # Casts memoryview to the new format.
<mview>.release()                              # Releases the object's memory buffer.

Decode

<bytes> = bytes(<mview>)                       # Returns a new bytes object.
<bytes> = <bytes>.join(<coll_of_mviews>)       # Joins mviews using bytes object as sep.
<array> = array('<typecode>', <mview>)         # Treats mview as a sequence of numbers.
<file>.write(<mview>)                          # Writes mview to the binary file.

<list>  = list(<mview>)                        # Returns a list of ints or floats.
<str>   = str(<mview>, 'utf-8')                # Treats mview as a bytes object.
<int>   = int.from_bytes(<mview>, …)           # `byteorder='little/big', signed=False`.
'<hex>' = <mview>.hex()                        # Treats mview as a bytes object.

Deque

A thread-safe list with efficient appends and pops from either side. Pronounced “deck”.

from collections import deque
<deque> = deque(<collection>, maxlen=None)

<deque>.appendleft(<el>)                       # Opposite element is dropped if full.
<deque>.extendleft(<collection>)               # Collection gets reversed.
<el> = <deque>.popleft()                       # Raises IndexError if empty.
<deque>.rotate(n=1)                            # Rotates elements to the right.

JSON

Read Object from JSON File

Write Object to JSON File

Pickle

Read Object from File

Write Object to File

CSV

Read

Write

Parameters

Dialects

Read Rows from CSV File

Write Rows to CSV File

SQLite

Read

Write

Or:

Placeholders

Example

SqlAlchemy

Bytes

Encode

Decode

Read Bytes from File

Write Bytes to File

Struct

Format

For standard type sizes and manual alignment (padding) start format string with:

Besides numbers, pack() and unpack() also support bytes objects as part of the sequence:

Integer types. Use a capital letter for unsigned type. Minimum and standard sizes are in brackets:

Floating point types:

Array

Memory View

Decode

Deque

搜索

分类

标签

编程

数学

花瓣

旅行

速查

Python语法速查：数据处理

JSON

Read Object from JSON File

Write Object to JSON File

Pickle

Read Object from File

Write Object to File

CSV

Read

Write

Parameters

Dialects

Read Rows from CSV File

Write Rows to CSV File

SQLite

Read

Write

Or:

Placeholders

Example

SqlAlchemy

Bytes

Encode

Decode

Read Bytes from File

Write Bytes to File

Struct

Format

For standard type sizes and manual alignment (padding) start format string with:

Besides numbers, pack() and unpack() also support bytes objects as part of the sequence:

Integer types. Use a capital letter for unsigned type. Minimum and standard sizes are in brackets:

Floating point types:

Array

Memory View

Decode

Deque

搜索

分类

标签