08 Demo: Python Time#

UW Geospatial Data Analysis
CEE467/CEWA567
David Shean

Introduction#

  • https://csit.kutztown.edu/~schwesin/fall20/csc223/lectures/Pandas_Time_Series.html

  • Multiple ways to represent dates and times in Python (datetime, NumPy datetime64, pandas Timestamp) - easy to convert between them

  • https://en.wikipedia.org/wiki/Second

Python datetime#

  • Built-in module called datetime, which contains a class also called datetime (and a timedelta class) - the shared module/class name can be confusing

  • https://docs.python.org/3/library/datetime.html

NumPy datetime64#

  • https://numpy.org/doc/stable/reference/arrays.datetime.html
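A minimal sketch of the NumPy representation, assuming NumPy is installed. The dates are arbitrary illustrative values, not data from the course.

```python
import numpy as np

# A datetime64 scalar with day resolution
d1 = np.datetime64('2023-02-22')
d2 = np.datetime64('2023-02-25')

# Subtracting two datetime64 values yields a timedelta64
delta = d2 - d1
print(delta)

# Arrays of dates support vectorized arithmetic
days = np.arange('2023-02-22', '2023-02-26', dtype='datetime64[D]')
shifted = days + np.timedelta64(1, 'D')
print(shifted)
```

The end date passed to np.arange is exclusive, so days holds February 22 through February 25.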

Pandas Timestamp#

  • https://pandas.pydata.org/docs/user_guide/timeseries.html

  • https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.html

  • https://pandas.pydata.org/docs/user_guide/timeseries.html#overview

  • DatetimeIndex

  • pd.to_datetime()

    • Accepts “int, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like”
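As a rough illustration of the input types listed above, the sketch below passes a few of them to pd.to_datetime(). The example values and column names are made up for illustration.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Scalars of different types all convert to a pandas Timestamp
ts_from_str = pd.to_datetime('2023-02-25 21:28:50')
ts_from_dt = pd.to_datetime(datetime(2023, 2, 25, 21, 28, 50))
ts_from_np = pd.to_datetime(np.datetime64('2023-02-25T21:28:50'))

# A list of strings converts to a DatetimeIndex
idx = pd.to_datetime(['2023-02-22', '2023-02-23', '2023-02-25'])

# A DataFrame with year/month/day columns converts to a Series of Timestamps
df = pd.DataFrame({'year': [2023, 2023], 'month': [2, 2], 'day': [22, 25]})
dates = pd.to_datetime(df)

print(type(idx).__name__)
print(dates)
```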

xarray#

  • https://xarray.pydata.org/en/stable/user-guide/time-series.html

Day of calendar year#

  • January 1 = 1

  • January 2 = 2

  • December 31 = 365 (366 in a leap year)
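One way to get the day of year, sketched with the pandas dayofyear attribute and the standard-library equivalent. The dates are chosen only to show the leap-year case.

```python
from datetime import datetime

import pandas as pd

doy_jan1 = pd.Timestamp('2023-01-01').dayofyear
doy_dec31 = pd.Timestamp('2023-12-31').dayofyear
doy_dec31_leap = pd.Timestamp('2024-12-31').dayofyear  # 2024 is a leap year

print(doy_jan1, doy_dec31, doy_dec31_leap)  # 1 365 366

# Standard-library equivalent, without pandas
doy_stdlib = datetime(2023, 12, 31).timetuple().tm_yday
print(doy_stdlib)  # 365
```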

Water year#

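  • Starts October 1, ends September 30 (U.S. convention; labeled by the calendar year in which it ends, so water year 2023 runs October 1, 2022 to September 30, 2023)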

  • Southern hemisphere?
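A minimal sketch of a water-year calculation, assuming the U.S. convention that a water year is labeled by the calendar year in which it ends. The helper name water_year and its start_month parameter are hypothetical, added here so that a different start month (for example in the southern hemisphere) can be tried.

```python
import pandas as pd


def water_year(ts, start_month=10):
    """Return the water year for a timestamp (hypothetical helper).

    Assumes the water year is labeled by the calendar year in which it ends.
    """
    ts = pd.Timestamp(ts)
    # With start_month=1 the water year is simply the calendar year
    if start_month > 1 and ts.month >= start_month:
        return ts.year + 1
    return ts.year


print(water_year('2022-09-30'))  # 2022
print(water_year('2022-10-01'))  # 2023

# Vectorized version for a DatetimeIndex (October start assumed)
idx = pd.date_range('2022-09-29', periods=4, freq='D')
wy = idx.year + (idx.month >= 10).astype(int)
print(list(wy))  # [2022, 2022, 2023, 2023]
```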

Time zones#

  • Let Pandas handle this

  • You will inevitably get a warning or error about timezone-aware vs. timezone-naive Timestamp objects

    • Add time zone: https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.tz_localize.html

    • Remove time zone: https://stackoverflow.com/a/34687479

  • General advice (time and timestamps are messy): https://www.youtube.com/watch?v=-5wpm-gesOY&ab_channel=Computerphile
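The add/remove steps linked above can be sketched as follows. The zone name 'America/Los_Angeles' is an assumption chosen for illustration; substitute the zone that applies to your data.

```python
import pandas as pd

# A naive Timestamp carries no time zone information
naive = pd.Timestamp('2023-02-25 21:28:50')

# Attach a time zone without changing the wall-clock time
local = naive.tz_localize('America/Los_Angeles')

# Express the same instant in UTC
utc = local.tz_convert('UTC')

# Remove the time zone again (keeps the local wall-clock time)
naive_again = local.tz_localize(None)

# Ordering comparisons between naive and aware Timestamps raise TypeError
try:
    naive < local
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(local)
print(utc)
print(mixed_ok)
```

Converting everything to UTC before comparing or merging records is one common way to avoid the naive vs. aware mismatch.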

Discussion#

  • (t,x,y,z) records for one or more variables

  • pandas Timestamp vs. Python datetime vs. NumPy datetime64

    • Some functions across different modules play nicely with one type but not the others

  • Dealing with missing values in DataFrame

    • Sometimes sensors fail or datalogger fails, sometimes values are flagged as erroneous

    • Pandas has excellent support for missing values: https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html

    • dropna() https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html#pandas.DataFrame.dropna

  • Trajectories

    • Argo floats (https://argo.ucsd.edu/)

    • Weather balloons

    • GNSS tracks - vehicles, pedestrians, aircraft

      • Spatial and temporal derivatives

  • Permanent stations

    • Stream gage

    • SNOTEL sites

  • How big is too big for Pandas/GeoPandas?

    • https://github.com/toddwschneider/nyc-taxi-data

    • PostgreSQL/PostGIS

      • SQL - Structured Query Language, used for managing data in a relational database

  • What to do with multiple variables for each timestamp?

    • xarray works well for multiple variables (e.g., snow depth and SWE for same site) for each station for each time

      • https://docs.xarray.dev/en/stable/

    • Separate 2D dataframes

      • One storing locations of all sites

      • One storing time series of some variable for all sites

      • Common station ID as key
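The two-table layout and the missing-value handling discussed above can be sketched as below. The station IDs, coordinates, and snow depth values are made up for illustration, and the column names are assumptions.

```python
import numpy as np
import pandas as pd

# Table 1: one row per site, indexed by station ID
sites = pd.DataFrame({
    'station_id': ['A', 'B'],
    'lon': [-121.5, -120.9],
    'lat': [47.4, 48.1],
}).set_index('station_id')

# Table 2: time series of one variable, one column per station ID
times = pd.date_range('2023-02-22', periods=3, freq='D')
snow_depth = pd.DataFrame(
    {'A': [1.20, np.nan, 1.35], 'B': [0.80, 0.85, 0.90]},
    index=times,
)

# Drop any timestamp where at least one site has a missing value
complete = snow_depth.dropna()

# Per-site mean (NaN is skipped by default), joined to locations by station ID
mean_depth = snow_depth.mean().rename('mean_depth')
summary = sites.join(mean_depth)

print(complete)
print(summary)
```

Keeping the station ID as the shared key means the location table never has to be repeated for every timestamp.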

from datetime import datetime
import pandas as pd
import numpy as np
datetime?
Init signature: datetime(self, /, *args, **kwargs)
Docstring:     
datetime(year, month, day[, hour[, minute[, second[, microsecond[,tzinfo]]]]])

The year, month and day arguments are required. tzinfo may be None, or an
instance of a tzinfo subclass. The remaining arguments may be ints.
File:           /srv/conda/envs/notebook/lib/python3.10/datetime.py
Type:           type
Subclasses:     ABCTimestamp, _NaT
dt1 = datetime(2023, 2, 22)
dt2 = datetime.now()
print(dt2)
2023-02-25 21:28:50.308138
dt2
datetime.datetime(2023, 2, 25, 21, 28, 50, 308138)
dt2.year
2023
dt2.strftime?
Docstring: format -> strftime() style string.
Type:      builtin_function_or_method

Side note: formatting timestamp strings#

#Typical U.S. date format
dt2.strftime('%m/%d/%y')
'02/25/23'
#MMDDYYYY strings won't sort chronologically when sorted alphanumerically
dt2.strftime('%m%d%Y')
'02252023'
#YYYYMMDD is better and will sort alphanumerically
dt2.strftime('%Y%m%d')
'20230225'
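To see the sorting difference concretely, here is a small sketch with three arbitrary dates formatted both ways and then sorted as strings.

```python
from datetime import datetime

dates = [datetime(2023, 2, 25), datetime(2022, 12, 1), datetime(2023, 1, 15)]

# Month-first strings sort by month, not by time
mdy = sorted(d.strftime('%m%d%Y') for d in dates)

# Year-first strings sort chronologically
ymd = sorted(d.strftime('%Y%m%d') for d in dates)

print(mdy)  # ['01152023', '02252023', '12012022']
print(ymd)  # ['20221201', '20230115', '20230225']
```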
dt1
datetime.datetime(2023, 2, 22, 0, 0)
dt2
datetime.datetime(2023, 2, 25, 21, 28, 50, 308138)
dt_diff = dt2 - dt1
dt_diff
datetime.timedelta(days=3, seconds=77330, microseconds=308138)
dt_diff.total_seconds()
336530.308138

How many seconds in a day? In a year?#

  • A day is 60*60*24 = 86,400 seconds; a year is approximately pi * 10^7 seconds

  • What is a second anyway?

dt_diff.total_seconds()/(60*60*24*365.25)
0.010664001956359167
60*60*24*365.25
31557600.0
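A quick check of the pi * 10^7 rule of thumb against the 365.25-day year used above. This is only a sketch of the arithmetic.

```python
import math

seconds_per_day = 60 * 60 * 24
seconds_per_year = seconds_per_day * 365.25  # 365.25-day year, as above

approx = math.pi * 1e7
rel_error = (approx - seconds_per_year) / seconds_per_year

print(seconds_per_day)       # 86400
print(seconds_per_year)      # 31557600.0
print(f'{rel_error:.2%}')    # relative error of the approximation
```

The approximation is low by a little under half a percent.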
dt2
datetime.datetime(2023, 2, 25, 21, 28, 50, 308138)
pd.to_datetime(dt2)
Timestamp('2023-02-25 21:28:50.308138')
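Conversions between the three representations can be sketched as follows, using a fixed example time rather than datetime.now() so the result is reproducible.

```python
from datetime import datetime

import numpy as np
import pandas as pd

dt_py = datetime(2023, 2, 25, 21, 28, 50)

ts = pd.Timestamp(dt_py)         # Python datetime -> pandas Timestamp
dt64 = ts.to_datetime64()        # pandas Timestamp -> NumPy datetime64
back_to_py = ts.to_pydatetime()  # pandas Timestamp -> Python datetime

# pandas Timestamp is a subclass of the Python datetime class
is_subclass = isinstance(ts, datetime)

print(ts, dt64, back_to_py, is_subclass)
```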
ts1 = pd.Timestamp('2019-02-01 12:00:00')
ts2 = pd.Timestamp('2019-02-06 00:00:00')
ts1
Timestamp('2019-02-01 12:00:00')
ts2
Timestamp('2019-02-06 00:00:00')
dt = ts2 - ts1
dt
Timedelta('4 days 12:00:00')
ts1
Timestamp('2019-02-01 12:00:00')
ts1 + dt
Timestamp('2019-02-06 00:00:00')
ts2 + dt
Timestamp('2019-02-10 12:00:00')
ts1 - pd.Timedelta(days=1)
Timestamp('2019-01-31 12:00:00')
dt.total_seconds()
388800.0
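Besides total_seconds(), a Timedelta can be expressed in other units by dividing by a unit Timedelta. A minimal sketch using the same two timestamps as above:

```python
import pandas as pd

ts1 = pd.Timestamp('2019-02-01 12:00:00')
ts2 = pd.Timestamp('2019-02-06 00:00:00')
dt = ts2 - ts1

# Dividing by a unit Timedelta gives a float in that unit
n_days = dt / pd.Timedelta(days=1)
n_hours = dt / pd.Timedelta(hours=1)

print(n_days)   # 4.5
print(n_hours)  # 108.0

# .days and .seconds are components of the duration, not totals
print(dt.days, dt.seconds)  # 4 43200
```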