Introduction
Standard Pandas
Useful Operations
Resampling
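Useful for filling gaps in time-series data: resample() puts a DataFrame with a DatetimeIndex onto a regular time grid, and the missing timestamps can then be filled.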
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html
filled_df = df.resample('1min').asfreq().fillna(0)  # '1min' replaces the '1T' alias, which newer pandas versions deprecate
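A small illustration of what the one-liner does, using hypothetical readings (the `reading` column and the timestamps are made up). It also shows forward-filling as an alternative to filling the gaps with 0:

```python
import pandas as pd

# Hypothetical readings at one-minute spacing, with 00:02 and 00:03 missing
index = pd.to_datetime(["2024-01-01 00:00", "2024-01-01 00:01", "2024-01-01 00:04"])
df = pd.DataFrame({"reading": [10.0, 11.0, 14.0]}, index=index)

# asfreq() places the existing values on a regular 1-minute grid and inserts NaN for
# the missing minutes; fillna(0) then replaces those NaNs with 0
filled_df = df.resample("1min").asfreq().fillna(0)

# For comparison: ffill() carries the last known reading forward into each gap
ffilled_df = df.resample("1min").ffill()

print(filled_df)
print(ffilled_df)
```

Filling with 0 suits counts or events ("nothing happened in this minute"); forward-filling usually suits continuous signals such as sensor states.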
Example
import pandas as pd

def resample_timeseries(self, df: pd.DataFrame) -> pd.DataFrame:
    """
    Resample a time-series DataFrame to a regular interval equal to the most
    common time difference between samples, rounded to the nearest second.

    :param df: DataFrame with a DatetimeIndex. The most common spacing between
        consecutive index values is rounded to the nearest second and used as
        the resampling interval.
    :return: The resampled DataFrame, with gaps forward-filled from the last
        available sample.
    """
    time_diffs = df.index.to_series().diff()  # difference between consecutive sample times
    most_common_interval = time_diffs.mode().iloc[0]  # most frequent interval (mode ignores the leading NaT)
    # Round to whole seconds; keep at least 1 s so sub-second data never gives a zero interval
    common_interval_seconds = max(1, round(most_common_interval.total_seconds()))
    rounded_sample_time = pd.Timedelta(seconds=common_interval_seconds)  # back to a Timedelta
    return df.resample(rounded_sample_time).ffill()
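To see the interval detection in action, here is a standalone variant of the method (no `self`), run on hypothetical data sampled every 2 seconds with the 00:00:06 sample missing. It includes a `max(1, ...)` guard so that sub-second spacing cannot round down to a zero-length interval; treat it as a sketch rather than the original implementation:

```python
import pandas as pd


def resample_timeseries(df: pd.DataFrame) -> pd.DataFrame:
    # Standalone variant for illustration (no `self`)
    time_diffs = df.index.to_series().diff()               # spacing between consecutive samples
    most_common = time_diffs.mode().iloc[0]                # most frequent spacing (NaT ignored)
    seconds = max(1, round(most_common.total_seconds()))   # whole seconds, at least 1 s
    return df.resample(pd.Timedelta(seconds=seconds)).ffill()


# Hypothetical samples every 2 s; the sample at 00:00:06 is missing
index = pd.to_datetime(
    ["2024-01-01 00:00:00", "2024-01-01 00:00:02", "2024-01-01 00:00:04",
     "2024-01-01 00:00:08", "2024-01-01 00:00:10"]
)
df = pd.DataFrame({"value": [1.0, 2.0, 3.0, 5.0, 6.0]}, index=index)

resampled = resample_timeseries(df)
print(resampled)
```

The differences are 2 s, 2 s, 4 s and 2 s, so the mode is 2 s. The missing 00:00:06 row is recreated and takes the forward-filled value 3.0.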
Creating Columns Based on Existing Columns
Used to derive a new column from the values of one or more existing columns.
Example
Creating an ‘input’ column: when the state is not zero, set the input to 1, otherwise 0 (a step function).
import numpy as np

# Using np.where (vectorized, fast on large DataFrames)
df["input"] = np.where(df["state"] != 0, 1, 0)
# Alternatively, use .apply with a lambda function (runs element by element, slower)
df["input"] = df["state"].apply(lambda x: 1 if x != 0 else 0)
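A quick check on a hypothetical `state` column. It also shows a third option not in the notes: casting the boolean mask directly to integers, which gives the same 0/1 column:

```python
import numpy as np
import pandas as pd

# Hypothetical state values
df = pd.DataFrame({"state": [0, 3, 0, -2, 5]})

# np.where picks 1 where the condition is True, else 0
df["input"] = np.where(df["state"] != 0, 1, 0)

# Equivalent vectorized alternative: True/False become 1/0
df["input_alt"] = (df["state"] != 0).astype(int)

print(df)
```

With every approach, `NaN != 0` evaluates to True, so a missing state becomes 1. Call `fillna` on the column first if that is not the intended behaviour.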
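Turning DataFrames into NumPy Arrays
Use df.to_numpy(), which the pandas documentation recommends over the older df.values attribute. Both return only the values; the index and column labels are dropped.
Example
numpy_dataset = df.to_numpy()  # same result as df.values for ordinary numeric columns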
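A short sketch with a hypothetical DataFrame showing what the conversion keeps and what happens to dtypes. The `dtype` argument of `to_numpy` is a standard pandas parameter not covered in the notes:

```python
import pandas as pd

# Hypothetical DataFrame with a string index
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]}, index=["x", "y", "z"])

# Index labels ("x", "y", "z") and column names are dropped; only the values remain
arr = df.to_numpy()
print(arr.shape, arr.dtype)  # (3, 2) float64: the int column is upcast to a common dtype

# A target dtype can be requested explicitly
arr32 = df.to_numpy(dtype="float32")

# Mixing numeric and string columns forces the generic object dtype
mixed = df.assign(label=["p", "q", "r"]).to_numpy()
print(mixed.dtype)  # object
```

Selecting only the numeric columns before converting avoids ending up with an object array.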
Dropping NaN Values
By default, dropna() drops every row that contains a NaN in any column.
To check only specific columns for NaN, pass them with subset:
df = df.dropna(subset=['column_name'])
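A small comparison on a hypothetical DataFrame. It also shows `how="all"`, a standard `dropna` option not mentioned above that removes a row only when every column is NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 and row 1 each have one NaN, row 3 is all NaN
df = pd.DataFrame({
    "column_name": [1.0, np.nan, 3.0, np.nan],
    "other": [np.nan, 2.0, 3.0, np.nan],
})

# Default: drop rows with a NaN in any column -> only row 2 survives
any_dropped = df.dropna()

# Only look at column_name -> rows 0 and 2 survive
subset_dropped = df.dropna(subset=["column_name"])

# Drop a row only if all of its values are NaN -> only row 3 is removed
all_dropped = df.dropna(how="all")

print(any_dropped, subset_dropped, all_dropped, sep="\n\n")
```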