我已经建立了一个马尔可夫链,我可以用它来模拟人们的日常生活(活动模式)。每个模拟日被分成144个时间步,个人可以执行14个活动中的一个。它们是:外出工作(1)外出休闲(2)外出购物(3)睡觉(4)做饭(5)用洗碗机洗衣服(7)吸尘(8)用电脑看电视(10)个人卫生(11)熨烫(12)听音乐(13)其他(14)
我从一个数据集中计算了马尔可夫链的转移概率,该数据集包含总共226个人的日记,这些人在10分钟的间隔内记录了他们的活动。数据集可在以下位置找到:https://www.dropbox.com/s/tyjvh810eg5brkr/data?dl=0
我现在已经用马尔可夫链模拟了4000天,并想用原始数据集来验证结果。
为此,我分析了各个活动的结果,并计算了每个时间步的期望值和95%的置信区间。然后,我检查原始数据集中活动的平均值是否在此区间内,并计算不在置信区间的上限或下限阈值内的所有点的数量。
然而,对于每个模拟,我得到了不同高度的偏差,有时在1%的范围内,有时超过10%。
我现在想知道这是如何可能的,它是否完全可能。
我模拟了4x4000天,结果是:


代码的第一部分在这里(对不起,太长了):
# import of libraries
# First, create a 143xarray for transition probability matrix since the first state of the activity pattern will be determined by the starting probability which will be calculated later.
data= data(Dataframe vom dataset see link above)
# in case activity is not observed in time step t, return 0 instead divide by 0
def a1(x, y):
try:
return x / y
except ZeroDivisionError:
return 0
#create matrix for transition probabilities
matrix = np.empty(shape=(143, 14, 14))
#iterate through each time step and calculate probabilities for each activity (from state j to state m)
#save matrix as 3D-array
h=data
for i in range(0, 143):
for j in range(1, 15):
for m in range(1, 15):
a = a1(len(h[(h.iloc[:, i] == j) & (h.iloc[:, i + 1] == m)]), len(h[h.iloc[:, i] == j]))
matrix[i, j - 1, m - 1] = a
np.save(einp_wi_week, matrix)
# now calculate starting probabilities for time step 0 (initial start activity)
matrix_original_probability_starting_state= np.empty(shape=(14, 1))
for i in range(0, 1):
for j in range(1, 15):
a = a1(len(h[(h.iloc[:, i] == j)]), len(h.iloc[:, i]))
matrix_original_probability_starting_state[j - 1, i] = a
np.save(einp_wi_week_start, matrix_original_probability_starting_state)
# import transition probabilities for markov-chain (I have several worksheets to keep track of my approach and to document the results)
# MARKOV-CHAINS TO GENERATE ACTIVITY PATTERNS
# for every time step, the markov-chain can be either in one of these states / activities
states = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14"]
# Possible sequences of state changes (since transferring from state 1 to state 11, for instance has the same code as transferring from 11 to 1, all state transfers for 1 to a number higher than 9 is adjusted in the transfer code)
transitionName = [["11", "12", "13", "14", "15", "16", "17", "18", "19", "0110", "0111", "0112", "0113", "0114"], ["21", "22", "23", "24", "25", "26", "27", "28", "29", "210", "211", "212", "213", "214"], ["31", "32", "33", "34", "35", "36", "37", "38", "39", "310", "311", "312", "313", "314"], ["41", "42", "43", "44", "45", "46", "47", "48", "49", "410", "411", "412", "413", "414"], ["51", "52", "53", "54", "55", "56", "57", "58", "59", "510", "511", "512", "513", "514"], ["61", "62", "63", "64", "65", "66", "67", "68", "69", "610", "611", "612", "613", "614"], ["71", "72", "73", "74", "75", "76", "77", "78", "79", "710", "711", "712", "713", "714"], ["81", "82", "83", "84", "85", "86", "87", "88", "89", "810", "811", "812", "813", "814"], ["91", "92", "93", "94", "95", "96", "97", "98", "99", "910", "911", "912", "913", "914"], ["101", "102", "103", "104", "105", "106", "107", "108", "109", "1010", "1011", "1012", "1013", "1014"], ["111", "112", "113", "114", "115", "116", "117", "118", "119", "1110", "1111", "1112", "1113", "1114"], ["121", "122", "123", "124", "125", "126", "127", "128", "129", "1210", "1211", "1212", "1213", "1214"], ["131", "132", "133", "134", "135", "136", "137", "138", "139", "1310", "1311", "1312", "1313", "1314"], ["141", "142", "143", "144", "145", "146", "147", "148", "149", "1410", "1411", "1412", "1413", "1414"]]
def activity_forecast(simulation_days):
# activitypattern is an empty array that stores the sequence of states in the simulation.
activitypattern = []
for m in range(0,simulation_days):
# for each simulation day, the activitypattern for all 144 time steps (10-min resolution) must be generated
# pro determines the transition probabilities matrix that is used for the simulation day and can be changed with regards to the day simulated (weekday, Friday, Saturday, Sunday).
# Need to be changed for the validation in order to validate activity patterns for weekdays, Fridays, Saturdays, and Sundays (see transition probability matrices above and comments later)
pro = einp_wi_week
#Determine starting activity. Therefore use vector with starting probabilities for all activites, generated in "Clustern_Ausgangsdaten..."
activityToday = np.random.choice(states,replace=True,p=einp_wi_week_start[:,0])
# create array were the sequence of states for simulation day is stored. This array will be appended to the activity pattern array
activityList = [activityToday]
#since the first activity state for each simulation day is determined, we have to generate activity states for the 143 following time steps
for i in range(0,143):
#create sequence of activities
if activityToday == "1":
change = np.random.choice(transitionName[0],replace=True,p=pro[i,0,:])
if change == "11":
activityList.append("1")
pass
elif change == "12":
activityToday = "2"
activityList.append("2")
elif change == "13":
activityToday = "3"
activityList.append("3")
elif change == "14":
activityToday = "4"
activityList.append("4")
elif change == "15":
activityToday = "5"
activityList.append("5")
elif change == "16":
activityToday = "6"
activityList.append("6")
elif change == "17":
activityToday = "7"
activityList.append("7")
elif change == "18":
activityToday = "8"
activityList.append("8")
elif change == "19":
activityToday = "9"
activityList.append("9")
elif change == "0110":
activityToday = "10"
activityList.append("10")
elif change == "0111":
activityToday = "11"
activityList.append("11")
elif change == "0112":
activityToday = "12"
activityList.append("12")
elif change == "0113":
activityToday = "13"
activityList.append("13")
else:
activityToday = "14"
activityList.append("14")
elif activityToday == "2":
change = np.random.choice(transitionName[1],replace=True,p=pro[i,1,:])
if change == "22":
activityList.append("2")
pass
elif change == "21":
activityToday = "1"
activityList.append("1")
elif change == "23":
activityToday = "3"
activityList.append("3")
elif change == "24":
activityToday = "4"
activityList.append("4")
elif change == "25":
activityToday = "5"
activityList.append("5")
elif change == "26":
activityToday = "6"
activityList.append("6")
elif change == "27":
activityToday = "7"
activityList.append("7")
elif change == "28":
activityToday = "8"
activityList.append("8")
elif change == "29":
activityToday = "9"
activityList.append("9")
elif change == "210":
activityToday = "10"
activityList.append("10")
elif change == "211":
activityToday = "11"
activityList.append("11")
elif change == "212":
activityToday = "12"
activityList.append("12")
elif change == "213":
activityToday = "13"
activityList.append("13")
else:
activityToday = "14"
activityList.append("14")
[code is too long - code needs to be written for each activity (1-14)]
activitypattern.append(activityList)
# save the files in order to use them later for comparison(and also for documentation since they can change for each simulation)
df=pd.DataFrame(activitypattern)
df.to_csv("Activitypatterns_synthetic_one_person_ft_aggregated_week_4000_days_new",index=False)
#call function
activity_forecast(4000)我很高兴得到任何建议/反馈。
谢谢,费利克斯
发布于 2019-03-31 01:26:48
下面是更多代码:)
# Activity patterns have been saved
# Now, use original data and calculate the probability for each time step
k=data
matrix_original_one_person_wi_week= np.empty(shape=(14, 144))
for i in range (0,144):
for j in range (1,15):
a=a1(len(k[(k.iloc[:,i]==j)]),len(k.iloc[:,i]))
matrix_original_one_person_wi_week[j-1,i]=a
#create dataframe
matrix_df_activities_original_one_person_winter_week=pd.DataFrame(matrix_original_one_person_wi_week)
print(matrix_df_activities_original_one_person_winter_week)
# Activity profiles have been generated for [4000] simulation days and stored as csv file.
# loading data
activities_one_person_winter_week_4000_days=pd.read_csv("XXX", delimiter=";")
# load timesteps csv (10-min-resolution array) to create array with 10-min time steps as x-axis
timesteps = pd.read_csv("XXX", delimiter=";", header=None)
timesteps_array = timesteps[0].values.tolist()
# now transform to dataframe
df_activities_one_person_winter_week_4000_days=pd.DataFrame(activities_one_person_winter_week_4000_days)
# upper and lower threshold for the 95% confidence interval are stored within a single array.
confidence_upper = []
confidence_lower = []
# now iterate through each time step to get the mean value and the upper and lower threshold
# this is done for each activity
# for each column of the dataframe, change values of activities to 1 (activity observed) or 0 (other activity observed)
# for changing the activity for which the values should be calculated, just change the the first value within np.where (a==)
for i in range(0,144):
a = df_activities_one_person_winter_week_4000_days.iloc[:,i]
#look for specifc value (first number) and replace it with (1) the rest ist (0), activity starts with 1, not zero!
count = np.where(a == 1, 1, 0)
mean = count.mean()
#confidence interval is 0.95
confidence = 0.95
# number of observations
n = len(count)
#standard error
std_err = stats.sem(count)
# calculate value for lower and upper threshold
threshold_value = std_err * stats.t.ppf((1 + confidence) / 2, n - 1)
# add / subtract threshold value from mean value mean
upper_threshold = mean + threshold_value
lower_threshold = mean - threshold_value
#append values to array
confidence_upper.append(upper_threshold*100)
confidence_lower.append(lower_threshold*100)
#now we need to calculate the mean value for each activity for each time step from the original data
# just change the first value in iloc for the activity you want to have the probabilities for (its always "a" (activity number defined in np.where) - 1)
original_data_value=np.array(matrix_df_activities_original_one_person_winter_week.iloc[0,:])
# Now count the frequency how many data points of the original distributiona are outside of the confidence interval
# first multiply value by 100 in order to compare (since other arrays are in XX% format)
#original_data_value_percentage=[]
#for i in range(0,144):
#a=original_data_value[i]
#original_data_value_percentage.append(a)
#original_data_value_percentage
# now create array where all values that are lower/upper of confidence interval are stored
count_frequency=[]
#iterate through each time step
for i in range(0,144):
if (original_data_value[i]*100)<confidence_lower[i]:
count_frequency.append(original_data_value[i]*100)
print("lower",i, (original_data_value[i]*100),confidence_lower[i])
elif (original_data_value[i]*100)>confidence_upper[i]:
count_frequency.append(original_data_value[i]*100)
print("upper", i, (original_data_value[i]*100),confidence_upper[i])
else:
pass
print((len(count_frequency)/144)*100)https://stackoverflow.com/questions/55432038
复制相似问题