Note
How to find the position of missing values in numpy array?

    # Input
    url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
    iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
    iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan

    # Solution
    print("Number of missing values: \n", np.isnan(iris_2d[:, 0]).sum())
    # output 5
    ..
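The excerpt above is cut off after the count, before any positions are shown. A minimal sketch of how both could be obtained follows. It assumes a synthetic 150 x 4 array in place of the iris download, and the planted NaN positions and variable names are illustrative, not taken from the original post.

```python
import numpy as np

# Synthetic stand-in for the iris measurements (assumption: 150 rows x 4 float columns),
# so the example runs without a network connection.
rng = np.random.default_rng(0)
data = rng.uniform(1.0, 8.0, size=(150, 4))

# Plant NaNs at known (row, column) positions so the result is easy to check.
rows = np.array([3, 10, 10, 42, 149])
cols = np.array([0, 0, 2, 3, 0])
data[rows, cols] = np.nan

# Number of missing values in the first column.
n_missing_col0 = int(np.isnan(data[:, 0]).sum())

# Row positions of the missing values in the first column.
missing_rows_col0 = np.where(np.isnan(data[:, 0]))[0]

# (row, column) positions of every missing value in the 2-D array.
missing_positions = np.argwhere(np.isnan(data))

print(n_missing_col0)
print(missing_rows_col0)
print(missing_positions)
```

`np.where` on a 1-D mask returns a tuple with one index array, hence the `[0]`; `np.argwhere` returns one `(row, column)` pair per missing value.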
How to insert values at random positions in an array?

    # Input
    url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
    iris_2d = np.genfromtxt(url, delimiter=',', dtype='object')

    # 1
    i, j = np.where(iris_2d)
    # i, j contain the row numbers and column numbers of 600 elements of iris_2d
    np.random.seed(100)
    iris_2d[np.random.choice((i), 20), np.random.choice((j), 20)] = np.nan
    ..
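Drawing row and column indices independently with replacement, as in the excerpt above, can pick the same cell twice, so fewer than 20 values may end up replaced. One possible alternative, sketched here on a synthetic 150 x 4 float array (an assumption standing in for the iris data), samples flat indices without replacement and converts them back to 2-D coordinates.

```python
import numpy as np

rng = np.random.default_rng(100)

# Synthetic stand-in for the iris measurements (assumption: 150 x 4 floats).
data = rng.uniform(1.0, 8.0, size=(150, 4))

n_missing = 20

# Sample 20 distinct flat positions out of data.size cells.
flat_idx = rng.choice(data.size, size=n_missing, replace=False)

# Convert flat positions to (row, column) coordinates and overwrite with NaN.
rows, cols = np.unravel_index(flat_idx, data.shape)
data[rows, cols] = np.nan

total_missing = int(np.isnan(data).sum())
print(total_missing)  # 20, because the sampled positions are distinct
```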
How to compute the autocorrelations of a numeric series?

    # Input
    ser = pd.Series(np.arange(20) + np.random.normal(1, 10, 20))

    # Solution
    autocorrelations = [ser.autocorr(i).round(2) for i in range(11)]
    print(autocorrelations[1:])
    print('Lag having highest correlation: ', np.argmax(np.abs(autocorrelations[1:]))+1)
    # output
    [0.29999999999999999, -0.11, -0.17000000000000001, 0.46000000000000002, 0...
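As a rough illustration of what `Series.autocorr` computes, the sketch below compares it against the Pearson correlation between the series and a shifted copy of itself. The seeded generator and the variable names are assumptions added for reproducibility; the values will differ from the truncated output shown in the excerpt.

```python
import numpy as np
import pandas as pd

# Seeded noise so repeated runs give the same series (the seed is an arbitrary choice).
rng = np.random.default_rng(0)
ser = pd.Series(np.arange(20) + rng.normal(1, 10, 20))

lags = range(1, 11)

# Built-in lag-k autocorrelation.
autocorrs = [ser.autocorr(lag) for lag in lags]

# Same quantity by hand: correlate the series with itself shifted by k steps.
manual = [ser.corr(ser.shift(lag)) for lag in lags]

# Lag with the largest absolute autocorrelation (lags start at 1, hence the +1).
best_lag = int(np.argmax(np.abs(autocorrs))) + 1

print([round(float(a), 2) for a in autocorrs])
print('Lag having highest correlation:', best_lag)
```

Lag 0 is left out because a series always correlates perfectly with itself, which is why the excerpt slices with `[1:]` before taking `argmax`.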
How to find the percentile scores of a numpy array?

    # Input
    url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
    sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])

    # Solution
    np.percentile(sepallength, q=[5, 95])
    # output
    array([ 4.6 , 7.255])
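To show how `np.percentile` interpolates between data points, here is a small sketch on a hand-checkable array. The integers 1 to 100 are an assumption used in place of the downloaded sepal lengths.

```python
import numpy as np

# Hand-checkable stand-in data: the integers 1..100.
values = np.arange(1, 101)

# 5th and 95th percentiles with NumPy's default linear interpolation.
p5, p95 = np.percentile(values, q=[5, 95])

# With n = 100 sorted points, the q-th percentile sits at fractional
# position q/100 * (n - 1): 4.95 for q=5 and 94.05 for q=95.
print(p5, p95)
```

Because the values are evenly spaced, the fractional positions 4.95 and 94.05 map directly to 5.95 and 95.05.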
How to fill an intermittent time series so all missing dates show up with values of previous non-missing date?

    # Input
    ser = pd.Series([1,10,3, np.nan], index=pd.to_datetime(['2000-01-01', '2000-01-03', '2000-01-06', '2000-01-08']))

    # 1
    ser.resample('D').ffill()  # fill with previous value

    # 2
    ser.resample('D').bfill()  # fill with next value
    ser.resample('D').bfill().ffill()  # fill next else prev..
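A self-contained sketch of the same resampling calls, using the input series from the excerpt; only the imports and the result variable names are additions. One detail worth noting: the fill methods on the resampler only fill the dates created by upsampling, so the `NaN` that already exists on 2000-01-08 is not touched by `resample('D').ffill()`.

```python
import numpy as np
import pandas as pd

ser = pd.Series(
    [1, 10, 3, np.nan],
    index=pd.to_datetime(['2000-01-01', '2000-01-03', '2000-01-06', '2000-01-08']),
)

# Upsample to daily frequency; new dates take the previous observed value.
filled_prev = ser.resample('D').ffill()

# New dates take the next observed value instead.
filled_next = ser.resample('D').bfill()

# Fill with the next value where possible, then fall back to the previous one
# for anything still missing (including the original NaN on 2000-01-08).
filled_both = ser.resample('D').bfill().ffill()

print(filled_prev)
print(filled_next)
print(filled_both)
```

The trailing `Series.ffill()` in the last line is what removes the remaining `NaN` values, since it operates on the already upsampled series.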