Note (468)
How to import only every nth row from a csv file to create a dataframe?

# 1: Use chunks and for-loop
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv', chunksize=50)
df2 = pd.DataFrame()
for chunk in df:
    df2 = df2.append(chunk.iloc[0,:])

# 2: Use chunks and list comprehension
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/..
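A minimal self-contained sketch of the same chunking idea, assuming a recent pandas where DataFrame.append has been removed, so pd.concat is used instead:

import pandas as pd

url = 'https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv'

# Read in chunks of 50 rows and keep only the first row of each chunk,
# i.e. every 50th row of the file.
chunks = pd.read_csv(url, chunksize=50)
df_every_50th = pd.concat([chunk.iloc[[0]] for chunk in chunks], ignore_index=True)
print(df_every_50th.head())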
How to find the position of missing values in numpy array?

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan

# Solution
print("Number of missing values: \n", np.isnan(iris_2d[:, 0]).sum())

# output
5 ..
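A runnable sketch of the same idea; the seed is added here only for reproducibility, and np.where on the boolean NaN mask gives the positions that the truncated output omits:

import numpy as np

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0, 1, 2, 3])

# Inject 20 NaNs at random positions, as in the note above.
np.random.seed(100)
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan

# Count and locate the missing values in the first column.
print("Number of missing values:", np.isnan(iris_2d[:, 0]).sum())
print("Positions of missing values:", np.where(np.isnan(iris_2d[:, 0]))[0])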
How to insert values at random positions in an array?

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='object')

# 1
i, j = np.where(iris_2d)
# i, j contain the row numbers and column numbers of 600 elements of iris_x
np.random.seed(100)
iris_2d[np.random.choice((i), 20), np.random.choice((j), 20)] = np.nan..
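A self-contained sketch of method 1, assuming the same iris data source; np.where on a fully populated object array simply enumerates every index, so sampling from i and j picks random positions to overwrite:

import numpy as np

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='object')

# Every element is a non-empty byte string, so np.where returns the
# row and column indices of all elements.
i, j = np.where(iris_2d)

# Overwrite 20 randomly chosen positions with np.nan.
np.random.seed(100)
iris_2d[np.random.choice(i, 20), np.random.choice(j, 20)] = np.nan

print(iris_2d[:10])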
How to compute the autocorrelations of a numeric series?

# Input
ser = pd.Series(np.arange(20) + np.random.normal(1, 10, 20))

# Solution
autocorrelations = [ser.autocorr(i).round(2) for i in range(11)]
print(autocorrelations[1:])
print('Lag having highest correlation: ', np.argmax(np.abs(autocorrelations[1:]))+1)

# output
[0.29999999999999999, -0.11, -0.17000000000000001, 0.46000000000000002, 0...
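The same computation written as a self-contained script; because the series is random, the printed values will differ from the output above, and the seed is an addition made only for reproducibility:

import numpy as np
import pandas as pd

np.random.seed(42)  # added for reproducibility; not in the original note
ser = pd.Series(np.arange(20) + np.random.normal(1, 10, 20))

# Series.autocorr(lag) is the Pearson correlation of the series with
# a copy of itself shifted by `lag` positions.
autocorrelations = [round(ser.autocorr(lag), 2) for lag in range(11)]
print(autocorrelations[1:])
print('Lag having highest correlation:', np.argmax(np.abs(autocorrelations[1:])) + 1)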
How to find the percentile scores of a numpy array?

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])

# Solution
np.percentile(sepallength, q=[5, 95])

# output
array([ 4.6  ,  7.255])
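The same call as a short standalone script, with the missing import added:

import numpy as np

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])

# 5th and 95th percentiles of sepal length.
print(np.percentile(sepallength, q=[5, 95]))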