List: Note (462)
How to create group ids based on a given categorical variable?
    # Input:
    url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
    species = np.genfromtxt(url, delimiter=',', dtype='str', usecols=4)
    np.random.seed(100)
    species_small = np.sort(np.random.choice(species, size=20))
    species_small
    array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', '..
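The excerpt stops before the solution. One possible approach, offered as a sketch rather than the post's own method, is `np.unique` with `return_inverse=True`, which returns an integer id for each element. The small hard-coded array below is an assumption that stands in for `species_small`, so the example runs without downloading the iris data.

```python
import numpy as np

# Hypothetical stand-in for species_small (sorted categorical labels).
species_small = np.array([
    'Iris-setosa', 'Iris-setosa',
    'Iris-versicolor', 'Iris-versicolor',
    'Iris-virginica',
])

# return_inverse gives, for each element, the index of its value
# in the sorted array of unique labels: that index serves as a group id.
uniques, group_ids = np.unique(species_small, return_inverse=True)

print(uniques)
print(group_ids)
```

Ids assigned this way follow the sorted order of the labels, not their order of first appearance; for a sorted input such as `species_small` the two coincide.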
How to find the position of the nth largest value greater than a given value?
    # Input
    ser = pd.Series(np.random.randint(1, 100, 15))
    # Solution
    print('ser: ', ser.tolist(), 'mean: ', round(ser.mean()))
    np.argwhere(ser > ser.mean())[1]
    # output 1
    ser: [7, 77, 16, 86, 60, 38, 34, 36, 83, 27, 16, 52, 50, 52, 54] mean: 46
    # output 2
    array([3])
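The snippet above returns the position of the second value above the mean in index order, not the second largest by magnitude. The sketch below shows both readings on the sample values from the printed output. Converting the mask with `.to_numpy()` before calling NumPy is a precaution on my part, since how `np.argwhere` treats a `Series` can depend on the installed versions; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Sample values copied from the printed output above (the original input is unseeded).
ser = pd.Series([7, 77, 16, 86, 60, 38, 34, 36, 83, 27, 16, 52, 50, 52, 54])

# Boolean mask of values above the mean, as a plain NumPy array.
above_mean = (ser > ser.mean()).to_numpy()

# Positions of all values above the mean, in index order.
positions = np.flatnonzero(above_mean)
second_position = positions[1]

# Alternative reading: position of the 2nd largest value among those above the mean.
# With the default RangeIndex, the index label equals the position.
second_largest_position = ser[ser > ser.mean()].nlargest(2).index[-1]

print(second_position, second_largest_position)
```

Here `second_position` matches the `array([3])` result shown in the excerpt, while the magnitude-based reading points at a different row.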
How to create row numbers grouped by a categorical variable?
    # Input:
    url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
    species = np.genfromtxt(url, delimiter=',', dtype='str', usecols=4)
    np.random.seed(100)
    species_small = np.sort(np.random.choice(species, size=20))
    species_small
    array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Ir..
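This excerpt is also cut off before the solution. A minimal sketch of one way to number rows within each group, assuming the labels are sorted so that each group is contiguous (as `np.sort` guarantees for `species_small`), subtracts each group's starting index from the overall row index. The short array is a hypothetical stand-in for the iris labels.

```python
import numpy as np

# Hypothetical stand-in for species_small; must be sorted so groups are contiguous.
labels = np.array(['a', 'a', 'a', 'b', 'b', 'c'])

# starts[j]  : index of the first occurrence of the j-th unique label
# inverse[i] : which unique label row i belongs to
_, starts, inverse = np.unique(labels, return_index=True, return_inverse=True)

# Row number within the group = overall position minus the group's first position.
row_numbers = np.arange(labels.shape[0]) - starts[inverse]

print(row_numbers)
```

If the labels were not sorted, this subtraction would no longer give consecutive counts per group, so the input would need sorting first.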
How to get the last n rows of a dataframe with row sum > 100?
    # Input
    df = pd.DataFrame(np.random.randint(10, 40, 60).reshape(-1, 4))
    # Solution
    # compute row sums
    rowsums = df.apply(np.sum, axis=1)
    # last two rows with row sum greater than 100
    last_two_rows = df.iloc[np.where(rowsums > 100)[0][-2:], :]
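The same result can be written with pandas alone, which may read more directly: filter on the row sums with a boolean mask, then take the tail. The fixed values below are an assumption chosen so that the outcome is deterministic; the original input is random and unseeded.

```python
import pandas as pd

# Hypothetical fixed data; row sums are 120, 40, 106, 156, 95.
df = pd.DataFrame([
    [30, 30, 30, 30],
    [10, 10, 10, 10],
    [25, 26, 27, 28],
    [39, 39, 39, 39],
    [10, 20, 30, 35],
])

# Keep rows whose sum exceeds 100, then take the last two of those.
rowsums = df.sum(axis=1)
last_two_rows = df[rowsums > 100].tail(2)

print(last_two_rows)
```

`tail(2)` returns fewer than two rows if fewer qualify, whereas the slice `[-2:]` in the snippet above behaves the same way on the position array.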
How to generate one-hot encodings for an array in numpy?
    # Input:
    np.random.seed(101)
    arr = np.random.randint(1,4, size=6)
    arr
    # output
    array([2, 3, 2, 2, 2, 1])
    # Solution:
    def one_hot_encodings(arr):
        uniqs = np.unique(arr)
        out = np.zeros((arr.shape[0], uniqs.shape[0]))
        for i, k in enumerate(arr):
            out[i, k-1] = 1
        return out
    one_hot_encodings(arr)
    # output
    array([[ 0., 1., 0.], [ 0., 0., 1.], [ ..
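The solution above indexes columns with `k-1`, so it relies on the values being exactly 1 to n with no gaps. A hedged variant that works for arbitrary 1-D integer labels maps each value to its rank among the unique values first. The function name `one_hot_general` is hypothetical and not part of the original post.

```python
import numpy as np

def one_hot_general(arr):
    """One-hot encode a 1-D array; columns follow the sorted unique values."""
    uniqs, inverse = np.unique(arr, return_inverse=True)
    out = np.zeros((arr.shape[0], uniqs.shape[0]))
    # Set one entry per row: row i, column = rank of arr[i] among the uniques.
    out[np.arange(arr.shape[0]), inverse] = 1
    return out

arr = np.array([2, 3, 2, 2, 2, 1])   # values from the excerpt's input
encoded = one_hot_general(arr)

gapped = np.array([10, 30, 10])      # labels with gaps, which k-1 indexing cannot handle
encoded_gapped = one_hot_general(gapped)

print(encoded)
print(encoded_gapped)
```

Using fancy indexing instead of a Python loop also avoids per-element overhead on large arrays.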