Note
How to get the last n rows of a dataframe with row sum > 100?
# Input
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(10, 40, 60).reshape(-1, 4))

# Solution
# compute the row sums
rowsums = df.sum(axis=1)
# last two rows with row sum greater than 100
last_two_rows = df.iloc[np.where(rowsums > 100)[0][-2:], :]
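
The same idea can be sketched with a boolean mask and `tail`, which avoids `np.where`. The column names and the fixed values below are hypothetical, chosen only so that the result is reproducible.

```python
import pandas as pd

# Hypothetical fixed data (not from the original note) so the result is reproducible
df = pd.DataFrame({
    "w": [30, 10, 35, 20, 39],
    "x": [30, 15, 30, 25, 38],
    "y": [30, 20, 25, 30, 37],
    "z": [30, 12, 20, 20, 36],
})

n = 2
rowsums = df.sum(axis=1)            # row sums: 120, 57, 110, 95, 150
last_n = df[rowsums > 100].tail(n)  # keep qualifying rows, then take the last n
print(last_n)
```

Here rows 0, 2 and 4 pass the filter, so `tail(2)` keeps rows 2 and 4.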
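
How to get the row number of the nth largest value in a column?
# Input
df = pd.DataFrame(np.random.randint(1, 30, 30).reshape(10, -1), columns=list('abc'))

# Solution
n = 5
# argsort on the underlying array returns positions in ascending order;
# reverse it and take position n - 1 (indexing the Series with [n] would look up a label instead)
df['a'].to_numpy().argsort()[::-1][n - 1]

# output (for one random draw)
    a   b   c
0  27   7  25
1   8   4  20
2   1   7  17
3  24   9  17
4  21  15   9
5  21  16  20
6  19  27  25
7  12   8  20
8  11  16  28
9  24  13   4
4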
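
As an alternative sketch, `Series.nlargest` can find the same row without `argsort`. The data below is hypothetical and has no tied values, so the nth largest is unambiguous. Note that this returns the index label, which equals the row number only for a default `RangeIndex`.

```python
import pandas as pd

# Hypothetical column with distinct values (an assumption, not the original random data)
df = pd.DataFrame({"a": [27, 8, 1, 24, 21, 20, 19, 12, 11, 23]})

n = 5
# nlargest(n) returns the n largest values in descending order;
# the last entry of its index is the label of the nth largest value
row = df["a"].nlargest(n).index[-1]
print(row, df.loc[row, "a"])
```

The five largest values are 27, 24, 23, 21 and 20, so the 5th largest is 20 in row 5.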

How to create a primary key index by combining relevant columns?
# Input
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv', usecols=[0,1,2,3,5])

# Solution
# fill missing values first so that the concatenated key is never NaN
df[['Manufacturer', 'Model', 'Type']] = df[['Manufacturer', 'Model', 'Type']].fillna('missing')
df.index = df.Manufacturer + '_' + df.Model + '_' + df.Type
print(df.index.is_unique)

# output
True
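
A minimal offline sketch of the same technique, using a small hypothetical frame instead of the CSV. It joins the key columns with `agg` so the list of columns can be changed in one place; the sample values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows with missing values (not the Cars93 data)
df = pd.DataFrame({
    "Manufacturer": ["Acura", "Acura", np.nan, "Audi"],
    "Model": ["Integra", "Legend", "100", np.nan],
    "Type": ["Small", "Midsize", "Compact", "Midsize"],
})

key_cols = ["Manufacturer", "Model", "Type"]
# Fill gaps, then join the three columns row by row with "_"
df.index = df[key_cols].fillna("missing").astype(str).agg("_".join, axis=1)

print(df.index.tolist())
print(df.index.is_unique)
```

Checking `is_unique` afterwards matters because a combined key is only usable as a primary key if no two rows produce the same string.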

How to filter every nth row in a dataframe?
# Input
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')

# Solution
# take every 20th row, starting from row 0
print(df.iloc[::20, :][['Manufacturer', 'Model', 'Type']])

# output
   Manufacturer    Model     Type
0         Acura  Integra    Small
20     Chrysler  LeBaron  Compact
40        Honda  Prelude   Sporty
60      Mercury   Cougar  Midsize
80       Subaru   Loyale    Small
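
The slice step can be illustrated on a small hypothetical frame. The sketch below also shows a variant that starts at the nth row rather than the first, which is an assumption about what "every nth row" might mean in other contexts.

```python
import pandas as pd

# Hypothetical 10-row frame (not the Cars93 data)
df = pd.DataFrame({"value": range(10)})

n = 3
every_nth = df.iloc[::n]            # rows 0, 3, 6, 9
from_nth = df.iloc[n - 1::n]        # rows 2, 5, 8 (start at the nth row)

print(every_nth.index.tolist())
print(from_nth.index.tolist())
```

Because `iloc` slices by position, this works regardless of what the index labels are.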