All posts (462)
Note
How to get the mean of a series grouped by another series?
import numpy as np
import pandas as pd
# Input
fruit = pd.Series(np.random.choice(['apple', 'banana', 'carrot'], 10))
weights = pd.Series(np.linspace(1, 10, 10))
# Solution
weights.groupby(fruit).mean()
# Output (varies between runs because fruit is drawn at random)
apple 7.4
banana 2.0
carrot 6.0
dtype: float64
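Because the input above is random, the printed means change from run to run. The sketch below uses small fixed inputs, which are illustrative values and not taken from the original post, so the grouping can be checked by hand.

```python
import pandas as pd

# Hypothetical fixed inputs (not from the original post) so the result is reproducible.
fruit = pd.Series(['apple', 'banana', 'apple', 'carrot', 'banana', 'apple'])
weights = pd.Series([1.0, 2.0, 3.0, 4.0, 6.0, 5.0])

# Group the weights by the positionally aligned fruit labels and average each group.
mean_by_fruit = weights.groupby(fruit).mean()
print(mean_by_fruit)
```

The two series are matched on their index, so both should share the same index for the grouping to line up as intended.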
How to import a dataset with numbers and text in NumPy while keeping the text intact?
import numpy as np
# Solution
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
# Print the first 3 rows
iris[:3]
# Output
array([[b'5.1', b'3.5', b'1.4', b'0.2', ..
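As an alternative to loading every field as an object, the following sketch lets genfromtxt infer a type per column with dtype=None, so the numbers stay numeric and the species label stays text. It is an assumption-laden variant, not the post's own solution: the three rows are inlined from the start of the iris file so that no download is needed, and the field names reuse the names tuple above.

```python
from io import StringIO

import numpy as np

# First three rows of the iris data set, inlined here so the example runs offline.
sample = StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

# dtype=None infers one type per column, giving a structured array with named fields.
iris = np.genfromtxt(sample, delimiter=',', dtype=None, names=names, encoding='utf-8')

print(iris['species'])      # text column, kept as strings
print(iris['sepallength'])  # numeric column, parsed as floats
```

The result is a one-dimensional structured array, so columns are accessed by field name instead of by a second positional index.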
How to filter valid emails from a series?
import pandas as pd
# Input
emails = pd.Series(['buying books at amazom.com', 'rameses@egypt.com', 'matt@t.co', 'narendra@modi.com'])
# 1 (as series of strings)
import re
pattern = '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}'
mask = emails.map(lambda x: bool(re.match(pattern, x)))
emails[mask]
# 2 (as series of list)
emails.str.findall(pattern, flags=re.IGNORECASE)
# 3..
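The same mask can probably be built without a Python-level lambda by using the pandas string accessor. This is a sketch of one possible variant and is not claimed to be any of the approaches listed in the post; like re.match, Series.str.match only anchors at the start of each string.

```python
import pandas as pd

emails = pd.Series(['buying books at amazom.com', 'rameses@egypt.com',
                    'matt@t.co', 'narendra@modi.com'])
pattern = r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}'

# str.match returns a boolean Series that is True where the pattern matches
# from the beginning of the string.
valid = emails[emails.str.match(pattern)]
print(valid)
```

The pattern is a loose approximation of an email address, so it should be treated as a filter for obviously invalid entries and not as full validation.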
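How to print the full NumPy array without truncating?
import sys
import numpy as np
# Input
np.set_printoptions(threshold=6)
a = np.arange(15)
# Solution (threshold=np.nan raises a ValueError in recent NumPy releases; use sys.maxsize instead)
np.set_printoptions(threshold=sys.maxsize)
a
# Output
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])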
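np.set_printoptions changes the setting globally. Where that is undesirable, a scoped variant can be sketched with the np.printoptions context manager, assuming NumPy 1.15 or later; the variable names truncated and full are illustrative only.

```python
import sys

import numpy as np

a = np.arange(15)

# With a small threshold the string form is summarised with an ellipsis.
with np.printoptions(threshold=6):
    truncated = np.array2string(a)

# sys.maxsize as the threshold effectively disables summarisation.
with np.printoptions(threshold=sys.maxsize):
    full = np.array2string(a)

print(truncated)
print(full)
```

Outside each with block the previous print options are restored, so other output in the session is unaffected.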
How to filter words that contain at least 2 vowels from a series?
import pandas as pd
# Input
ser = pd.Series(['Apple', 'Orange', 'Plan', 'Python', 'Money'])
# Solution
from collections import Counter
mask = ser.map(lambda x: sum([Counter(x.lower()).get(i, 0) for i in list('aeiou')]) >= 2)
ser[mask]
# Output
0 Apple
1 Orange
4 Money
dtype: object
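The vowel count can also be obtained with a regular expression through Series.str.count, which avoids building a Counter per word. This is a sketch of an alternative and not the post's own solution; it treats only a, e, i, o and u as vowels, the same assumption the Counter version makes.

```python
import re

import pandas as pd

ser = pd.Series(['Apple', 'Orange', 'Plan', 'Python', 'Money'])

# Count case-insensitive vowel occurrences per word, then keep words with two or more.
vowel_counts = ser.str.count(r'[aeiou]', flags=re.IGNORECASE)
result = ser[vowel_counts >= 2]
print(result)
```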