Splitters for reducing dataset sizes

DumbFixedSplitter[source]

DumbFixedSplitter(train_pct)

A splitter that takes the first train_pct of the items as the training set and the remainder as the validation set
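The behaviour is deterministic: items are never shuffled, the first train_pct of the indices go to training and the rest to validation. A minimal sketch of an equivalent splitter (illustrative only, not the actual implementation; it mirrors the behaviour shown in the example below):

from fastcore.foundation import L

def dumb_fixed_splitter(train_pct):
    "Illustrative: first train_pct of the indices go to train, the rest to valid"
    def _inner(items):
        n_train = int(len(items) * train_pct)
        idxs = L(range(len(items)))
        return idxs[:n_train], idxs[n_train:]
    return _inner

# e.g. dumb_fixed_splitter(0.8)(range(20)) -> first 16 indices for train, last 4 for valid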

SubsetPercentageSplitter[source]

SubsetPercentageSplitter(main_splitter, train_pct=0.5, valid_pct=None, randomize=False, seed=None)

Takes a fixed percentage of the splits produced by main_splitter: train_pct of its train split and valid_pct of its valid split (valid_pct defaults to train_pct when None). With randomize=True the subsets are sampled at random, reproducibly via seed.
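Conceptually the wrapper first calls main_splitter, then keeps only a fraction of each returned index list. A rough sketch under those assumptions (illustrative only; the actual class may differ in details):

import random

def subset_percentage_splitter(main_splitter, train_pct=0.5, valid_pct=None,
                               randomize=False, seed=None):
    "Illustrative: shrink the splits returned by main_splitter to a fixed percentage"
    if valid_pct is None: valid_pct = train_pct  # assumed default: reuse train_pct
    def _inner(items):
        train_idxs, valid_idxs = main_splitter(items)
        rng = random.Random(seed)
        def _take(idxs, pct):
            n = int(len(idxs) * pct)
            # either sample n indices at random or just take the first n
            return rng.sample(list(idxs), n) if randomize else list(idxs)[:n]
        return _take(train_idxs, train_pct), _take(valid_idxs, valid_pct)
    return _inner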

Example Usage

from fastai.vision.all import *
mlist = list(range(20)); mlist
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
df_splitter = DumbFixedSplitter(0.8)
t1_train, t2_valid = df_splitter(mlist); t1_train
(#16) [0,1,2,3,4,5,6,7,8,9...]
t2_valid
(#4) [16,17,18,19]
fs_splitter = SubsetPercentageSplitter(df_splitter, randomize=True, seed=42)
ft1_train, ft2_valid = fs_splitter(mlist)
ft1_train
[3, 0, 11, 4, 15, 13, 2, 1]
ft2_valid
[16, 18]
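With the default train_pct=0.5 the wrapper keeps half of the 16 training indices (8 items) and half of the 4 validation indices (2 items) produced by df_splitter; because randomize=True they are drawn at random, reproducibly thanks to seed=42.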
path = untar_data(URLs.MNIST_TINY)
data = DataBlock(
    blocks=(ImageBlock,CategoryBlock),
    get_items=get_image_files,
    get_y=parent_label,
    splitter=SubsetPercentageSplitter(
        GrandparentSplitter(),
        train_pct=0.02, randomize=True, seed=42
    ),
    item_tfms=Resize(28),
    batch_tfms=[]
)
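GrandparentSplitter produces the usual train/valid split for mnist_tiny; wrapping it with train_pct=0.02 keeps only about 2% of each side, which data.summary confirms below with datasets of just 14 and 13 items.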
data.summary(path)
Setting-up type transforms pipelines
Collecting items from /Users/butch/.fastai/data/mnist_tiny
Found 1428 items
2 datasets of sizes 14,13
Setting up Pipeline: PILBase.create
Setting up Pipeline: parent_label -> Categorize -- {'vocab': None, 'sort': True, 'add_na': False}

Building one sample
  Pipeline: PILBase.create
    starting from
      /Users/butch/.fastai/data/mnist_tiny/train/3/8976.png
    applying PILBase.create gives
      PILImage mode=RGB size=28x28
  Pipeline: parent_label -> Categorize -- {'vocab': None, 'sort': True, 'add_na': False}
    starting from
      /Users/butch/.fastai/data/mnist_tiny/train/3/8976.png
    applying parent_label gives
      3
    applying Categorize -- {'vocab': None, 'sort': True, 'add_na': False} gives
      TensorCategory(0)

Final sample: (PILImage mode=RGB size=28x28, TensorCategory(0))


Collecting items from /Users/butch/.fastai/data/mnist_tiny
Found 1428 items
2 datasets of sizes 14,13
Setting up Pipeline: PILBase.create
Setting up Pipeline: parent_label -> Categorize -- {'vocab': None, 'sort': True, 'add_na': False}
Setting up after_item: Pipeline: Resize -- {'size': (28, 28), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (2, 0), 'p': 1.0} -> ToTensor
Setting up before_batch: Pipeline: 
Setting up after_batch: Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1}

Building one batch
Applying item_tfms to the first sample:
  Pipeline: Resize -- {'size': (28, 28), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (2, 0), 'p': 1.0} -> ToTensor
    starting from
      (PILImage mode=RGB size=28x28, TensorCategory(0))
    applying Resize -- {'size': (28, 28), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (2, 0), 'p': 1.0} gives
      (PILImage mode=RGB size=28x28, TensorCategory(0))
    applying ToTensor gives
      (TensorImage of size 3x28x28, TensorCategory(0))

Adding the next 3 samples

No before_batch transform to apply

Collating items in a batch

Applying batch_tfms to the batch built
  Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1}
    starting from
      (TensorImage of size 4x3x28x28, TensorCategory([0, 1, 1, 1]))
    applying IntToFloatTensor -- {'div': 255.0, 'div_mask': 1} gives
      (TensorImage of size 4x3x28x28, TensorCategory([0, 1, 1, 1]))
dls = data.dataloaders(path, bs=4)
dls.show_batch()
dls.c
2
len(dls.train), len(dls.valid)
(3, 4)
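The training DataLoader drops the last incomplete batch by default, so the 14 training items at bs=4 give 3 batches, while the validation DataLoader keeps partial batches, so the 13 validation items give 4 batches.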
len(dls.train.items)
14
len(dls.valid.items)
13
learner = cnn_learner(dls, resnet18, metrics=accuracy)
learner.fit(5)
epoch  train_loss  valid_loss  accuracy  time
0      1.237793    0.674581    0.692308  00:01
1      1.195532    0.639463    0.692308  00:00
2      1.000510    0.564269    0.923077  00:00
3      0.902393    0.628614    0.461538  00:00
4      0.810910    0.602619    0.615385  00:00
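Once the pipeline checks out on the tiny subset, dropping the wrapper restores the full split for real training. A short sketch reusing the names from the example above (full_data and full_dls are hypothetical names introduced here):

full_data = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    get_y=parent_label,
    splitter=GrandparentSplitter(),  # full train/valid split, no subsetting
    item_tfms=Resize(28),
)
full_dls = full_data.dataloaders(path, bs=64)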