cat.bin (R Documentation)
cat.bin implements a three-stage binning procedure for categorical risk factors.
The first stage applies, where needed, a correction for the minimum percentage of observations per bin.
The second stage applies, where needed, a correction for the target rate (default rate), while the third applies,
where needed, a correction for the maximum number of bins. This last stage implements a procedure known as the
adjacent pooling algorithm (APA), which aims to minimize information loss during the iterative merging of bins.
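The pooling idea behind the last stage can be illustrated with a small base R sketch. It is not the PDtoolkit implementation: it assumes the bins are already ordered, that y = 1 marks a bad case, that every bin holds at least one good and one bad case, and it uses the loss of information value (IV) as the measure of information loss. The helpers iv_of() and merge_adjacent() are hypothetical names chosen for illustration.

```r
# Hypothetical sketch of adjacent pooling (not the PDtoolkit implementation).
# Assumes bins are already ordered, y = 1 marks a bad case, and every bin
# contains at least one good and one bad case (otherwise the log is undefined).

# Information value (IV) of a binning given bad and good counts per bin.
iv_of <- function(bad, good) {
  pb <- bad / sum(bad)
  pg <- good / sum(good)
  sum((pg - pb) * log(pg / pb))
}

# Repeatedly merge the adjacent pair whose merge loses the least IV,
# until no more than max.groups bins remain.
merge_adjacent <- function(bad, good, labels, max.groups) {
  stopifnot(max.groups >= 1)
  while (length(bad) > max.groups) {
    base_iv <- iv_of(bad, good)
    # IV lost by merging bin i with bin i + 1, for every adjacent pair
    loss <- sapply(seq_len(length(bad) - 1), function(i) {
      b <- c(bad[seq_len(i - 1)], bad[i] + bad[i + 1], bad[-seq_len(i + 1)])
      g <- c(good[seq_len(i - 1)], good[i] + good[i + 1], good[-seq_len(i + 1)])
      base_iv - iv_of(b, g)
    })
    i <- which.min(loss)
    # fold bin i + 1 into bin i, then drop bin i + 1
    bad[i] <- bad[i] + bad[i + 1]
    good[i] <- good[i] + good[i + 1]
    labels[i] <- paste(labels[i], labels[i + 1], sep = ",")
    bad <- bad[-(i + 1)]
    good <- good[-(i + 1)]
    labels <- labels[-(i + 1)]
  }
  data.frame(bin = labels, bad = bad, good = good, stringsAsFactors = FALSE)
}

# Usage: five ordered bins pooled down to three.
pooled <- merge_adjacent(bad = c(10, 12, 30, 31, 60),
                         good = c(190, 180, 170, 160, 100),
                         labels = c("A", "B", "C", "D", "E"),
                         max.groups = 3)
print(pooled)
```

Merging only neighbours keeps the original order of the bins, so the pooled labels always read as contiguous runs of the input labels.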
cat.bin(x, y, sc = NA, sc.merge = "none", min.pct.obs = 0.05, min.avg.rate = 0.01, max.groups = NA, force.trend = "modalities")
x: Categorical risk factor.
y: Numeric target vector (binary).
sc: Special case elements. Default value is NA.
sc.merge: Defines how special cases will be treated. Available options are:
min.pct.obs: Minimum percentage of observations per bin. Default is 0.05, or a minimum of 30 observations.
min.avg.rate: Minimum default rate. Default is 0.01, or a minimum of 1 bad case for
max.groups: Maximum number of bins (groups) allowed for the analyzed risk factor. If, after the first two stages, the
number of bins is less than or equal to the selected
force.trend: Defines how the initial summary table will be ordered. Possible options are:
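To make the first two corrections concrete, the hypothetical helper flag_bins() below marks the bins that fall short of the min.pct.obs and min.avg.rate thresholds. It is only a sketch under stated assumptions: the observation threshold is taken as the larger of min.pct.obs * n and 30 observations, a bin is flagged if its default rate is below min.avg.rate or it has no bad case (y = 1), and missing values are simply pooled into one bin labelled "SC". cat.bin itself may resolve these cases differently.

```r
# Hypothetical helper (not part of PDtoolkit): flag bins that would need the
# stage-one (too few observations) or stage-two (too low default rate) correction.
# Assumptions: observation threshold = max(min.pct.obs * n, 30); a bin needs a
# default rate of at least min.avg.rate and at least one bad case (y = 1);
# missing values are pooled into a single bin labelled "SC".
flag_bins <- function(x, y, min.pct.obs = 0.05, min.avg.rate = 0.01) {
  x <- ifelse(is.na(x), "SC", as.character(x))
  no <- tapply(y, x, length)   # observations per bin
  nb <- tapply(y, x, sum)      # bad cases per bin
  dr <- nb / no                # default rate per bin
  min.obs <- max(min.pct.obs * length(y), 30)
  data.frame(bin = names(no),
             no = as.vector(no),
             nb = as.vector(nb),
             dr = as.vector(dr),
             low.obs = as.vector(no < min.obs),
             low.rate = as.vector(dr < min.avg.rate | nb < 1),
             stringsAsFactors = FALSE)
}

# Usage: bin "b" is both too small (20 < 30) and has no bad cases.
x <- rep(c("a", "b", "c"), times = c(100, 20, 80))
y <- c(rep(c(1, 0), times = c(10, 90)), rep(0, 20), rep(c(1, 0), times = c(4, 76)))
flags <- flag_bins(x, y)
print(flags)
```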
The command cat.bin generates a list of two objects. The first object, the data frame summary.tbl,
presents a summary table of the final binning, while the second object, x.trans,
is a vector of new grouping values.
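The relationship between the two returned objects can be sketched in base R. The grouping map and the column names no, nb and dr below are hypothetical, chosen for illustration only; the actual columns of summary.tbl are not listed here. Each original value is mapped to its new group to form x.trans, the summary table is computed per new group, and cross-tabulating the original values against x.trans (as the examples do with table()) shows which modalities were pooled.

```r
# Hypothetical illustration (not the PDtoolkit implementation) of the two
# returned objects: a per-bin summary table and the transformed vector.
x <- c("A", "B", "C", "C", "D", "A", "B", "D")
y <- c(0, 1, 0, 1, 1, 0, 0, 1)

# Hypothetical final grouping: modalities A and B pooled into one bin.
grouping <- c(A = "01 A,B", B = "01 A,B", C = "02 C", D = "03 D")

# x.trans: one new group label per observation
x.trans <- unname(grouping[x])

# summary.tbl: observations, bad cases and default rate per new group
no <- tapply(y, x.trans, length)
nb <- tapply(y, x.trans, sum)
summary.tbl <- data.frame(bin = names(no),
                          no = as.vector(no),
                          nb = as.vector(nb),
                          dr = as.vector(nb / no),
                          stringsAsFactors = FALSE)

res <- list(summary.tbl = summary.tbl, x.trans = x.trans)
print(res[[1]])
# cross-tabulate original modalities against the new groups
print(table(x, res[[2]]))
```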
Anderson, R. (2007). The credit scoring toolkit: theory and practice for retail credit risk management and decision automation, Oxford University Press
suppressMessages(library(PDtoolkit))
data(loans)
# prepare risk factor Purpose for the analysis
loans$Purpose <- ifelse(nchar(loans$Purpose) == 2, loans$Purpose, paste0("0", loans$Purpose))
# artificially add missing values in order to show the function's features
loans$Purpose[1:6] <- NA
# run binning procedure
res <- cat.bin(x = loans$Purpose,
               y = loans$Creditability,
               sc = NA,
               sc.merge = "none",
               min.pct.obs = 0.05,
               min.avg.rate = 0.05,
               max.groups = NA,
               force.trend = "modalities")
res[[1]]
# check new risk factor against the original
table(loans$Purpose, res[[2]], useNA = "always")
# repeat the same process, setting max.groups to 4 and force.trend to "dr"
res <- cat.bin(x = loans$Purpose,
               y = loans$Creditability,
               sc = NA,
               sc.merge = "none",
               min.pct.obs = 0.05,
               min.avg.rate = 0.05,
               max.groups = 4,
               force.trend = "dr")
res[[1]]
# check new risk factor against the original
table(loans$Purpose, res[[2]], useNA = "always")
# example of shrinking the number of groups for a numeric risk factor
# copy the existing numeric risk factor to a new one called maturity
loans$maturity <- loans$"Duration of Credit (month)"
# artificially add missing values in order to show the function's features
loans$maturity[1:10] <- NA
# categorize maturity with the MAPA algorithm from the monobin package
loans$maturity.bin <- cum.bin(x = loans$maturity, y = loans$Creditability, g = 50)[[2]]
table(loans$maturity.bin)
# run binning procedure to decrease the number of bins from the previous step
res <- cat.bin(x = loans$maturity.bin,
               y = loans$Creditability,
               sc = "SC",
               sc.merge = "closest",
               min.pct.obs = 0.05,
               min.avg.rate = 0.01,
               max.groups = 5,
               force.trend = "modalities")
res[[1]]
# check new risk factor against the original
table(loans$maturity.bin, res[[2]], useNA = "always")