Weighted Frequent Itemsets
We modified Christian Borgelt's implementation of the apriori algorithm to incorporate our ideas.
Borgelt's program finds association rules and frequent item sets (also closed and maximal ones) with the apriori algorithm (Agrawal et al. 1993), which carries out a breadth-first search on the subset lattice and determines the support of item sets by subset tests. It is a fast implementation that uses a prefix tree to organize the counters for the item sets.
The new usage is as follows:
usage: ./apriori [options] infile outfile [appfile] [whtfile]
Here whtfile is an optional file that assigns a weight to each item. To support it, a new option flag was added:
- -w — minimal weight of a set/rule/hyperedge (defaulting to zero, if not supplied)
Additional weight-related switches are given as a suffix to -w (for example -wf or -wm):
- -f — output format for weight (defaulting to "%.2f", if not supplied),
- -m — weight calculation method (defaulting to zero, if not supplied),
- -s — sampling method (defaulting to zero, i.e. none, if not supplied; see the apriori manual for further details),
- -p — period for sampling (defaulting to five, if not supplied)
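To make the effect of the minimal weight threshold concrete, here is a small Python sketch. It is an illustration only, not the code of the modified program: the item weights are made up, and the aggregation rule (the mean of the member items' weights) is an assumption chosen for the example. The rule actually used is selected by the weight calculation method switch and is not described here.

```python
from statistics import mean

# Hypothetical item weights, standing in for the contents of a weight file.
item_weights = {"a": 0.9, "b": 0.4, "c": 0.1}


def itemset_weight(itemset, weights):
    """Assumed aggregation rule: mean of the member items' weights."""
    return mean(weights[item] for item in itemset)


def filter_by_min_weight(itemsets, weights, min_weight=0.0):
    """Keep only the item sets whose aggregated weight reaches min_weight."""
    return [s for s in itemsets if itemset_weight(s, weights) >= min_weight]


candidates = [("a",), ("a", "b"), ("b", "c"), ("a", "b", "c")]

# Analogous to passing a minimal weight of 0.5 on the command line.
kept = filter_by_min_weight(candidates, item_weights, min_weight=0.5)
print(kept)
```

With the default threshold of zero, every candidate in this sketch is kept, which mirrors the behaviour described for the case where no minimal weight is supplied.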
For example, the following command runs the apriori algorithm on the data in test3.tab, using the appearance options in test3.app and the item weights in test3.wht. The output is printed to stdout. Every item set must have a weight of at least 0.5. The final calculated item-set weight is printed using the format "%.2f", so a weight of 0.2 is written as 0.20. Finally, since -ws0 selects no sampling, every item in the input is used.
./apriori -w0.5 -wf'%.2f' -wm1 -wp2 -ws0 test3.tab - test3.app test3.wht
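The effect of the weight output format can be checked outside the program. The Python sketch below assumes the format string is interpreted with printf-style semantics, which Python's % operator also follows; the variable names are illustrative only.

```python
# The format string passed for weight output in the example above.
weight_format = "%.2f"

# A weight of 0.2 rendered with two digits after the decimal point.
formatted = weight_format % 0.2
print(formatted)
```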
Other sampling methods include:
- uniform
- harmonic
- geometric