為 Mahout 增加聚類(lèi)評(píng)估功能

當(dāng)前位置：首頁(yè) > 范文|應(yīng)用文 > IT技術(shù)專(zhuān)欄 > 技術(shù)文章

來(lái)源：易賢網(wǎng) 閱讀：2489 次日期：2015-04-03 11:08:57

溫馨提示：易賢網(wǎng)小編為您整理了“為 Mahout 增加聚類(lèi)評(píng)估功能”,方便廣大網(wǎng)友查閱！

聚類(lèi)算法及聚類(lèi)評(píng)估 Silhouette 簡(jiǎn)介

聚類(lèi)算法簡(jiǎn)介

聚類(lèi)（clustering）是屬于無(wú)監(jiān)督學(xué)習(xí)（Unsupervised learning）的一種，用來(lái)把一組數(shù)據(jù)劃分為幾類(lèi)，每類(lèi)中的數(shù)據(jù)盡可能的相似，而不同類(lèi)之間盡可能的差異最大化。通過(guò)聚類(lèi)，可以為樣本選取提供參考，或進(jìn)行根源分析，或作為其它算法的預(yù)處理步驟。

聚類(lèi)算法中，最經(jīng)典的要屬于 Kmeans 算法，它的基本思想是：假設(shè)我們要把一組數(shù)據(jù)聚成 N 類(lèi)，那就：

把數(shù)據(jù)中的每個(gè)樣本作為一個(gè)向量，記作Ā

首先隨機(jī)選取 n 個(gè)樣本，把這 n 個(gè)樣本作為 N 類(lèi)的中心點(diǎn), 稱(chēng)為 centroid

針對(duì)數(shù)據(jù)中的所有樣本，計(jì)算到 n 個(gè) centroid 的距離，距離哪個(gè)中心點(diǎn)最近，就屬于哪一類(lèi)

在每一類(lèi)中，重新選取 centroid，假設(shè)該類(lèi)有 k 個(gè)樣本，則 centroid 為

重復(fù) 2，3 直到 centroid 的變化小于預(yù)設(shè)的值。

Mahout 是一個(gè)開(kāi)源的機(jī)器學(xué)習(xí)軟件，提供了應(yīng)用推薦、聚類(lèi)、分類(lèi)、Logistic 回歸分析等算法。特別是由于結(jié)合了 Hadoop 的大數(shù)據(jù)處理能力，每個(gè)算法都可以作為獨(dú)立的 job 方便的部署在 Hadoop 平臺(tái)上，因此得到了越來(lái)越廣的應(yīng)用。在聚類(lèi)領(lǐng)域，Mahout 提供了 Kmeans，LDA, Canopy 等多種算法。

聚類(lèi)評(píng)估算法 Silhouette 簡(jiǎn)介

在 Kmeans 中，我們會(huì)注意到需要我們預(yù)先設(shè)置聚合成幾類(lèi)。實(shí)際上，在聚類(lèi)的過(guò)程中我們也不可能預(yù)先知道，那只能分成 2 類(lèi)，3 類(lèi)，……n 類(lèi)這樣進(jìn)行嘗試，并評(píng)估每次的聚類(lèi)效果。

實(shí)際上，由于聚類(lèi)的無(wú)監(jiān)督學(xué)習(xí)特性，無(wú)論什么算法都需要評(píng)估效果。在聚類(lèi)的評(píng)估中，有基于外部數(shù)據(jù)的評(píng)價(jià)，也有單純的基于聚類(lèi)本身的評(píng)價(jià)，其基本思想就是：在同一類(lèi)中，各個(gè)數(shù)據(jù)點(diǎn)越近越好，并且和類(lèi)外的數(shù)據(jù)點(diǎn)越遠(yuǎn)越好；前者稱(chēng)為內(nèi)聚因子（cohension），后者稱(chēng)為離散因子（separation）。

把這兩者結(jié)合起來(lái)，就形成了評(píng)價(jià)聚類(lèi)效果的 Silhouette 因子:

首先看如何評(píng)價(jià)一個(gè)點(diǎn)的聚類(lèi)效果：

a = 一個(gè)點(diǎn)到同一聚類(lèi)內(nèi)其它點(diǎn)的平均距離

b=min（一個(gè)點(diǎn)到其他聚類(lèi)內(nèi)的點(diǎn)的平均距離）

Silhouette 因子s = 1 – a/b (a<b) 或b/a -1 (a>=b)

衡量整體聚類(lèi)的效果，則是所有點(diǎn)的 Silhouette 因子的平均值。范圍應(yīng)該在 (-1,1), 值越大則說(shuō)明聚類(lèi)效果越好。

圖 1.Silhouette 中內(nèi)聚、離散因子示意

以圖 1 為例。圖 1 顯示的是一個(gè)具有 9 個(gè)點(diǎn)的聚類(lèi)，三個(gè)圓形表示聚成了三類(lèi)，其中的黃點(diǎn)表示質(zhì)心（centroid）。為了評(píng)估圖 1 中深藍(lán)色點(diǎn)的聚類(lèi)效果，其內(nèi)聚因子a就是該點(diǎn)到所在圓中其它三個(gè)點(diǎn)的平均距離。離散因子b的計(jì)算相對(duì)復(fù)雜：我們需要先求出到該點(diǎn)到右上角圓中的三個(gè)點(diǎn)的平均距離，記為 b1；然后求出該點(diǎn)到右下角圓中兩個(gè)點(diǎn)的平均距離，記為 b2；b1 和 b2 的較小值則為b。

在 IBM 的 SPSS Clementine 中，也有 Silhouett 評(píng)估算法的實(shí)現(xiàn)，不過(guò) IBM 提供的是一個(gè)簡(jiǎn)化版本，把一個(gè)點(diǎn)到一個(gè)類(lèi)內(nèi)的距離的平均值，簡(jiǎn)化為到該類(lèi)質(zhì)心（centroid）的距離，具體來(lái)說(shuō)，就是：

圖 2.IBM 關(guān)于內(nèi)聚、離散因子的簡(jiǎn)化實(shí)現(xiàn)

還是以上面描述的 9 個(gè)點(diǎn)聚成 3 類(lèi)的例子來(lái)說(shuō)明。IBM 的實(shí)現(xiàn)把a(bǔ)的實(shí)現(xiàn)簡(jiǎn)化為到深藍(lán)色的點(diǎn)所在的質(zhì)心的距離。計(jì)算b時(shí)候，還是要先計(jì)算 b1 和 b2，然后求最小值。但 b1 簡(jiǎn)化為到右上角圓質(zhì)心的距離；b2 簡(jiǎn)化為到右下角圓質(zhì)心的距離。

在下面的內(nèi)容中，我們嘗試?yán)?IBM 簡(jiǎn)化后的公式為 Mahout 增加聚類(lèi)評(píng)估功能。

Mahout 聚類(lèi)過(guò)程分析

Mahout 運(yùn)行環(huán)境簡(jiǎn)介

前面說(shuō)過(guò)，Mahout 是依賴(lài) Hadoop 環(huán)境，每一個(gè)算法或輔助功能都是作為 Hadoop 的一個(gè)單獨(dú)的 job 來(lái)運(yùn)行，所以必須準(zhǔn)備好一個(gè)可運(yùn)行的 Hadoop 環(huán)境，（至少本文寫(xiě)作時(shí)候使用的 Mahout0.9 還在依賴(lài) Hadoop），如何安裝配置一個(gè)可運(yùn)行的 Hadoop 環(huán)境不在這篇文章的介紹范圍內(nèi)。請(qǐng)自行參考 Hadoop 網(wǎng)站。需要說(shuō)明的是，本文采用的 Hadoop 為 2.2.0。

安裝完 Hadoop 后，下載 mahout-distribution-0.9，解壓縮后的重要內(nèi)容如下：

bin/: 目錄下有 Mahout 可執(zhí)行腳本

mahout-examples-0.9-job.jar，各種算法的實(shí)現(xiàn)類(lèi)

example/ 各種實(shí)現(xiàn)算法的源碼

conf/ 存放各實(shí)現(xiàn)類(lèi)的配置文件，其中重要的為 driver.classes.default.props，如果增加實(shí)現(xiàn)算法類(lèi)，可以在該文件中增加配置項(xiàng)，從而可以被 Mahout 啟動(dòng)腳本調(diào)用。

單獨(dú)執(zhí)行 Mahout，是一個(gè)實(shí)現(xiàn)的各種功能的簡(jiǎn)介，如下例：

執(zhí)行 /data01/shanlei/src/mahout-distribution-0.9/bin/mahout

輸出：

MAHOUT_LOCAL is not set; adding

HADOOP_CONF_DIR to classpath.

Running on hadoop, using /data01/shanlei/hadoop-2.2.0/bin/hadoop and

HADOOP_CONF_DIR=/data01/shanlei/hadoop-2.2.0/conf

p1 is org.apache.mahout.driver.MahoutDriver

MAHOUT-JOB:

/data01/shanlei/src/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar

An example program must be given as the first argument.

Valid program names are:

arff.vector: : Generate Vectors from an ARFF file or directory

assesser: : assesse cluster result using silhoueter algorithm

baumwelch: : Baum-Welch algorithm for unsupervised HMM training

canopy: : Canopy clustering

cat: : Print a file or resource as the logistic regression models would see it

cleansvd: : Cleanup and verification of SVD output

clusterdump: : Dump cluster output to text

clusterpp: : Groups Clustering Output In Clusters

cmdump: : Dump confusion matrix in HTML or text formats

concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix

……

如果要執(zhí)行某種算法，如上面結(jié)果中顯示的 canopy，就需要執(zhí)行 mahout canopy 加上該算法需要的其它參數(shù)。

另外，Mahout 算法的輸入輸出，都是在 Hadoop HDFS 上，因此需要通過(guò) hdfs 命令上傳到 hdfs 文件系統(tǒng)；輸出大多為 Mahout 特有的二進(jìn)制格式，需要通過(guò) mahout seqdumper 等命令來(lái)導(dǎo)出并轉(zhuǎn)換為可讀文本。

準(zhǔn)備輸入

Mahout 算法使用的 input 需要特定格式的 Vector 文件，不能夠直接使用一般的文本文件，因此需要把文本轉(zhuǎn)換為 Vector 文件，好在 Mahout 自身提供了這樣的類(lèi)：

org.apache.mahout.clustering.conversion.InputDriver。

在 Mahout 的 conf 目錄中的 driver.classes.default.props 增加如下行：

org.apache.mahout.clustering.conversion.InputDriver = input2Seq : create sequence file from blank separated files，然后就可以為 Mahout 增加一個(gè)功能，把空格分隔的文本文件轉(zhuǎn)換為 Mahout 聚類(lèi)可以使用的向量。

如下面的數(shù)據(jù)所示，該數(shù)據(jù)每行為一個(gè)包含 6 個(gè)屬性的向量：

1 4 3 11 4 3

2 2 5 2 10 3

1 1 2 2 10 1

1 4 2 11 5 4

1 1 3 2 10 1

2 4 5 9 5 2

2 6 5 3 8 1

執(zhí)行 ./mahout input2Seq -i /shanlei/userEnum -o /shanlei/vectors 則產(chǎn)生聚類(lèi)需要的向量文件。

聚類(lèi)

以 Kmeans 聚類(lèi)為例：

./mahout kmeans --input /shanlei/vectors --output /shanlei/kmeans -c /shanlei/k --maxIter 5 -k 8 –cl

-k 8 指明產(chǎn)生 8 類(lèi)，執(zhí)行完成后，在/shanlei/kmeans/下會(huì)產(chǎn)生： clusters-0，clusters-1，… …,clusters-n-final 目錄，每個(gè)目錄都是一次迭代產(chǎn)生的 centroids, 目錄數(shù)會(huì)受 --maxIter 控制；最后的結(jié)果會(huì)加上 final。

利用 Mahout 的 clusterdump 功能我們可以查看聚類(lèi)的結(jié)果：

./mahout clusterdump -i /shanlei/kmeans/clusters-2-final -o ./centroids.txt

more centroids.txt：

VL-869{n=49 c=[1.163, 5.082, 4.000,

4.000, 4.592, 2.429] r=[0.370, 0.965, 1.030, 1.245, 1.244, 1.161]}

VL-949{n=201 c=[1.229, 4.458, 4.403,

10.040, 6.134, 1.458] r=[0.420, 1.079, 0.836, 1.196, 1.392, 0.852]}

… …

VL-980{n=146 c=[1.281, 2.000, 4.178,

2.158, 9.911, 1.918] r=[0.449, 0.712, 1.203, 0.570, 0.437, 1.208]}

VL-869 中的 869 為該類(lèi)的 id，c=[1.163, 5.082, 4.000, 4.000, 4.592, 2.429] 為 centroid 的坐標(biāo)，n=49 表示該類(lèi)中數(shù)據(jù)點(diǎn)的個(gè)數(shù)。

如果使用-cl 參數(shù)，則在/shanlei/kmeans/下會(huì)產(chǎn)生 clusteredPoints，利用 Mahout 的 seqdumper 可以看其內(nèi)容：

Input Path:

hdfs://rac122:18020/shanlei/kmeans/clusteredPoints/part-m-00000

Key class: class

org.apache.hadoop.io.IntWritable Value Class: class

org.apache.mahout.clustering.classify.WeightedPropertyVectorWri

table

Key: 301: Value: wt: 1.0 distance:

6.6834472852629006 vec: 1 = [1.000, 4.000, 3.000, 11.000, 4.000, 3.000]

Key: 980: Value: wt: 1.0 distance:

2.3966504034528384 vec: 2 = [2.000, 2.000, 5.000, 2.000, 10.000, 3.000]

Key 對(duì)應(yīng)的則是相關(guān)聚類(lèi)的 id，distance 為到 centroid 的距離。vec 則是原始的向量。

從 Mahout 的聚類(lèi)輸出結(jié)果來(lái)看，能夠很容易的實(shí)現(xiàn) IBM 簡(jiǎn)化后的 Silhouette 算法，內(nèi)聚因子 (a) 可以簡(jiǎn)單的獲取到，而離散因子 (b) 也能夠簡(jiǎn)單的計(jì)算實(shí)現(xiàn)。下面我們就來(lái)設(shè)計(jì) Mahout 中的實(shí)現(xiàn)。

Mahout 中 Silhouette 實(shí)現(xiàn)

算法設(shè)計(jì)：

遵循 Hadoop 上 MR 程序的設(shè)計(jì)原則，算法設(shè)計(jì)考慮了 mapper，reducer 及 combiner 類(lèi)。

Mapper 設(shè)計(jì)：

輸入目錄：聚類(lèi)的最終結(jié)果目錄 clusteredPoints（通過(guò)命令行參數(shù)-i 設(shè)置），

輸入：

Key：IntWritable，Value：WeightedPropertyVectorWritable

輸出：

Key：IntWritable（無(wú)意義，常量 1），Value：Text（單個(gè)點(diǎn)的 Silhouette 值，格式為“cnt，Silhouette 值”）

Setup 過(guò)程：

因?yàn)樾枰?jì)算 separation 時(shí)候要訪問(wèn)其它的 centroids，所以在 setup 中讀?。ㄍㄟ^(guò)命令行參數(shù)-c 設(shè)置）并緩存。

Map 過(guò)程：

由于輸入的 Value 為 WeightedPropertyVectorWritable，可以通過(guò)訪問(wèn)字段 distance獲得參數(shù) a，并遍歷緩存的 centroids，針對(duì)其 id 不等于 Key 的，逐一計(jì)算距離，其最小的就是參數(shù) b。

Map 的結(jié)果 Key 使用常量 1，Value 為形如“1，0.23”這樣的“cnt，Silhouette 值”格式。

Reducer 設(shè)計(jì)：

輸入：

Key：IntWritable（常量 1），Value: Text (combine 后的中間 Silhouette 值，格式為“cnt，Silhouette 值”)。

輸出：

Key：IntWritable（常量 1），Value:整個(gè)聚類(lèi)的 Silhouette 值，格式為“cnt，Silhouette 值”。

輸出目錄：最終文件的產(chǎn)生目錄，通過(guò)命令行參數(shù)-o 設(shè)置。

Reduce 過(guò)程：

根據(jù)“，”把每個(gè) Value，分解為 cnt，和 Silhouette，最后進(jìn)行加權(quán)平均。

Combiner 設(shè)計(jì)：

為減少數(shù)據(jù)的 copy，采用 combiner，其實(shí)現(xiàn)即為 reducer 的實(shí)現(xiàn)。

實(shí)現(xiàn)代碼：

Mapper 類(lèi)：

public class AssesserMapper extends Mapper<IntWritable,

WeightedPropertyVectorWritable, IntWritable, Text> {

private List<Cluster> clusterModels;

private static final Logger log = LoggerFactory.getLogger(ClusterAssesser.class);

protected void setup(Context context) throws IOException, InterruptedException {

super.setup(context);

Configuration conf = context.getConfiguration();

String clustersIn = conf.get(ClusterClassificationConfigKeys.CLUSTERS_IN);

clusterModels = Lists.newArrayList();

if (clustersIn != null && !clustersIn.isEmpty()) {

Path clustersInPath = new Path(clustersIn);

clusterModels = populateClusterModels(clustersInPath, conf);

}

private static List<Cluster> populateClusterModels(Path clustersIn,

Configuration conf) throws IOException {

List<Cluster> clusterModels = Lists.newArrayList();

Path finalClustersPath = finalClustersPath(conf, clustersIn);

Iterator<?> it = new SequenceFileDirValueIterator<Writable>(finalClustersPath, PathType.LIST,

PathFilters.partFilter(), null, false, conf);

while (it.hasNext()) {

ClusterWritable next = (ClusterWritable) it.next();

Cluster cluster = next.getValue();

cluster.configure(conf);

clusterModels.add(cluster);

}

return clusterModels;

}

private static Path finalClustersPath(Configuration conf,

Path clusterOutputPath) throws IOException {

FileSystem fileSystem = clusterOutputPath.getFileSystem(conf);

FileStatus[] clusterFiles = fileSystem.listStatus(clusterOutputPath,

PathFilters.finalPartFilter());

log.info("files: {}", clusterOutputPath.toString());

return clusterFiles[0].getPath();

}

protected void map(IntWritable key, WeightedPropertyVectorWritable vw, Context context)

throws IOException, InterruptedException {

int clusterId=key.get();

double cohension,separation=-1,silhouete;

Map<Text,Text> props=vw.getProperties();

cohension=Float.valueOf(props.get(new Text("distance")).toString());

Vector vector = vw.getVector();

for ( Cluster centroid : clusterModels) {

if (centroid.getId()!=clusterId) {

DistanceMeasureCluster distanceMeasureCluster = (DistanceMeasureCluster) centroid;

DistanceMeasure distanceMeasure = distanceMeasureCluster.getMeasure();

double f = distanceMeasure.distance(centroid.getCenter(), vector);

if (f<separation || separation<-0.5) separation=f;

}

Text value=new Text(Long.toString(1)+","+Double.toString(silhouete));

IntWritable okey=new IntWritable();

okey.set(1);

context.write(okey, value);

}

Reducer 類(lèi)：

public class AssesserReducer extends Reducer<IntWritable, Text, IntWritable, Text> {

private static final Logger log = LoggerFactory.getLogger(ClusterAssesser.class);

protected void setup(Context context) throws IOException, InterruptedException {

super.setup(context);

log.info("reducer");

}

private static final Pattern SEPARATOR = Pattern.compile("[t,]");

public void reduce(IntWritable key, Iterable<Text> values,

Context context)

throws IOException, InterruptedException {

long cnt=0;

double total=0;

for (Text value : values) {

String[] p=SEPARATOR.split(value.toString());

Long itemCnt=Long.parseLong(p[0]);

double v=Double.parseDouble(p[1]);

total=total+ itemCnt*v;

cnt=cnt+itemCnt;

}

Job 類(lèi)：

public class ClusterAssesser extends AbstractJob {

private ClusterAssesser() {

}

public int run(String[] args) throws Exception {

addInputOption();

addOutputOption();

//addOption(DefaultOptionCreator.methodOption().create());

addOption(DefaultOptionCreator.clustersInOption()

.withDescription("The input centroids").create());

if (parseArguments(args) == null) {

return -1;

}

Path input = getInputPath();

Path output = getOutputPath();

Path clustersIn = new Path(getOption(DefaultOptionCreator.CLUSTERS_IN_OPTION));

if (getConf() == null) {

setConf(new Configuration());

}

run(getConf(), input, clustersIn, output);

return 0;

}

private void run(Configuration conf, Path input, Path clustersIn,

Path output)throws IOException, InterruptedException,

ClassNotFoundException {

conf.set(ClusterClassificationConfigKeys.CLUSTERS_IN, clustersIn.toUri().toString());

Job job = new Job(conf, "Cluster Assesser using silhouete over input: " + input);

job.setJarByClass(ClusterAssesser.class);

job.setInputFormatClass(SequenceFileInputFormat.class);

job.setOutputFormatClass(SequenceFileOutputFormat.class);

job.setMapperClass(AssesserMapper.class);

job.setCombinerClass(AssesserReducer.class);

job.setReducerClass(AssesserReducer.class);

job.setNumReduceTasks(1);

job.setOutputKeyClass(IntWritable.class);

job.setOutputValueClass(Text.class);

FileInputFormat.addInputPath(job, input);

FileOutputFormat.setOutputPath(job, output);

if (!job.waitForCompletion(true)) {

throw new InterruptedException("Cluster Assesser Job failed processing " + input);

}

private static final Logger log = LoggerFactory.getLogger(ClusterAssesser.class);

public static void main(String[] args) throws Exception {

ToolRunner.run(new Configuration(), new ClusterAssesser(), args);

}

編譯運(yùn)行：

編譯環(huán)境準(zhǔn)備:

在從 Mahout 網(wǎng)站下載的包中，同時(shí)包含了源碼以及可以導(dǎo)入到 eclipse 的工程，導(dǎo)入后，會(huì)產(chǎn)生 mahout-core，mahout-distribution，mahout-example 等不同的 projects，我們首先編譯一遍，保證沒(méi)有錯(cuò)誤，然后再考慮如何增加自己的代碼。

當(dāng)然，Mahout 在頂層目錄也提供了一個(gè)編譯腳本：compile.sh, 可以在命令行完成編譯。

代碼編譯：

把自己的代碼放到 example/src/main/java/目錄下，自動(dòng)編譯就可以了。輸出產(chǎn)生的類(lèi)：com.ai.cluster.assesser.ClusterAssesser，然后就被打包到了 examples/target/mahout-examples-0.9-job.jar 中。

配置：

把 examples/target/mahout-examples-0.9-job.jar 覆蓋頂層的 mahout-examples-0.9-job.jar

通過(guò)在 conf/driver.classes.default.props 文件添加如下行，把我們的實(shí)現(xiàn)類(lèi)加入到 Mahout 的配置中，從而可以通過(guò) Mahout 腳本執(zhí)行：

com.ai.cluster.assesser.ClusterAssesser = assesser : assesse cluster result using silhoueter algorithm

運(yùn)行

利用前面我們做聚類(lèi)過(guò)程分析產(chǎn)生的聚類(lèi)結(jié)果：

bin/mahout assesser -i /shanlei/kmeans/clusteredPoints -o /shanlei/silhouete -c

/shanlei/kmeans --tempDir /shanlei/temp

其中的-c 為輸入聚類(lèi)的中心點(diǎn)，-i 為聚類(lèi)的點(diǎn) –o 為最終的輸出。

查看結(jié)果：

bin/mahout seqdumper -i /shanlei/silhouete -o ./a.txt

more a.txt:

Input Path: hdfs://rac122:18020/shanlei/silhouete/part-r-00000

Key class: class org.apache.hadoop.io.IntWritable Value Class: class org.apache.hadoop.io.Text

Key: 1: Value: 1000,0.5217678842906524

Count: 1

1000 表示共 1000 個(gè)點(diǎn)，0.52176 為聚類(lèi)的 Silhouette 值。大于 0.5，看起來(lái)效果還行。

結(jié)束語(yǔ)：

不同于其它的套件，Mahout 從發(fā)布起就是為處理海量數(shù)據(jù)、為生產(chǎn)而準(zhǔn)備的。直到現(xiàn)在，Mahout 的重心還是在優(yōu)化各種算法上面，對(duì)易用性考慮不多，而且學(xué)習(xí)成本也很高。但 Mahout 不僅僅提供某些特定的算法，而且還把前期準(zhǔn)備中的數(shù)據(jù)清洗，轉(zhuǎn)換，以及后續(xù)的效果評(píng)估、圖形化展現(xiàn)都集成在一塊，方便用戶。這不僅是一種發(fā)展趨勢(shì)，也是爭(zhēng)取用戶的一個(gè)關(guān)鍵因素。希望大家都能夠加入進(jìn)來(lái)，提供各種各樣的輔助功能，讓 Mahout 變得易用起來(lái)。

更多信息請(qǐng)查看IT技術(shù)專(zhuān)欄

更多信息請(qǐng)查看技術(shù)文章

上一篇：讓 Nginx 支持 HTML5 上傳

下一篇：VizJSON：JSON 中的圖形語(yǔ)法

易賢網(wǎng)手機(jī)網(wǎng)站地址：為 Mahout 增加聚類(lèi)評(píng)估功能

由于各方面情況的不斷調(diào)整與變化，易賢網(wǎng)提供的所有考試信息和咨詢回復(fù)僅供參考，敬請(qǐng)考生以權(quán)威部門(mén)公布的正式信息和咨詢?yōu)闇?zhǔn)！

相關(guān)閱讀技術(shù)文章

ado.net數(shù)據(jù)庫(kù)訪問(wèn)技術(shù)11月15日

數(shù)據(jù)庫(kù)同步優(yōu)化技巧分享11月15日

詳解sqlite中的查詢規(guī)劃器11月15日

sqlite快速入門(mén)指南11月14日

簡(jiǎn)單分析sqlite4的一些設(shè)計(jì)改變11月14日