Database Access in Hadoop

Date: 2015-07-17 / Source: web

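Hadoop is mainly used to store and analyze unstructured or semi-structured data (the latter, for example, in HBase), whereas structured data is generally stored in and accessed through a database. This article describes how to combine Hadoop with an existing database so that a Hadoop application can access the data stored in it.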

1. DBInputFormat

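DBInputFormat is an input format that Hadoop has supported since version 0.19.0. It lives in the package org.apache.hadoop.mapred.lib.db and is used to interact with existing database systems such as MySQL, PostgreSQL, and Oracle. Within a Hadoop application, DBInputFormat talks to the database through the JDBC driver supplied by the database vendor and reads records with standard SQL. Before using DBInputFormat, the JDBC driver must be copied into the $HADOOP_HOME/lib/ directory on every node of the cluster.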

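The DBInputFormat class contains the following three nested classes:

1. protected class DBRecordReader implements RecordReader<LongWritable, T>: reads tuples one by one from a database table.

2. public static class NullDBWritable implements DBWritable, Writable: a placeholder implementation of the DBWritable interface.

3. protected static class DBInputSplit implements InputSplit: describes the range of input tuples through two attributes, start and end. start is the index of the first record and end is the index of the last record.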

The DBWritable interface is similar to the Writable interface. It also declares the two methods write and readFields, but with different parameters. The two methods in DBWritable are:

public void write(PreparedStatement statement) throws SQLException;

public void readFields(ResultSet resultSet) throws SQLException;

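These two methods set parameters on a java.sql.PreparedStatement and read one record from a java.sql.ResultSet, respectively. Readers familiar with JDBC in Java will already know how these two classes are used.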
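To make the role of DBInputSplit more concrete, the following is a minimal sketch of how a table could be cut into contiguous row ranges, one per map task. It is a simplified model and not Hadoop's own code: the class SplitRangeSketch, the nested RowRange type, and the choice to treat end as exclusive are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a simplified model of dividing rowCount rows into
// contiguous ranges. None of these names belong to the Hadoop API.
class SplitRangeSketch {

    // Hypothetical stand-in for DBInputSplit. In this sketch start is the
    // index of the first row and end is one past the last row (assumption).
    static final class RowRange {
        final long start;
        final long end;

        RowRange(long start, long end) {
            this.start = start;
            this.end = end;
        }

        long length() {
            return end - start;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    // Divide rowCount rows into the requested number of ranges.
    static List<RowRange> split(long rowCount, int chunks) {
        if (rowCount < 0 || chunks < 1) {
            throw new IllegalArgumentException("rowCount >= 0 and chunks >= 1 required");
        }
        List<RowRange> ranges = new ArrayList<>();
        long chunkSize = rowCount / chunks;
        for (int i = 0; i < chunks; i++) {
            long start = i * chunkSize;
            // The last range absorbs the remainder so that no row is lost.
            long end = (i == chunks - 1) ? rowCount : start + chunkSize;
            ranges.add(new RowRange(start, end));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Ten rows shared by three map tasks: [0, 3), [3, 6), [6, 10)
        for (RowRange range : split(10, 3)) {
            System.out.println(range + " length=" + range.length());
        }
    }
}
```

Each range would then be handed to one record reader, which fetches only the rows that fall inside it.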

2. Reading records from a database table with DBInputFormat

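The previous section briefly introduced DBInputFormat and its nested classes. This section explains in detail how to use DBInputFormat to read database records. The steps are as follows: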

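1. Call DBConfiguration.configureDB(JobConf job, String driverClass, String dbUrl, String userName, String passwd) to configure the JDBC driver, the data source, and the user name and password used to access the database. For MySQL, for example, the JDBC driver is "com.mysql.jdbc.Driver" and the data source can be set to "jdbc:mysql://localhost/mydb", where mydb is the name of the database to be accessed.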

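2. Call DBInputFormat.setInput(JobConf job, Class<? extends DBWritable> inputClass, String tableName, String conditions, String orderBy, String... fieldNames) to initialize the input. The arguments are the class of the input records (which must implement the DBWritable interface), the table name, the condition the input rows must satisfy, the ordering of the input, and the columns to read. Alternatively, the overloaded method setInput(JobConf job, Class<? extends DBWritable> inputClass, String inputQuery, String inputCountQuery) can be used. The difference is that the latter takes standard SQL queries directly. See the Hadoop API documentation for details.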

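3. Configure the rest of the job as for any ordinary Hadoop application, including the Mapper class, the Reducer class, and the input and output formats, and then call JobClient.runJob(conf).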
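The arguments passed to setInput in step 2 (table name, conditions, ordering, and field names) are enough to form a SELECT statement for one range of rows. The sketch below shows one plausible way to assemble such a query. It is an illustration under assumptions: SelectQuerySketch and buildQuery are hypothetical names, the LIMIT ... OFFSET syntax is the MySQL/PostgreSQL style, and the query that Hadoop actually generates may differ in detail.

```java
// Illustrative only: assembles a paged SELECT from the same pieces of
// information that DBInputFormat.setInput receives. Not part of Hadoop.
class SelectQuerySketch {

    // start and length describe the range of rows wanted by one map task.
    static String buildQuery(String table, String conditions, String orderBy,
                             long start, long length, String... fields) {
        StringBuilder sql = new StringBuilder("SELECT ");
        sql.append(String.join(", ", fields));
        sql.append(" FROM ").append(table);
        if (conditions != null && !conditions.isEmpty()) {
            sql.append(" WHERE (").append(conditions).append(")");
        }
        // A stable ORDER BY matters: without it, LIMIT/OFFSET pages may
        // overlap or skip rows between separate queries.
        if (orderBy != null && !orderBy.isEmpty()) {
            sql.append(" ORDER BY ").append(orderBy);
        }
        sql.append(" LIMIT ").append(length).append(" OFFSET ").append(start);
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("teacher", null, "id", 0, 100,
                "id", "name", "age", "departmentID"));
    }
}
```

Because every map task issues its own query, the ordering column has to give the same row order on every execution, which is why an orderBy argument is part of the setup.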

3. Example

Suppose the MySQL server hosts a database named school, in which the teacher table is defined as follows:

DROP TABLE IF EXISTS `school`.`teacher`;

CREATE TABLE `school`.`teacher` (
  `id` int(11) default NULL,
  `name` char(20) default NULL,
  `age` int(11) default NULL,
  `departmentID` int(11) default NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

First, here is the TeacherRecord class, which implements the DBWritable interface:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.lib.db.DBWritable;

public class TeacherRecord implements Writable, DBWritable {

    int id;
    String name;
    int age;
    int departmentID;

    // Writable: deserialize the record from Hadoop's binary stream.
    @Override
    public void readFields(DataInput in) throws IOException {
        this.id = in.readInt();
        this.name = Text.readString(in);
        this.age = in.readInt();
        this.departmentID = in.readInt();
    }

    // Writable: serialize the record in the same field order.
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(this.id);
        Text.writeString(out, this.name);
        out.writeInt(this.age);
        out.writeInt(this.departmentID);
    }

    // DBWritable: populate the record from the current row (columns are 1-based).
    @Override
    public void readFields(ResultSet result) throws SQLException {
        this.id = result.getInt(1);
        this.name = result.getString(2);
        this.age = result.getInt(3);
        this.departmentID = result.getInt(4);
    }

    // DBWritable: bind the record's fields to the statement parameters.
    @Override
    public void write(PreparedStatement stmt) throws SQLException {
        stmt.setInt(1, this.id);
        stmt.setString(2, this.name);
        stmt.setInt(3, this.age);
        stmt.setInt(4, this.departmentID);
    }

    @Override
    public String toString() {
        return this.name + " " + this.age + " " + this.departmentID;
    }
}

DBAccessMapper reads the records one by one:

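import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class DBAccessMapper extends MapReduceBase implements
        Mapper<LongWritable, TeacherRecord, LongWritable, Text> {

    // Emit the teacher id as the key and "name age departmentID" as the value.
    @Override
    public void map(LongWritable key, TeacherRecord value,
            OutputCollector<LongWritable, Text> collector, Reporter reporter)
            throws IOException {
        collector.collect(new LongWritable(value.id),
                new Text(value.toString()));
    }
}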

The main function is as follows:

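import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityReducer;
import org.apache.hadoop.mapred.lib.db.DBConfiguration;
import org.apache.hadoop.mapred.lib.db.DBInputFormat;

public class DBAccess {

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(DBAccess.class);
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);
        conf.setInputFormat(DBInputFormat.class);
        FileOutputFormat.setOutputPath(conf, new Path("dboutput"));

        // JDBC driver, connection URL, user name, and password.
        DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
                "jdbc:mysql://localhost/school", "root", "123456");

        // Read all four columns of the teacher table, ordered by id.
        String[] fields = {"id", "name", "age", "departmentID"};
        DBInputFormat.setInput(conf, TeacherRecord.class, "teacher",
                null, "id", fields);

        conf.setMapperClass(DBAccessMapper.class);
        conf.setReducerClass(IdentityReducer.class);
        JobClient.runJob(conf);
    }
}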

This example reads all records from the teacher table and writes them to the dboutput directory in TextOutputFormat, one record per line, in the form <id, "name age departmentID">.

4. Writing records to a database with DBOutputFormat

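DBOutputFormat writes the results of a computation back to a database. As before, first call DBConfiguration.configureDB() to configure the database, then call DBOutputFormat.setOutput(JobConf job, String tableName, String... fieldNames) to set the table name and the column names. The records to be written must again implement the DBWritable interface. Each DBWritable instance passed to the OutputCollector in the Reducer has its write(PreparedStatement stmt) method invoked. When the reduce phase ends, the contents of the PreparedStatement are turned into SQL INSERT statements and inserted into the database.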
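The table name and column names given to setOutput determine the shape of the INSERT statement whose parameters are later bound by write(PreparedStatement). As a rough sketch of that relationship, the code below builds a parameterized INSERT from a table name and a list of fields. InsertStatementSketch and buildInsert are hypothetical names used only for illustration, and the exact text that Hadoop generates may differ.

```java
import java.util.Collections;

// Illustrative only: shows how a table name and its column names map to a
// parameterized INSERT statement. Not part of the Hadoop API.
class InsertStatementSketch {

    static String buildInsert(String table, String... fields) {
        if (fields.length == 0) {
            throw new IllegalArgumentException("at least one field is required");
        }
        String columns = String.join(", ", fields);
        // One "?" placeholder per column; write(PreparedStatement) would
        // later bind them by position, starting at index 1.
        String placeholders = String.join(", ",
                Collections.nCopies(fields.length, "?"));
        return "INSERT INTO " + table + " (" + columns + ") VALUES ("
                + placeholders + ")";
    }

    public static void main(String[] args) {
        System.out.println(
                buildInsert("teacher", "id", "name", "age", "departmentID"));
    }
}
```

The order of the field names has to match the order in which the record's write(PreparedStatement) method sets its parameters, otherwise values end up in the wrong columns.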

5. Summary

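DBInputFormat and DBOutputFormat provide a simple interface for accessing a database. Simple as it is, the interface is widely applicable. For example, data in an existing database can be dumped into Hadoop, analyzed there at scale through distributed computation, and the results dumped back into the database. In a search engine, Hadoop can be used to run link analysis on crawled web pages, compute scores, and build an inverted index, which is then stored in a database for fast lookup. Although this interface is sufficient for general data transfer, it still has limitations, for example in concurrent access and in the requirement that the keys of the table be sortable. These remain to be improved and optimized by the Hadoop community.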

Author: Administrator
