Hadoop SequenceFile is the file format Hadoop uses to store binary key-value pairs. It supports different key/value type combinations, such as IntWritable/Text and NullWritable/BytesWritable.
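A reader does not need to be told these types in advance, because every SequenceFile records its key and value class names in the file header. The sketch below uses only the JDK to illustrate that idea. It assumes the version 6 header layout (the magic bytes `SEQ`, one version byte, then the two class names, each stored as a length followed by UTF-8 bytes) and class names of at most 127 bytes, so that the length fits in a single byte. `SeqHeaderPeek`, `peekKeyValueClasses`, and `sampleHeader` are hypothetical names for this illustration and are not part of Hadoop.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class SeqHeaderPeek {

    /** Reads a short string: one length byte (0..127) followed by that many UTF-8 bytes. */
    public static String readShortString(ByteBuffer buf) {
        int len = buf.get();
        if (len < 0) {
            // A negative first byte signals a multi-byte length, which this sketch does not handle.
            throw new IllegalArgumentException("length is not a single byte in 0..127");
        }
        byte[] bytes = new byte[len];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Returns {keyClassName, valueClassName} parsed from the first bytes of a file. */
    public static String[] peekKeyValueClasses(byte[] head) {
        ByteBuffer buf = ByteBuffer.wrap(head);
        if (buf.get() != 'S' || buf.get() != 'E' || buf.get() != 'Q') {
            throw new IllegalArgumentException("missing SEQ magic bytes");
        }
        int version = buf.get();
        if (version != 6) {
            // Assumption of this sketch: only the version 6 layout is handled.
            throw new IllegalArgumentException("unsupported version " + version);
        }
        String keyClass = readShortString(buf);
        String valueClass = readShortString(buf);
        return new String[] {keyClass, valueClass};
    }

    /** Builds the start of a header in the assumed layout, for demonstration only. */
    public static byte[] sampleHeader(String keyClass, String valueClass) {
        byte[] k = keyClass.getBytes(StandardCharsets.UTF_8);
        byte[] v = valueClass.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 1 + k.length + 1 + v.length);
        buf.put((byte) 'S').put((byte) 'E').put((byte) 'Q').put((byte) 6);
        buf.put((byte) k.length).put(k);
        buf.put((byte) v.length).put(v);
        return buf.array();
    }

    public static void main(String[] args) throws IOException {
        byte[] head;
        if (args.length > 0) {
            // Read up to 512 bytes from a local file given on the command line.
            head = new byte[512];
            int off = 0;
            try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
                int n;
                while (off < head.length && (n = in.read(head, off, head.length - off)) > 0) {
                    off += n;
                }
            }
            head = Arrays.copyOf(head, off);
        } else {
            // No file given: fall back to a synthetic header.
            head = sampleHeader("org.apache.hadoop.io.BooleanWritable",
                                "org.apache.hadoop.io.Text");
        }
        String[] classes = peekKeyValueClasses(head);
        System.out.println("Key class: " + classes[0]);
        System.out.println("Value class: " + classes[1]);
    }
}
```

This is only meant to show where the type information lives. For real work, Hadoop's own `SequenceFile.Reader` is the right tool, since it also handles compression, metadata, and files stored on HDFS.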
Suppose my sequence file has BooleanWritable keys and Text values. How do I read it?
Without further ado, here is the code:
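import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BooleanWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

// The statements below belong inside a method that declares `throws Exception`.
Configuration conf = new Configuration();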
FileSystem fs = FileSystem.getLocal(conf);
Path seqFilePath = new Path("/path/to/your/sequence-file");
{
// Print the key and value types of the sequence file
SequenceFile.Reader reader = new SequenceFile.Reader(fs, seqFilePath, conf);
Writable key = (Writable) reader.getKeyClass().newInstance();
Writable value = (Writable) reader.getValueClass().newInstance();
while (reader.next(key, value)) {
System.out.println("Key type: " + key.getClass());
System.out.println("Value type: " + value.getClass());
break;
}
reader.close();
}
{
// Print the contents of the sequence file
SequenceFile.Reader reader = new SequenceFile.Reader(fs, seqFilePath, conf);
BooleanWritable keyObj = new BooleanWritable();
Text valueObj = new Text();
while (reader.next(keyObj, valueObj)) {
System.out.println(keyObj.get() + ": " + valueObj);
}
reader.close();
}
The output looks something like this:
Key type: class org.apache.hadoop.io.BooleanWritable
Value type: class org.apache.hadoop.io.Text
(file contents omitted here)
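The two blocks above can also be folded into a single loop that works for any key and value types. The following is a sketch rather than a definitive implementation. It assumes a Hadoop version in which `SequenceFile.Reader` accepts `SequenceFile.Reader.file(...)` options and can be used in try-with-resources (true for the 3.2.1 version used in this post). `SeqFileDump` and `dump` are hypothetical names, and the path is the same placeholder as before.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class SeqFileDump {

    /** Reads every record and returns it as "key: value", whatever the Writable types are. */
    public static List<String> dump(Configuration conf, Path path) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader even if next() throws.
        try (SequenceFile.Reader reader =
                 new SequenceFile.Reader(conf, SequenceFile.Reader.file(path))) {
            // Create reusable key/value holders from the classes recorded in the file.
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
            while (reader.next(key, value)) {
                // Relies on each Writable's toString(), e.g. "true" for a BooleanWritable.
                lines.add(key + ": " + value);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Qualify the placeholder path against the local file system, as in the code above.
        Path seqFilePath = FileSystem.getLocal(conf)
                .makeQualified(new Path("/path/to/your/sequence-file"));
        for (String line : dump(conf, seqFilePath)) {
            System.out.println(line);
        }
    }
}
```

Compared with the original blocks, this version avoids the deprecated `Reader(fs, path, conf)` constructor and does not leak the reader if an exception is thrown mid-loop. The trade-off is that values are printed via `toString()`, so typed accessors such as `BooleanWritable.get()` are not available without a cast.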
The Maven dependencies depend on the Hadoop version you use. Here is an example:
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>3.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>3.2.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>3.2.1</version>
</dependency>
</dependencies>
Source: https://www.codelast.com/