[Big Data Lab] Getting Familiar with Common HBase Operations
Lab Objectives
- Understand the role of HBase in the Hadoop architecture
- Become proficient with the common HBase Shell commands
- Become familiar with the common HBase Java API
Lab Environment
- Operating system: Ubuntu 16.04
- Hadoop version: 3.1.3
- HBase version: 2.2.2
- JDK version: 1.8
- Java IDE: Eclipse
Note: the HBase service must be running for this lab. Start Hadoop first and then HBase; shut down in the reverse order, HBase first and then Hadoop (see the sketch below).
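For reference, a minimal start/stop sketch, assuming the Hadoop sbin and HBase bin directories are on the PATH:
start-dfs.sh     # start Hadoop (HDFS) first
start-hbase.sh   # then start HBase
stop-hbase.sh    # when finished, stop HBase first
stop-dfs.sh      # then stop Hadoop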
Lab Tasks and Requirements
1. Implement each of the following functions programmatically, and complete the same tasks with the HBase Shell commands provided with Hadoop (a Java sketch for sub-task (1) is given after this list):
(1) List information about all HBase tables, such as their names
hbase(main):001:0> list
(2) Print all records of a specified table to the terminal
View the records:
hbase(main):001:0> scan 'tableName'
View the table's schema:
hbase(main):001:0> describe 'tableName'
(3) Add and delete a specified column family (or column) in an existing table
Add a column family:
hbase(main):001:0> alter 'tableName', NAME => 'columnFamily'
Delete a column family:
hbase(main):001:0> alter 'tableName', {NAME => 'columnFamily', METHOD => 'delete'}
(4) Clear all records of a specified table
hbase(main):001:0> truncate 'tableName'
Note: truncate disables, drops, and recreates the table, which clears all of its records; a plain drop would remove the table itself.
(5) Count the number of rows in a table
hbase(main):001:0> count 'tableName'
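The assignment also asks for a programmatic version of each sub-task. As one hedged example for sub-task (1), reusing the same connection settings as the classes further below (the class name ListTables is my own choice), listing all table names through the Java Admin API might look roughly like this:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class ListTables {
    public static void main(String[] args) throws IOException {
        // assumed local pseudo-distributed setup, matching the classes below
        Configuration configuration = HBaseConfiguration.create();
        configuration.set("hbase.rootdir", "hdfs://localhost:9000/hbase");
        try (Connection connection = ConnectionFactory.createConnection(configuration);
             Admin admin = connection.getAdmin()) {
            // listTableNames() returns the names of all user tables, like the shell's list command
            for (TableName name : admin.listTableNames()) {
                System.out.println(name.getNameAsString());
            }
        }
    }
}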
2. Given the following tables and data from a relational database, convert them into tables suitable for HBase storage and insert the data:
Student table:
Student No. | Name | Sex | Age |
---|---|---|---|
2015001 | Zhangsan | male | 23 |
2015002 | Mary | female | 22 |
2015003 | Lisi | male | 24 |
Course table:
Course No. | Course Name | Credits |
---|---|---|
123001 | Math | 2.0 |
123002 | Computer Science | 5.0 |
123003 | English | 3.0 |
SC (course selection) table:
Student No. | Course No. | Score |
---|---|---|
2015001 | 123001 | 86 |
2015001 | 123003 | 69 |
2015002 | 123002 | 77 |
2015002 | 123003 | 99 |
2015003 | 123001 | 98 |
2015003 | 123002 | 95 |
Create the three tables and add the data to the 'Student', 'Course', and 'SC' tables (a possible shell sketch is given below).
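A minimal shell sketch for the three tables. The schema here is only one of several reasonable mappings: one column family per field, with row keys 1, 2, 3 (my own choice, matching the arguments used in the Java examples further below). Only the first row of 'Course' and 'SC' is shown; the remaining rows follow the same pattern.
create 'Student','S_No','S_Name','S_Sex','S_Age'
put 'Student','1','S_No:','2015001'
put 'Student','1','S_Name:','Zhangsan'
put 'Student','1','S_Sex:','male'
put 'Student','1','S_Age:','23'
put 'Student','2','S_No:','2015002'
put 'Student','2','S_Name:','Mary'
put 'Student','2','S_Sex:','female'
put 'Student','2','S_Age:','22'
put 'Student','3','S_No:','2015003'
put 'Student','3','S_Name:','Lisi'
put 'Student','3','S_Sex:','male'
put 'Student','3','S_Age:','24'
create 'Course','C_No','C_Name','C_Credit'
put 'Course','1','C_No:','123001'
put 'Course','1','C_Name:','Math'
put 'Course','1','C_Credit:','2.0'
create 'SC','SC_Sno','SC_Cno','SC_Score'
put 'SC','1','SC_Sno:','2015001'
put 'SC','1','SC_Cno:','123001'
put 'SC','1','SC_Score:','86'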
In addition, implement the following functions programmatically:
(1) createTable(String tableName, String[] fields)
Create a table. The parameter tableName is the name of the table, and the string array fields holds the names of the fields of each record. If a table named tableName already exists in HBase, delete the existing table first and then create the new one.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;

public class CreateTable {
    public static Configuration configuration;
    public static Connection connection;
    public static Admin admin;

    public static void init() { // set up the connection
        configuration = HBaseConfiguration.create();
        configuration.set("hbase.rootdir", "hdfs://localhost:9000/hbase");
        try {
            connection = ConnectionFactory.createConnection(configuration);
            admin = connection.getAdmin();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void close() { // close the connection
        try {
            if (admin != null) {
                admin.close();
            }
            if (connection != null) {
                connection.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void createTable(String tableName, String[] fields) throws IOException {
        init();
        TableName tablename = TableName.valueOf(tableName); // table name
        if (admin.tableExists(tablename)) { // drop an existing table of the same name first
            System.out.println("table exists, deleting it first!");
            admin.disableTable(tablename);
            admin.deleteTable(tablename);
        }
        TableDescriptorBuilder tableDescriptor = TableDescriptorBuilder.newBuilder(tablename);
        for (int i = 0; i < fields.length; i++) { // each field becomes a column family
            ColumnFamilyDescriptor family = ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes(fields[i])).build();
            tableDescriptor.setColumnFamily(family);
        }
        admin.createTable(tableDescriptor.build());
        close();
    }

    public static void main(String[] args) {
        String[] fields = {"id", "score"};
        try {
            createTable("test", fields);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
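After running this program, the result can be checked in the shell (the table name 'test' comes from the main() arguments above):
hbase(main):001:0> describe 'test'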
(2) addRecord(String tableName, String row, String[] fields, String[] values)
Add the data in values to the cells of table tableName specified by row row (represented here by S_Name) and by the string array fields. If an element of fields has a column qualifier under its column family, it is written as "columnFamily:column". For example, to add scores to the three columns "Math", "Computer Science", and "English" at once, fields would be {"Score:Math", "Score:Computer Science", "Score:English"} and values would hold the three scores.
To add data to a table we need a Put object, and before building the Put we must first obtain a Table object so that we can operate on the specified table.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;

public class addRecord {
    public static Configuration configuration;
    public static Connection connection;
    public static Admin admin;

    public static void init() { // set up the connection
        configuration = HBaseConfiguration.create();
        configuration.set("hbase.rootdir", "hdfs://localhost:9000/hbase");
        try {
            connection = ConnectionFactory.createConnection(configuration);
            admin = connection.getAdmin();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void close() { // close the connection
        try {
            if (admin != null) {
                admin.close();
            }
            if (connection != null) {
                connection.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void addRecord(String tableName, String row, String[] fields, String[] values) throws IOException {
        init(); // connect to HBase
        Table table = connection.getTable(TableName.valueOf(tableName)); // get the table
        Put put = new Put(row.getBytes()); // one Put object for the whole row
        for (int i = 0; i < fields.length; i++) {
            String[] cols = fields[i].split(":");
            if (cols.length == 1) { // only a column family was given: use an empty qualifier
                put.addColumn(fields[i].getBytes(), "".getBytes(), values[i].getBytes());
            } else { // "columnFamily:column"
                put.addColumn(cols[0].getBytes(), cols[1].getBytes(), values[i].getBytes());
            }
        }
        table.put(put); // write all cells to the table
        close(); // close the connection
    }

    public static void main(String[] args) {
        String[] fields = {"Score:Math", "Score:Computer Science", "Score:English"};
        String[] values = {"90", "90", "90"};
        try {
            addRecord("grade", "S_Name", fields, values);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Since that table has not been created, only the code is shown and the experiment is not run here (a sketch for creating it is given below).
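If you do want to run the main() above, the table it writes to has to exist first. A minimal shell sketch, with the table name 'grade' and the column family 'Score' taken from the example arguments in main():
create 'grade','Score'
After running the program, scan 'grade' in the shell should show the three inserted scores.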
(3) scanColumn(String tableName, String column)
Browse the data of one column of table tableName; if a row has no data in that column, return null. When the parameter column is a column family name that contains several column qualifiers, list the data of every column under it; when column is a specific column name (e.g. "Score:Math"), list only the data of that column.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class scanColumn {
    public static Configuration configuration;
    public static Connection connection;
    public static Admin admin;

    public static void init() { // set up the connection
        configuration = HBaseConfiguration.create();
        configuration.set("hbase.rootdir", "hdfs://localhost:9000/hbase");
        try {
            connection = ConnectionFactory.createConnection(configuration);
            admin = connection.getAdmin();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void close() { // close the connection
        try {
            if (admin != null) {
                admin.close();
            }
            if (connection != null) {
                connection.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void showResult(Result result) {
        Cell[] cells = result.rawCells();
        for (int i = 0; i < cells.length; i++) {
            System.out.println("RowName:" + new String(CellUtil.cloneRow(cells[i])));          // row key
            System.out.println("ColumnName:" + new String(CellUtil.cloneQualifier(cells[i]))); // column qualifier
            System.out.println("Value:" + new String(CellUtil.cloneValue(cells[i])));          // cell value
            System.out.println("Column Family:" + new String(CellUtil.cloneFamily(cells[i]))); // column family
            System.out.println();
        }
    }

    public static void scanColumn(String tableName, String column) {
        init();
        try {
            Table table = connection.getTable(TableName.valueOf(tableName));
            Scan scan = new Scan();
            String[] cols = column.split(":");
            if (cols.length == 1) { // a bare column family: list every column under it
                scan.addFamily(Bytes.toBytes(column));
            } else { // "columnFamily:column": list only that specific column
                scan.addColumn(Bytes.toBytes(cols[0]), Bytes.toBytes(cols[1]));
            }
            ResultScanner scanner = table.getScanner(scan);
            for (Result result = scanner.next(); result != null; result = scanner.next()) {
                showResult(result);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            close();
        }
    }

    public static void main(String[] args) {
        scanColumn("Student", "S_Age");
    }
}
(4) modifyData(String tableName, String row, String column)
Modify the data of the cell in table tableName specified by row row (which may be the student name S_Name) and column column.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;

public class modifyData {
    public static Configuration configuration;
    public static Connection connection;
    public static Admin admin;

    public static void init() { // set up the connection
        configuration = HBaseConfiguration.create();
        configuration.set("hbase.rootdir", "hdfs://localhost:9000/hbase");
        try {
            connection = ConnectionFactory.createConnection(configuration);
            admin = connection.getAdmin();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void close() { // close the connection
        try {
            if (admin != null) {
                admin.close();
            }
            if (connection != null) {
                connection.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void modifyData(String tableName, String row, String column, String value) throws IOException {
        init();
        Table table = connection.getTable(TableName.valueOf(tableName));
        Put put = new Put(row.getBytes()); // in HBase, "modifying" a cell means writing a new version with Put
        String[] cols = column.split(":");
        if (cols.length == 1) { // only a column family was given: use an empty qualifier
            put.addColumn(column.getBytes(), "".getBytes(), value.getBytes());
        } else { // "columnFamily:column"
            put.addColumn(cols[0].getBytes(), cols[1].getBytes(), value.getBytes());
        }
        table.put(put);
        close();
    }

    public static void main(String[] args) {
        try {
            modifyData("Student", "1", "S_Name", "Tom");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
(5) deleteRow(String tableName, String row)
Delete the record of the row specified by row from table tableName.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Table;

public class deleteRow {
    public static Configuration configuration;
    public static Connection connection;
    public static Admin admin;

    public static void init() { // set up the connection
        configuration = HBaseConfiguration.create();
        configuration.set("hbase.rootdir", "hdfs://localhost:9000/hbase");
        try {
            connection = ConnectionFactory.createConnection(configuration);
            admin = connection.getAdmin();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void close() { // close the connection
        try {
            if (admin != null) {
                admin.close();
            }
            if (connection != null) {
                connection.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void deleteRow(String tableName, String row) throws IOException {
        init();
        Table table = connection.getTable(TableName.valueOf(tableName));
        Delete delete = new Delete(row.getBytes()); // deletes every cell in the given row
        table.delete(delete);
        close();
    }

    public static void main(String[] args) {
        try {
            deleteRow("Student", "3");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
3. Use HBase and MapReduce to complete the following task:
Suppose HBase has two tables; the logical view and part of the data are shown below:
Book Name (bookName) | Price (price) |
---|---|
Database System Concept | 30$ |
Thinking in Java | 60$ |
Data Mining | 25$ |
Create the table:
create 'book','bookName'
Add data:
put 'book','30$','bookName:','Database System Concept'
put 'book','60$','bookName:','Thinking in Java'
put 'book','25$','bookName:','Data Mining'
View the table:
scan 'book'
# Rows are stored in lexicographic order of their row keys by default, so this scan lists 25$, 30$, 60$.