東坡下載:內(nèi)容最豐富最安全的下載站!

首頁IT技術軟件教程 → hive數(shù)據(jù)怎么導入 hive數(shù)據(jù)導入方法

hive數(shù)據(jù)怎么導入 hive數(shù)據(jù)導入方法

相關文章發(fā)表評論 來源:本站整理時間:2014/12/2 17:29:11字體大小:A-A+

更多

作者:專題點擊:56次評論:0次標簽: hive數(shù)據(jù)導入方法

可以通過多種方式將數(shù)據(jù)導入hive表

1.通過外部表導入

用戶在hive上建external表,建表的同時指定hdfs路徑,在數(shù)據(jù)拷貝到指定hdfs路徑的同時,也同時完成數(shù)據(jù)插入external表。
例如:

編輯文件test.txt
$ cat test.txt 
1       hello
2       world
3       test
4       case
字段之間以'\t'分割

啟動hive:
$ hive

建external表:
hive> CREATE EXTERNAL TABLE MYTEST(num INT, name STRING)
    > COMMENT 'this is a test'
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    > STORED AS TEXTFILE
    > LOCATION '/data/test';
OK
Time taken: 0.714 seconds
hive> show tables;
OK
mytest
partition_test
partition_test_input
test
Time taken: 0.07 seconds
hive> desc mytest ;
OK
num     int
name    string
Time taken: 0.121 seconds|

數(shù)據(jù)拷貝到hdfs:
$ hadoop fs -put test.txt /data/test

查看hive表數(shù)據(jù):
hive> select * from mytest;
OK
1       hello
2       world
3       test
4       case
Time taken: 0.375 seconds
hive> select num from mytest;
Total MapReduce jobs = 1
Launching Job 1 out of 1
......
Total MapReduce CPU Time Spent: 510 msec
OK
1
2
3
4
Time taken: 27.157 seconds

這種方式常常用于當hdfs上有一些歷史數(shù)據(jù),而我們需要在這些數(shù)據(jù)上做一些hive的操作時使用。這種方式避免了數(shù)據(jù)拷貝開銷


2.從本地導入

數(shù)據(jù)不在hdfs上,直接從本地導入hive表

文件/home/work/test.txt內(nèi)容同上

建表:
hive> CREATE TABLE MYTEST2(num INT, name STRING)
    > COMMENT 'this is a test2'                          
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'     
    > STORED AS TEXTFILE;                                
OK
Time taken: 0.077 seconds

導數(shù)據(jù)入表:
hive> LOAD DATA LOCAL INPATH '/home/work/test.txt' INTO TABLE MYTEST2;
Copying data from file:/home/work/test.txt
Copying file: file:/home/work/test.txt
Loading data to table default.mytest2
OK
Time taken: 0.24 seconds

查看數(shù)據(jù):
hive> select * from MYTEST2;
OK
1       hello
2       world
3       test
4       case
Time taken: 0.11 seconds


這種方式導入的本地數(shù)據(jù)可以是一個文件,一個文件夾或者通配符,需要注意的是,如果是文件夾,文件夾內(nèi)不能包含子目錄,同樣,通配符只能通配文件。

3.從hdfs導入

上述test.txt文件已經(jīng)導入/data/test
則可以使用下述命令直接將數(shù)據(jù)導入hive表:
hive> CREATE TABLE MYTEST3(num INT, name STRING)
    > COMMENT "this is a test3"
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    > STORED AS TEXTFILE;
OK
Time taken: 4.735 seconds
hive> LOAD DATA INPATH '/data/test/test.txt' INTO TABLE MYTEST3;
Loading data to table default.mytest3
OK
Time taken: 0.337 seconds
hive> select * from MYTEST3 ;
OK
1       hello
2       world
3       test
4       case
Time taken: 0.227 seconds

4. 從其它表導入數(shù)據(jù):

hive> CREATE EXTERNAL TABLE MYTEST4(num INT) ;
OK
Time taken: 0.091 seconds
hive> FROM MYTEST3 test3
    > INSERT OVERWRITE TABLE MYTEST4
    > select test3.num where name='world';
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201207230024_0002, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201207230024_0002
Kill Command = /home/work/hadoop/hadoop-1.0.3/libexec/../bin/hadoop job  -Dmapred.job.tracker=localhost:9001 -kill job_201207230024_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2012-07-23 18:59:02,365 Stage-1 map = 0%,  reduce = 0%
2012-07-23 18:59:08,417 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:09,435 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:10,445 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:11,455 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:12,470 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:13,489 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.62 sec
2012-07-23 18:59:14,508 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 0.62 sec
MapReduce Total cumulative CPU time: 620 msec
Ended Job = job_201207230024_0002
Ended Job = -174856900, job is filtered out (removed at runtime).
Moving data to: hdfs://localhost:9000/tmp/hive-work/hive_2012-07-23_18-58-44_166_189728317691010041/-ext-10000
Loading data to table default.mytest4
Deleted hdfs://localhost:9000/user/hive/warehouse/mytest4
Table default.mytest4 stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 2, raw_data_size: 0]
1 Rows loaded to mytest4
MapReduce Jobs Launched: 
Job 0: Map: 1   Accumulative CPU: 0.62 sec   HDFS Read: 242 HDFS Write: 2 SUCESS
Total MapReduce CPU Time Spent: 620 msec
OK
Time taken: 30.663 seconds
hive> select * from mytest4;
OK
2
Time taken: 0.103 seconds


擴展知識

相關評論

閱讀本文后您有什么感想? 已有 人給出評價!

  • 2791 喜歡喜歡
  • 2101 頂
  • 800 難過難過
  • 1219 囧
  • 4049 圍觀圍觀
  • 5602 無聊無聊
熱門評論
最新評論
發(fā)表評論 查看所有評論(0)
昵稱:
表情: 高興 可 汗 我不要 害羞 好 下下下 送花 屎 親親
字數(shù): 0/500 (您的評論需要經(jīng)過審核才能顯示)