cd /home/grid/data-integration/
Open a cmd window, change to the directory containing Pan.bat (e.g. d:\data-integration), then run the transformation with:

pan /file D:\03works\ZYWSPT\kettle\test.ktr

Running a job is similar:

kitchen /file D:\03works\ZYWSPT\kettle\.kjb

Below is the content of a complete .bat batch file for running a Kettle program under Windows:
======================================
@echo off
set panpath=C:\pdi-ce-5.4.0.1-130\data-integration
set kpath=D:\03works\ZYWSPT\kettle
e:
cd E:\Tools\data-integration

Output is logged to kettlelog.log. One thing that failed: I have been trying to pass the previous day's date as a /param parameter, and have never gotten it to work.
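One workable approach, as a sketch (the parameter name PREV_DATE is an assumption, and the .ktr path reuses the example above): let PowerShell compute the date and hand it to Pan's /param option.

@echo off
rem Compute yesterday's date as yyyy-MM-dd via PowerShell and store it in PREV_DATE
for /f "usebackq" %%i in (`powershell -NoProfile -Command "(Get-Date).AddDays(-1).ToString('yyyy-MM-dd')"`) do set PREV_DATE=%%i
cd /d C:\pdi-ce-5.4.0.1-130\data-integration
rem Pass the value to the transformation as a named parameter
Pan.bat /file:D:\03works\ZYWSPT\kettle\test.ktr "/param:PREV_DATE=%PREV_DATE%"

For ${PREV_DATE} to resolve inside the transformation, it must be declared there as a named parameter.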
Install Pentaho PDI on 192.168.56.104, with the install directory /root/data-integration. On 192.168.56.101, run the following commands:

scp /home/grid/hadoop/etc/hadoop/hdfs-site.xml root@192.168.56.104:/root/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/cdh54/
scp /home/grid/hadoop/etc/hadoop/core-site.xml root@192.168.56.104:/root/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/cdh54/

Change the owner of the PDI install directory to grid:

mv /root/data-integration /home/grid/
chown -R grid:root /home/grid/data-integration

Edit the related configuration files:

cd /home/grid/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/cdh54
Install Pentaho PDI on 192.168.56.104, with the install directory /home/grid/data-integration; master and kettle are the hostnames of the respective hosts.
(2) Edit the spark-env.sh file and add the following lines, as shown in Figure 1:

export SPARK_HOME=/home/grid/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
export HADOOP_CONF_DIR=/home/grid/data-integration

Modify PDI's Spark example:

cp /home/grid/data-integration/samples/jobs/Spark\ Submit/Spark\ submit.kjb /home/grid/data-integration/test/Spark\ Submit\ Sample.kjb

Open /home/grid/data-integration/test/Spark\ Submit\ Sample.kjb in Kettle.
Modify the configuration files. Open the pentaho-big-data-plugin folder under the Kettle install directory:

data-integration\plugins\pentaho-big-data-plugin

Move the jar packages. Go into the lib directory under the cdh514 folder; the path is:

data-integration\plugins\pentaho-big-data-plugin\hadoop-configurations\cdh514

In this step we need to replace the configuration files. So that HBase also keeps working later, the procedure here is the same as above:

cd /export/servers/hbase-1.2.0-cdh5.14.0/conf
sz hbase-site.xml

Then copy all of the files into data-integration (see the sketch below).
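As a concrete sketch of that copy step (the file names and the assumption that the configs were downloaded to the current directory of a Linux-side Kettle install are mine, not the original's):

# Copy the cluster config files into the cdh514 shim so Kettle uses them
cp core-site.xml hdfs-site.xml yarn-site.xml hbase-site.xml \
   data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/cdh514/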
③ Use the PARALLELISM parameter to set a reasonable number of parallel threads, for example:

time PENTAHO_DI_JAVA_OPTIONS=-DPARALLELISM=4 sh /home/kettle/data-integration/pan.sh -file=/home/kettle/data-integration/test/dir_ktr1/public.T1.ktr > /home/kettle/data-integration
3. Use the unzip command to extract the archive:

unzip pdi-ce-7.0.0.0-25.zip -d "/opt/kettle"

4. After extraction, grant the relevant files execute permission: go into /opt/kettle/data-integration and grant the *.sh scripts +x (execute) permission, as shown after this list.
5. Run a transformation. Write a test transformation, then run:

/opt/kettle-spoon/data-integration/pan.sh -file=/opt/kettle-spoon/ktr/test/test1.ktr log=test1.log

6. Run a job:

sudo /opt/kettle-spoon/data-integration/kitchen.sh -file=/opt

export CLASSPATH=.:$JAVA_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH
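For step 4, a single command is enough, for example:

chmod +x /opt/kettle/data-integration/*.sh   # grant execute permission to all shell scripts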
#!/usr/bin/env bash
HADOOP_CONF_DIR=/root/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/cdh61
SPARK_HOME=/root/spark

(4) Edit the core-site.xml file:

vim /root/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/cdh61/core-site.xml

Modify PDI's bundled Spark example:

cp /root/data-integration/samples/jobs/Spark\ Submit/Spark\ submit.kjb /root/big_data

20/06/10 10:12:22 - Spark PI - 20/06/10 10:12:22 INFO conf.HiveConf: Found configuration file file:/root/data-integration
Install Pentaho PDI on 192.168.56.104, with the install directory /home/grid/data-integration. Copy the yarn and spark configuration files to the client machine:

scp /home/grid/hadoop/etc/hadoop/yarn-site.xml 192.168.56.104:/home/grid/data-integration
scp /home/grid/spark/conf/spark-defaults.conf 192.168.56.104:/home/grid/spark/conf/

Modify PDI's Spark example:

cp /home/grid/data-integration/samples/jobs/Spark\ Submit/Spark\ submit.kjb /home/grid/data-integration/test/Spark\ Submit\ Sample.kjb

Open /home/grid/data-integration/test/Spark\ Submit\ Sample.kjb in Kettle, edit the Spark Submit Sample job entry, and fill it in as shown in the figure.
The PARALLELISM parameter controls the number of concurrent threads, avoiding resource contention caused by running too many threads:

time PENTAHO_DI_JAVA_OPTIONS=-DPARALLELISM=4 sh /home/kettle/data-integration/pan.sh -file=/home/kettle/data-integration/test/dir_ktr1/public.T1.ktr > /home/kettle/data-integration
Switch to the PDI install directory:

C:\WINDOWS\system32>cd /d E:\pdi-ce-8.2.0.0-342\data-integration

Run the Pan.bat command and it prints the supported parameters:

E:\pdi-ce-8.2.0.0-342\data-integration>Pan.bat

Among the returned parameters:

-log=logging filename     sets the log file
-level=logging level      sets the log level

It also prints the Kettle version number and build date. Create a script file, transschdule.bat:

cd /d E:\pdi-ce-8.2.0.0-342\data-integration

Experiment steps:
1. Switch to the directory containing the Kettle files: right-click the Spoon icon on the desktop and open the PDI install location, E:\pdi-ce-8.2.0.0-342\data-integration.
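A minimal sketch of what transschdule.bat might contain (the .ktr and log paths are placeholders, not from the original):

@echo off
cd /d E:\pdi-ce-8.2.0.0-342\data-integration
rem Run the transformation with basic logging written to a file
Pan.bat -file=E:\ktr\trans1.ktr -level=Basic -log=E:\ktr\trans1.log

Scheduling this .bat with Windows Task Scheduler then yields a periodically executed transformation.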
kettle.properties:

# Kettle Properties
# Absolute path used to initialize the Kettle environment (where .kettle/kettle.properties lives), pointing to the Kettle root directory (e.g. D:\data-integration)
kettle.home=D:\\data-integration
# Absolute path to the plugins files under kettle
kettle.plugin=E:\\zhaxiaodong\\apache-tomcat
kettle.script=Html\\js\\libs\\url
# Log level
kettle.loglevel=detail
# Path where Kettle logs are stored
kettle.log.file.path=D:\\data-integration\\logs
# Path for saving uploaded transformations (.ktr) or jobs (.kjb); this feature has not been debugged and is disabled for now, pending development
kettle.file.repository=D:\\data-integration\\test

6. If you need the big data components: copy the simple-jndi, system, and plugins folders from the data-integration directory into the apache-tomcat-9.0.12\bin directory. If you do not need the big data components
Startup: unzip the archive locally; on macOS start with:

/path/pdi-ce-9.1.0.0-324/data-integration/spoon.sh

⚠️ MySQL data extraction: if you use a MySQL database, download the JDBC driver jar (https://download.csdn.net/download/yangfeixien/13755948) and put it in /path/pdi-ce-9.1.0.0-324/data-integration/lib/.
Kettle: PDI 8.3 is already installed on 172.16.1.105, in /root/data-integration. (Figure 2)
(2) Copy the Hadoop client configuration files obtained in the previous step to Kettle's ~/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations directory.
Start Spoon:

/root/data-integration/spoon.sh

3. Install the MySQL driver:

cp mysql-connector-java-5.1.38-bin.jar /root/data-integration/lib

Then restart Spoon to load the newly added driver.
Copy that section from the data-integration\system\karaf\system\org\pentaho\pentaho-requirejs-osgi-manager\9.0.0.0-SNAPSHOT directory, and paste it into the file under the data-integration\system\karaf\system\pentaho\pentaho-karaf-features\9.0.0.0-SNAPSHOT directory.
/pentaho/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations

3. Change the CDH version Kettle connects to. In /pentaho/data-integration/plugins/pentaho-big-data-plugin, set:

active.hadoop.configuration=cdh510

4. Copy the files from the Hadoop cluster to
5. Then we also need to fix the corresponding permissions; the directory is in the cdh510 shims (a sketch follows below):

/pentaho/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations
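A sketch of steps 3 and 5 as shell commands, assuming the property lives in plugin.properties (its standard location in pentaho-big-data-plugin) and that 755 is an acceptable mode:

# Point Kettle at the cdh510 shim
sed -i 's/^active.hadoop.configuration=.*/active.hadoop.configuration=cdh510/' \
    /pentaho/data-integration/plugins/pentaho-big-data-plugin/plugin.properties
# Fix permissions on the shim directory
chmod -R 755 /pentaho/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/cdh510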
#!/bin/bash
source ~/.bashrc
/home/mysql/data-integration/pan.sh -file=/home/mysql/MongoDB_to_MySQL.ktr
Step 2: Make it into a .bat batch file, as follows:

set panpath=C:\pdi-ce-5.4.0.1-130\data-integration
set kpath=D:\03works\ZYWSPT\kettle
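A minimal sketch of how such a batch file might continue using the two variables (the .ktr name is a placeholder; the log file follows the earlier kettlelog.log example):

@echo off
set panpath=C:\pdi-ce-5.4.0.1-130\data-integration
set kpath=D:\03works\ZYWSPT\kettle
cd /d %panpath%
rem Run the transformation and append console output to a log file
Pan.bat /file:%kpath%\test.ktr /level:Basic >> %kpath%\kettlelog.log 2>&1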
The data for the 2014 campaign periods is shown below and is saved in the file /root/data-integration/campaign_session.csv.

USE dw;
TRUNCATE TABLE campaign_session_stg;
LOAD DATA INFILE '/root/data-integration/campaign_session.csv

Below is an example of incomplete campaign periods (in the file /root/data-integration/ragged_campaign.csv): January, April, September, October, November, and December of 2014 have no campaign period.

campaign_session = NULL;
COMMIT;
USE dw;
TRUNCATE TABLE campaign_session_stg;
LOAD DATA INFILE '/root/data-integration
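Since the LOAD DATA statements above are cut off, here is a sketch of the typical full form (the delimiters and header handling are assumptions, not the original listing):

USE dw;
TRUNCATE TABLE campaign_session_stg;
-- Load the CSV into the staging table; adjust delimiters to match the file
LOAD DATA INFILE '/root/data-integration/campaign_session.csv'
INTO TABLE campaign_session_stg
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;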
Assume /root/data-integration is the working directory for both SQL and Kettle.
1. Create the regular-load SQL script file: save the SQL script from Listing (4)-2 as /root/data-integration/dw_regular.sql.
2. Create a shell script that executes it, as in Listing (4)-4:

#!/bin/bash
cd /root/data-integration
mysql -uroot -p123456 < dw_regular.sql

Using the Kettle approach instead, create a shell file /root/data-integration/dw_regular_load_kettle.sh that executes the Kettle job, with content as in Listing (4)-

#!/bin/bash
cd /root/data-integration
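A sketch of what dw_regular_load_kettle.sh might contain; the job file name dw_regular.kjb is hypothetical, since the original listing is cut off:

#!/bin/bash
cd /root/data-integration
# Run the Kettle job and capture all output in a log file
./kitchen.sh -file=/root/data-integration/dw_regular.kjb -level=Basic > dw_regular_kettle.log 2>&1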