I have the following setup:
- Windows host
- Linux guest with Docker (in VirtualBox)
I have installed HDFS in Docker (Ubuntu, inside VirtualBox), using the bde2020 Hadoop images from Docker Hub. This is my docker-compose:
namenode:
  image: bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8
  container_name: namenode
  restart: always
  ports:
    - 9870:9870
    - 9000:9000
  volumes:
    - hadoop_namenode:/hadoop/dfs/name
  environment:
    - CLUSTER_NAME=test
  env_file:
    - ./hadoop.env
  networks:
    control_net:
      ipv4_address: 10.0.1.20
datanode:
  image: bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8
  container_name: datanode
  restart: always
  ports:
    - 9864:9864
  volumes:
    - hadoop_datanode:/hadoop/dfs/data
  environment:
    SERVICE_PRECONDITION: "namenode:9870"
  env_file:
    - ./hadoop.env
  networks:
    control_net:
      ipv4_address: 10.0.1.21
resourcemanager:
  image: bde2020/hadoop-resourcemanager:2.0.0-hadoop3.2.1-java8
  container_name: resourcemanager
  restart: always
  environment:
    SERVICE_PRECONDITION: "namenode:9000 namenode:9870 datanode:9864"
  env_file:
    - ./hadoop.env
  networks:
    control_net:
      ipv4_address: 10.0.1.22
nodemanager1:
  image: bde2020/hadoop-nodemanager:2.0.0-hadoop3.2.1-java8
  container_name: nodemanager
  restart: always
  environment:
    SERVICE_PRECONDITION: "namenode:9000 namenode:9870 datanode:9864 resourcemanager:8088"
  env_file:
    - ./hadoop.env
  networks:
    control_net:
      ipv4_address: 10.0.1.23
historyserver:
  image: bde2020/hadoop-historyserver:2.0.0-hadoop3.2.1-java8
  container_name: historyserver
  restart: always
  environment:
    SERVICE_PRECONDITION: "namenode:9000 namenode:9870 datanode:9864 resourcemanager:8088"
  volumes:
    - hadoop_historyserver:/hadoop/yarn/timeline
  env_file:
    - ./hadoop.env
  networks:
    control_net:
      ipv4_address: 10.0.1.24
volumes:
  hadoop_namenode:
  hadoop_datanode:
  hadoop_historyserver:
networks:
  processing_net:
    driver: bridge
    ipam:
      driver: default
      config:
        - subnet: 10.0.0.0/24
          gateway: 10.0.0.1
My hdfs-site.xml is:
<configuration>
<property><name>dfs.namenode.datanode.registration.ip-hostname-check</name><value>false</value></property>
<property><name>dfs.webhdfs.enabled</name><value>true</value></property>
<property><name>dfs.permissions.enabled</name><value>false</value></property>
<property><name>dfs.namenode.name.dir</name><value>file:///hadoop/dfs/name</value></property>
<property><name>dfs.namenode.rpc-bind-host</name><value>0.0.0.0</value></property>
<property><name>dfs.namenode.servicerpc-bind-host</name><value>0.0.0.0</value></property>
<property><name>dfs.namenode.http-bind-host</name><value>0.0.0.0</value></property>
<property><name>dfs.namenode.https-bind-host</name><value>0.0.0.0</value></property>
<property><name>dfs.client.use.datanode.hostname</name><value>true</value></property>
<property><name>dfs.datanode.use.datanode.hostname</name><value>true</value></property>
</configuration>
If I enter the following in the browser from Linux (inside VirtualBox):
then I can access the Hadoop web UI.
If I enter http://192.168.56.1:9870 in the browser from Windows (the host system, outside VirtualBox), I can also access it (I have mapped this IP so it can be reached from outside VirtualBox).
The problem appears, however, when I navigate the web UI and try to download a file. The browser then says it cannot connect to the server dcfb0bf3b42c and shows a line like this in the address bar:
http://dcfb0bf3b42c:9864/webhdfs/v1/tmp/datalakes/myJsonTest1/part-00000-0009b521-b474-49e7-be20-40f5e8b3a7b4-c000.json?op=OPEN&namenoderpcaddress=namenode:9000&offset=0
If I replace the "dcfb0bf3b42c" part with the IP 10.0.1.21 (from Linux) or 192.168.56.1 (from Windows), it works and the file downloads.
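The manual step of rewriting the hostname can also be scripted on the client side. A minimal Python sketch of the idea, assuming the host-mapped addresses described above (192.168.56.1:9870 for the namenode, 192.168.56.1:9864 for the datanode; the function names are mine, not part of any HDFS client API):

```python
import http.client
from urllib.parse import urlsplit

NAMENODE = ("192.168.56.1", 9870)  # assumed host-mapped namenode web address
DATANODE = "192.168.56.1:9864"     # assumed reachable datanode address

def rewrite_host(location, netloc):
    """Swap the unresolvable container hostname in a redirect URL
    (e.g. dcfb0bf3b42c:9864) for a reachable host:port."""
    return urlsplit(location)._replace(netloc=netloc).geturl()

def read_hdfs_file(path):
    """Read a file via WebHDFS: ask the namenode for op=OPEN, intercept
    its 307 redirect to the datanode, and rewrite the redirect host."""
    conn = http.client.HTTPConnection(*NAMENODE)
    conn.request("GET", f"/webhdfs/v1{path}?op=OPEN")
    resp = conn.getresponse()
    if resp.status == 307:  # namenode redirects to the datanode
        target = urlsplit(rewrite_host(resp.getheader("Location"), DATANODE))
        conn.close()
        dconn = http.client.HTTPConnection(target.hostname, target.port)
        dconn.request("GET", f"{target.path}?{target.query}")
        resp = dconn.getresponse()
    return resp.read()
```

This does not help with Power BI directly, since Power BI follows the redirect itself, but it shows where the unresolvable hostname enters the picture.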
I need to automate this step so I don't have to type the IP by hand every time, because I need to access the HDFS data from a program (Power BI), and when it tries to read the data it fails because of the problem described above.
I am new to Hadoop. Can I solve this by editing some configuration file?
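One config-level direction that is commonly suggested for this kind of setup (a sketch, not verified against this exact compose file): pin the datanode to a fixed hostname instead of the random container ID, e.g.

```yaml
datanode:
  image: bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8
  container_name: datanode
  hostname: datanode   # fixed hostname instead of the generated container ID
```

and then map that name to a reachable IP on the client machine (e.g. a `192.168.56.1  datanode` line in the Windows hosts file), so that clients such as the browser or Power BI can follow the WebHDFS redirect without manual rewriting.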