ARP fun with UmTRX

Ok,

I added a USB 2.0 to 10/100/1000 Gigabit Ethernet LAN Wired Network Adapter based on ASIX AX88178 Chipset to my OpenBTS Server, configured it as eth0 with static IP address 192.168.1.9 and configured UmTRX with IP address 192.168.1.10.

Image

Image

To force the new NIC to be used (since the old NIC is on the same 192.168.1.0/24 subnet) I needed to add a static route: route add -host 192.168.1.10 dev eth0

Image
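Before going further it is worth confirming that the kernel really picks eth0 for the UmTRX. A minimal sketch, assuming the iproute2 `ip` tool is installed; `route_dev` is a hypothetical helper name, not part of any tool:

```shell
# Hypothetical helper: extract the interface name that follows the
# "dev" keyword in the output of "ip route get".
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Usage (on the OpenBTS server):
#   ip route get 192.168.1.10 | route_dev
```

If this prints eth1 instead of eth0, the host route has not taken effect.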

After doing that guess what – the OpenBTS server and UmTRX can no longer talk (ping fails)! Looks like an ARP problem …

ARP is used to locate the Ethernet address associated with a desired IP address. When a machine has a packet bound for another IP on a locally connected Ethernet network, it will send a broadcast Ethernet frame containing an ARP request onto the Ethernet. All machines with the same Ethernet broadcast address will receive this packet. If a machine (such as UmTRX) receives the ARP request and it hosts the IP requested, it will respond with the link layer address on which it will receive packets for that IP address. Once the requestor receives the response packet, it associates the MAC address and the IP address. This information is stored in the ARP cache. In simplest terms, the ARP cache is a stored mapping of IP addresses with link layer addresses. An ARP cache obviates the need for an ARP request/reply conversation for each IP packet exchanged.

Running “arp -n” shows that the ARP cache entry state for 192.168.1.10 is “incomplete”, which means we did not get an ARP reply to the broadcast ARP “Who is …”:

Image

Image
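The same check can be scripted. This is a rough sketch that reads the kernel's ARP table from /proc/net/arp, on the assumption that the Flags column reads 0x0 for an unresolved entry, 0x2 for a resolved one and 0x6 for a static one. `incomplete_arp_entries` is a hypothetical name, and the optional path argument is only there so the function can be run against a saved copy of the table:

```shell
# Hypothetical helper: print "IP device" for every ARP cache entry
# whose Flags column is 0x0, i.e. no reply was ever received.
# Columns in /proc/net/arp: IP, HW type, Flags, HW address, Mask, Device.
incomplete_arp_entries() {
    awk 'NR > 1 && $3 == "0x0" { print $1, $6 }' "${1:-/proc/net/arp}"
}

# Usage:
#   incomplete_arp_entries
```

With the setup above, this should list 192.168.1.10 against eth0 only.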

However through the existing eth1 with static IP address 192.168.1.11 all is OK:

Image

Image

Strangely when the UmTRX is power cycled I do see Gratuitous ARP reply frames on both eth0 and eth1 which are ARP replies to a question never asked. This sort of ARP is common in failover solutions and also for nefarious sorts of purposes:

Image

Adding a static ARP entry did not seem to help at all:

arp -i eth0 -s 192.168.1.10 00:1f:11:02:19:0d

When a Linux box is connected to a network segment with multiple network cards, a potential problem with the link layer address to IP address mapping can occur. The Linux server may respond to ARP requests from both Ethernet interfaces. On the machine creating the ARP request, these multiple answers can cause confusion, or worse yet, non-deterministic population of the ARP cache. This is known as ARP flux, and it can lead to the possibly puzzling effect that an IP address migrates non-deterministically through multiple link layer addresses.

In general, the arp_filter solution sufficiently solves the ARP flux problem. With arp_filter enabled, an interface only answers an ARP request if the kernel would route the reply to the requester out of that same interface. This works for two reasons: first, hosts do not generate ARP requests for networks to which they do not have a direct route; second, when such a route exists, the host normally chooses a source address in the same network as the destination.

sysctl -w net.ipv4.conf.default.arp_filter=1
sysctl -w net.ipv4.conf.all.arp_filter=1
sysctl -w net.ipv4.conf.eth0.arp_filter=1
sysctl -w net.ipv4.conf.eth1.arp_filter=1
sysctl -p
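Since `sysctl -p` reloads /etc/sysctl.conf, it seemed sensible to confirm that the values set with `-w` were still in effect afterwards. A small sketch that reads them back from /proc; `show_arp_filter` is a hypothetical helper, and the optional directory argument is only there so it can be pointed somewhere other than the live tree:

```shell
# Hypothetical helper: print "<interface>=<value>" for the arp_filter
# setting of every interface directory under /proc/sys/net/ipv4/conf
# (including the "all" and "default" pseudo-interfaces).
show_arp_filter() {
    conf_dir="${1:-/proc/sys/net/ipv4/conf}"
    for f in "$conf_dir"/*/arp_filter; do
        [ -r "$f" ] || continue
        iface=$(basename "$(dirname "$f")")
        printf '%s=%s\n' "$iface" "$(cat "$f")"
    done
}

# Usage:
#   show_arp_filter
```

Every line should end in =1 if the settings above stuck.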

Made no difference! At this point I’m running out of steam so back to basics ….

Changed the Gigabit switch – no difference.
Took eth1 (original NIC) down – no difference.

Is this an issue with the ASIX AX88178 chipset or driver, or possibly the UmTRX? This is more likely to be a UmTRX issue since it works on eth1 but not on eth0! We know that the ARP “Who is …” gets sent. Does the UmTRX receive it? Does the UmTRX discard it? Does the UmTRX send an “is at” reply which is not received due to a framing error ….

Clear the ARP cache:

arp -d 192.168.1.10

eth0 (not OK):

Image

Image

eth1 (OK):

Image

Image

So nothing much different about the request frame … I’m confused ……

Check the source code from https://github.com/fairwaves/UHD-Fairwaves/blob/fairwaves/umtrx/firmware/zpu/lib/net_common.c

  • ARPHRD_ETHER 1
  • ETHERTYPE_IPV4 0x0800
  • ETHERTYPE_ARP 0x0806

Image

Seems to be nothing wrong here. More testing …

eth0 statistics before test and after test:

Image

After a bit eth0 seems to stop sending any packets regardless of what Wireshark says! Hence probably a dongle issue caused by interrupt conflicts!
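The before/after comparison can be done without screenshots by reading the kernel's per-interface counters. A minimal sketch, assuming the usual /sys/class/net/<iface>/statistics layout; `tx_packets`, `tx_delta` and the `SYSFS_NET` override variable are hypothetical names introduced here for illustration:

```shell
# Hypothetical helper: read the transmitted-packet counter for an interface.
# SYSFS_NET is an illustrative override for the sysfs base directory.
tx_packets() {
    cat "${SYSFS_NET:-/sys/class/net}/$1/statistics/tx_packets"
}

# Hypothetical helper: run a command and print how many packets the
# interface transmitted while it ran. The command's own output and exit
# status are ignored, because a failing ping is exactly the case of interest.
tx_delta() {
    iface=$1
    shift
    before=$(tx_packets "$iface")
    "$@" >/dev/null 2>&1 || true
    after=$(tx_packets "$iface")
    echo $((after - before))
}

# Usage:
#   tx_delta eth0 ping -c 5 -I eth0 192.168.1.10
```

A result of 0 while Wireshark still shows outgoing frames would point at the dongle or its driver rather than at the UmTRX.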

/var/log/messages says:

Image

Image

Disable ACPI:
chkconfig --level 2345 acpid off

You can also disable ACPI completely by appending acpi=off to the kernel boot command line in the grub.conf file. This seems to kill USB keyboard and mouse!!!!

I give up – time to build a new OpenBTS using proper hardware and running Ubuntu!

 
