Kernel exception during WeChat network provisioning



  • A kernel exception sometimes occurs when provisioning the network through WeChat. The terminal output is as follows:
    //////////////////////////////////////////////////////////////////////////////
    /etc/config# aac
    [ 2006.360000] init queue success
    [ 2009.470000] Reserved instruction in kernel code[#3]:
    [ 2009.470000] CPU: 0 PID: 2375 Comm: iwpriv Tainted: G D 3.18.29 #12
    [ 2009.470000] task: 879321e8 ti: 8784a000 task.ti: 8784a000
    [ 2009.470000] $ 0 : 00000000 00000001 00000000 00000000
    [ 2009.470000] $ 4 : 775ea000 00000000 00000000 00000001
    [ 2009.470000] $ 8 : 00000014 80008f3c 00000200 00000000
    [ 2009.470000] $12 : 00000000 fffffffc 00000000 00000000
    [ 2009.470000] $16 : 775d2000 87390900 87375200 00000000
    [ 2009.470000] $20 : 87385960 873756e0 87390800 87375500
    [ 2009.470000] $24 : 00000000 8008d84c
    [ 2009.470000] $28 : 8784a000 8784be28 8784be28 800dce10
    [ 2009.470000] Hi : 00000000
    [ 2009.470000] Lo : 00000010
    [ 2009.470000] epc : 800dc044 padzero+0x6c/0x70
    [ 2009.470000] Tainted: G D
    [ 2009.470000] ra : 800dce10 load_elf_binary+0x9ec/0x11a8
    [ 2009.470000] Status: 1100e403 KERNEL EXL IE
    [ 2009.470000] Cause : 10800028
    [ 2009.470000] PrId : 00019655 (MIPS 24KEc)
    [ 2009.470000] Modules linked in: pppoe ppp_async iptable_nat pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_id xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT slhc nf_reject_ipv4 nf_nat_masquerade_ipv4 nf_nat_ftp nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack_ftp nf_conntrack iptable_raw iptable_mangle iptable_filter ip_tables crc_ccitt mt_wifi ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables ipv6 leds_gpio ohci_platform ohci_hcd ehci_platform ehci_hcd gpio_button_hotplug usbcore nls_base usb_common
    [ 2009.470000] Process iwpriv (pid: 2375, threadinfo=8784a000, task=879321e8, tls=771e4440)
    [ 2009.470000] Stack : 00000000 87afdf20 7fff7f41 00000017 00002012 00000006 00000010 00000001
    00000000 00000011 775e912a 775e91e0 87375600 00016ffc 00000001 00000007
    00400000 00400000 0040ba90 0041ba90 0041bbe7 87375200 87375200 fffffff8
    80320be8 00000001 80320000 803208c0 fffffff8 802c1608 0080a44c 800a56f4
    810d9480 879db900 86ca4000 00000f2d 87375200 87375200 879321e8 00000947
    ...
    [ 2009.470000] Call Trace:
    [ 2009.470000] [<800dc044>] padzero+0x6c/0x70
    [ 2009.470000]
    [ 2009.470000]
    Code: 8fbf0004 27bd0008 03e00008 <00000000> 27bdffe8 afbf0014 afb00010 00808021 0c04e4a4
    [ 2009.690000] ---[ end trace c3643a3c4c1c62e8 ]---
    Segmentation fault
    [ 2043.620000] Reserved instruction in kernel code[#4]:
    [ 2043.620000] CPU: 0 PID: 0 Comm: swapper Tainted: G D 3.18.29 #12
    [ 2043.620000] task: 80312ad0 ti: 8030c000 task.ti: 8030c000
    [ 2043.620000] $ 0 : 00000000 00000000 000000b0 00000000
    [ 2043.620000] $ 4 : c039d4cb 87baa84a 00000006 000000b0
    [ 2043.620000] $ 8 : 00060000 000001f8 000001f8 00000000
    [ 2043.620000] $12 : 00000000 77a783a0 00000000 00000000
    [ 2043.620000] $16 : 87baa84a c039d4a8 c029e000 87baa840
    [ 2043.620000] $20 : 8030dcd4 87baa84a 87b036c0 872c0000
    [ 2043.620000] $24 : 00000018 8001ff4c
    [ 2043.620000] $28 : 8030c000 8030dc50 872c0000 8723bd38
    [ 2043.620000] Hi : 0000001f
    [ 2043.620000] Lo : 00000000
    [ 2043.620000] epc : 8013cf80 memcmp+0x1c/0x30
    [ 2043.620000] Tainted: G D
    [ 2043.620000] ra : 8723bd38 MacTableLookup+0x7c/0xa0 [mt_wifi]
    [ 2043.620000] Status: 1100e403 KERNEL EXL IE
    [ 2043.620000] Cause : 10800028
    [ 2043.620000] PrId : 00019655 (MIPS 24KEc)
    [ 2043.620000] Modules linked in: pppoe ppp_async iptable_nat pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_id xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT slhc nf_reject_ipv4 nf_nat_masquerade_ipv4 nf_nat_ftp nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack_ftp nf_conntrack iptable_raw iptable_mangle iptable_filter ip_tables crc_ccitt mt_wifi ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables ipv6 leds_gpio ohci_platform ohci_hcd ehci_platform ehci_hcd gpio_button_hotplug usbcore nls_base usb_common
    [ 2043.620000] Process swapper (pid: 0, threadinfo=8030c000, task=80312ad0, tls=00000000)
    [ 2043.620000] Stack : 87b036c0 a7154240 c029e000 87217770 8736d840 8030dcc8 c03de000 8723a4c8
    8030dc7c 8030dc78 06a31040 86a31040 00000000 0000000f c029e000 c029e000
    c03de000 87b036c0 87baa840 8030dcd4 00000101 10020180 872c0000 8723afac
    d1d71780 8005c4a4 8030dd2c 86bfdeb0 00000000 803119b0 07baa800 c1af0000
    00000000 000000c2 8030dcd4 00000000 87baa800 87baa840 87b036c0 87baa840
    ...
    [ 2043.620000] Call Trace:
    [ 2043.620000] [<8013cf80>] memcmp+0x1c/0x30
    [ 2043.620000] [<8723bd38>] MacTableLookup+0x7c/0xa0 [mt_wifi]
    [ 2043.620000] [<8723a4c8>] dev_rx_data_frm+0x88/0x698 [mt_wifi]
    [ 2043.620000] [<8723afac>] rtmp_rx_done_handle+0x4d4/0x4f8 [mt_wifi]
    [ 2043.620000] [<87272da8>] mt_mac_int_4_tasklet+0xfcc/0x10ac [mt_wifi]
    [ 2043.620000]
    [ 2043.620000]
    Code: 90470000 00a31021 90420000 <00e21023> 1040fff8 24630001 03e00008 00000000 27bdfff8
    [ 2043.870000] ---[ end trace c3643a3c4c1c62e9 ]---
    [ 2043.880000] Kernel panic - not syncing: Fatal exception in interrupt
    [ 2043.880000] Rebooting in 3 seconds..

    Widora by mango,V1.0.8

    Board: Ralink APSoC DRAM: 64 MB
    relocate_code Pointer at: 83fb4000


    Software System Reset Occurred


    flash manufacture id: ef, device id 40 19
    find flash: W25Q256FV
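    Both oopses above report `Cause : 10800028`. In MIPS32 the exception code sits in bits 6..2 of the Cause register, and here it decodes to 10, Reserved Instruction, which matches the log text. The word in angle brackets on each `Code:` line is the instruction at `epc` as the kernel reads it back from memory, and both are ordinary instructions: `00000000` is `nop` (in `padzero`) and `00e21023` is `subu v0, a3, v0` (in `memcmp`). A valid instruction trapping as "reserved" is consistent with the CPU executing something other than what is stored in RAM, which would fit a board-specific hardware fault, though two samples alone do not prove it. The Python sketch below does this decoding; it understands only a handful of SPECIAL-class (opcode 0) encodings and is not a disassembler.

```python
"""Decode fields from the MIPS oops dumps above (sketch, not a disassembler)."""

# MIPS32 ExcCode values for a few common exceptions (Cause bits 6..2).
EXC_NAMES = {0: "Int", 4: "AdEL", 5: "AdES", 10: "RI", 11: "CpU"}

# ABI names of the 32 general-purpose registers, indexed by register number.
REG = ["zero", "at", "v0", "v1", "a0", "a1", "a2", "a3",
       "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7",
       "s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7",
       "t8", "t9", "k0", "k1", "gp", "sp", "fp", "ra"]


def exc_code(cause: int) -> int:
    """Return the ExcCode field (bits 6..2) of a CP0 Cause register value."""
    return (cause >> 2) & 0x1F


def decode_special(word: int) -> str:
    """Decode a few SPECIAL (opcode 0) MIPS32 instructions; report others raw."""
    if word == 0:
        return "nop"  # canonical encoding of sll zero, zero, 0
    opcode = word >> 26
    if opcode != 0:
        return f"opcode {opcode:#x} (not decoded)"
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    rd = (word >> 11) & 0x1F
    shamt = (word >> 6) & 0x1F
    funct = word & 0x3F
    if funct == 0x00:
        return f"sll {REG[rd]}, {REG[rt]}, {shamt}"
    if funct == 0x08:
        return f"jr {REG[rs]}"
    if funct == 0x21:
        return f"addu {REG[rd]}, {REG[rs]}, {REG[rt]}"
    if funct == 0x23:
        return f"subu {REG[rd]}, {REG[rs]}, {REG[rt]}"
    return f"SPECIAL funct {funct:#x} (not decoded)"


if __name__ == "__main__":
    print(EXC_NAMES.get(exc_code(0x10800028), "?"))  # RI
    for word in (0x00000000, 0x00e21023, 0x03e00008):
        print(f"{word:08x}: {decode_special(word)}")
```

    Run against the values from the logs, this prints `RI`, then `nop`, `subu v0, a3, v0` and `jr ra` for the three words.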



  • It almost always happens after running aac a second time.


  • administrators

    I'll test it tonight.



  • I've now found that it crashes even without aac: it happens whenever Wi-Fi is enabled!

    Every so often the kernel just crashes while running!


  • administrators

    Are you using the firmware from our wiki? Which version?



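  • I'm using BIT5 boards, two of them:
    one has not been reflashed, and with its original firmware I have not seen this exception so far;
    the other was flashed with firmware I compiled myself, and it shows the problem!

    How do I check the version of the source package?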


  • administrators

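    Check the source version with git. My first guess is that some dependencies were missing when you compiled.
    If your self-built environment is the problem, you can use the virtual machine we provide: https://cn.widora.io/compile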



  • OK, I'll try your VM development environment!



  • I flashed the image that ships in the VM, openwrt-ramips-mt7688-WIDORA3264-squashfs-sysupgrade.bin, to the board and got a similar crash!


  • administrators

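    @jansin_shaw Please upload both images to the forum, the one built in your own environment and the one built in the VM environment, and I'll take a look.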



  • Do you mean upload the image files?


  • administrators



  • test.bin openwrt-ramips-mt7688-WIDORA3264-squashfs-sysupgrade.bin

    test.bin was compiled in my own (non-VM) environment;
    the other one is the image that came with the project in your online VM.


  • administrators

    I'll test tonight and report back here.



  • Great!

    After booting the original image, the only change I made was switching the Ethernet configuration from static to DHCP.



  • test.bin

    Re-uploading the test.bin image.
    The root password for this image is: root



  • ?????


  • administrators

    I tested both firmware images you uploaded on BIT3 and BIT5 boards, and neither crashed.



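  • But the board I have really does crash every now and then.
    We were planning to use it in a new product, so this puts us in an awkward position;
    if it can't run stably, it will be hard to continue. I hope you can help analyze the cause in depth.
    We can also mail the board to you for a look!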



  • puttywidora.log

    This is the terminal log, captured while the board was left running with no one touching it!
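    When the crashes are intermittent, a long unattended capture such as puttywidora.log is easier to compare across boards and firmware builds if the crash reports are pulled out mechanically. The Python sketch below is a hypothetical helper (the script name `scan_oops.py` is made up). It assumes each kernel line carries the `[ seconds.micro]` timestamp prefix seen in the logs in this thread, and its signature list only covers the messages that appear here.

```python
import re
import sys
from collections import Counter

# Kernel lines look like "[ 2009.470000] message"; capture timestamp and message.
LINE_RE = re.compile(r"^\s*\[\s*(\d+\.\d+)\]\s*(.*)$")
# "epc : 800dc044 padzero+0x6c/0x70" -> "padzero+0x6c/0x70"
EPC_RE = re.compile(r"epc\s*:\s*[0-9a-f]+\s+(\S+)")

# Messages that start a crash report in the logs posted in this thread.
SIGNATURES = (
    "Reserved instruction in kernel code",
    "BUG: Bad page state",
    "Unable to handle kernel paging request",
    "Kernel panic",
)


def scan(lines):
    """Return [timestamp, signature, epc_symbol_or_None] for each crash report."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # U-Boot output, stack words, shell prompts, ...
        ts, msg = float(m.group(1)), m.group(2)
        hit = next((s for s in SIGNATURES if s in msg), None)
        if hit:
            events.append([ts, hit, None])
            continue
        e = EPC_RE.search(msg)
        if e and events and events[-1][2] is None:
            events[-1][2] = e.group(1)  # attach epc to the report it belongs to
    return events


def main(path: str) -> None:
    with open(path, errors="replace") as f:
        events = scan(f)
    for ts, sig, epc in events:
        print(f"{ts:12.6f}  {sig}  {epc or ''}")
    print(Counter(sig for _, sig, _ in events))


if __name__ == "__main__":
    main(sys.argv[1])
```

    Usage would be `python3 scan_oops.py puttywidora.log`; the per-signature counts make it easy to see whether a given build or board produces the same kind of crash.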


  • administrators

    Did you flash the 128 MB U-Boot? If not, I'll inspect the board myself.



  • U-Boot is still the one the board shipped with. We never updated U-Boot, only the system image!


  • administrators

    @jansin_shaw OK, I've sent you an email. I'll inspect it.


  • administrators

    Received the board. It came with its own firmware (prompt root@Rise). Some notes:

    /etc/config/network

    config interface 'lan'
            option ifname 'eth0.1'
            option force_link '1'
            option macaddr '0c:ef:af:d1:dc:07'
            option type 'bridge'
            option proto 'dhcp'
            option ipaddr '192.168.8.18'
            option netmask '255.255.255.0'
            option ip6assign '60'
    
    config interface 'wan'
            option ifname 'eth0.2'
            option force_link '1'
            option macaddr '0c:ef:af:d1:dc:06'
            option proto 'dhcp'
    
    config interface 'wan6'
            option ifname 'eth0.2'
            option proto 'dhcpv6'
    
    config switch
            option name 'switch0'
            option reset '1'
            option enable_vlan '1'
    
    config switch_vlan
            option device 'switch0'
            option vlan '1'
            option ports '1 2 3 4 6t'
    
    config switch_vlan
            option device 'switch0'
            option vlan '2'
            option ports '0 6t'
    
    config interface 'wwan'
            option proto 'dhcp'
            option ifname 'apcli0'
    

    /etc/config/wireless

    config wifi-device 'radio0'
            option type 'ralink'
            option variant 'mt7628'
            option country 'CN'
            option hwmode '11bgn'
            option htmode 'HT40'
            option channel 'auto'
    
    config wifi-iface 'ap'
            option device 'radio0'
            option mode 'ap'
            option network 'lan'
            option ifname 'ra0'
            option ssid 'RSCTn_D1DC06'
            option encryption 'psk2'
            option key '123456789'
            option hidden '0'
    
    config wifi-iface 'sta'
            option device 'radio0'
            option mode 'sta'
            option network 'wwan'
            option ifname 'apcli0'
            option ssid 'sssssssss'
            option key 'sstststsstst'
    

    After waiting for it to connect to the AP, run the aac command:

    root@Rise:/etc/config# aac
    [  531.920000] [MSC] enter monitor mode: filter:0x0, chan_id:1, width:2, chan_flags:0x0, mon0
    [  531.940000] init queue success
    [  532.690000] ApCliIfMonitor: IF(apcli0) - no Beacon is received from Root-AP.
    [  532.690000] APCLI LINK DOWN - IF(apcli0)
    [  532.700000] WLAN:STA d4:5f:25:fd:07:34(dev:ra0 rate:135Mbps singnal:-34dBm) disconnect 
    [  537.690000] AP-Client probe response: SSID=ziroom304, BSSID=d4:5f:25:fd:07:34
    [  537.990000] APCLI LINK UP - IF(apcli0) AuthMode(7)=WPA2PSK, WepStatus(6)=AES!
    [  538.100000] WLAN:STA d4:5f:25:fd:07:34(dev:ra0 rate:135Mbps singnal:-36dBm) disconnect 
    [  538.330000] APCLI LINK DOWN - IF(apcli0)
    [  556.090000] AP-Client probe response: SSID=ziroom304, BSSID=d4:5f:25:fd:07:34
    [  556.100000] APCLI LINK UP - IF(apcli0) AuthMode(7)=WPA2PSK, WepStatus(6)=AES!
    [  570.950000] BUG: Bad page state in process sh  pfn:028a6
    [  570.950000] page:810514c0 count:8560 mapcount:0 mapping:  (null) index:0x0
    [  570.960000] flags: 0x0()
    [  570.960000] page dumped because: nonzero _count
    [  570.970000] Modules linked in: pppoe ppp_async iptable_nat pppox ppp_generic nf_nat_ipv4 n
    [  571.030000] CPU: 0 PID: 3404 Comm: sh Tainted: G    B          3.18.29 #23
    [  571.040000] Stack : 00000000 00000000 00000000 00000000 803541f2 0000003e 00000000 0000000
              00000001 8108b4b0 802b7694 803129e3 00000d4c 80353420 8297ede0 8108b4b0
              00020200 00467000 810514d4 800476c0 00000003 80024410 802be600 8108b4b0
              802bab98 828e5b24 00000000 00000000 00000000 00000000 00000000 00000000
              00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
              ...
    [  571.080000] Call Trace:
    [  571.080000] [<80014240>] show_stack+0x50/0x84
    [  571.080000] [<8006f6c8>] bad_page+0xe8/0x118
    [  571.090000] [<80071cb4>] get_page_from_freelist+0x41c/0x5c0
    [  571.090000] [<80071f60>] __alloc_pages_nodemask+0x108/0x68c
    [  571.100000] [<800724fc>] __get_free_pages+0x18/0x4c
    [  571.110000] [<800886c4>] __tlb_remove_page+0x64/0xbc
    [  571.110000] [<800896dc>] unmap_single_vma+0x4b8/0x710
    [  571.120000] [<8008aa74>] unmap_vmas+0x54/0x74
    [  571.120000] [<8008f870>] exit_mmap+0x70/0x16c
    [  571.120000] [<800223b4>] mmput+0x3c/0xd4
    [  571.130000] [<800a6188>] flush_old_exec+0x4b8/0x5ec
    [  571.130000] [<800dc734>] load_elf_binary+0x310/0x11a8
    [  571.140000] [<800a56f4>] search_binary_handler+0x88/0x1c8
    [  571.140000] [<800a69b4>] do_execve+0x32c/0x4c0
    [  571.150000] [<80006b5c>] handle_sys+0x11c/0x140
    [  571.150000] 
    ^C^C[  577.380000] [MSC] leave monitor mode.
    [  577.400000] deinit queue success
    

    So far: running aac after connecting to the upstream AP crashes easily.

    Next, I cleared all configuration and ran aac without connecting to an upstream AP (the error still occurred!):

    [  293.680000] |--------------------------------------------------------|
    [  294.320000] BUG: Bad page state in process ralink.sh  pfn:02f96
    [  294.330000] page:8105f2c0 count:37940 mapcount:-65535 mapping:  (null) index:0xffff
    [  294.330000] flags: 0x0()
    [  294.340000] page dumped because: nonzero _count
    [  294.340000] Modules linked in: pppoe ppp_async iptable_nat pppox ppp_generic nf_nat_ipv4 n
    [  294.410000] CPU: 0 PID: 2001 Comm: ralink.sh Not tainted 3.18.29 #23
    [  294.410000] Stack : 00000000 00000000 00000000 00000000 803541f2 00000038 00000000 0000000
              00000001 8108b4b0 802b7694 803129e3 000007d1 80353420 82cf4720 8108b4b0
              000204d0 00989000 8105f2d4 800476c0 00000003 80024484 802be600 8108b4b0
              802bab98 82dcbc6c 00000000 00000000 00000000 00000000 00000000 00000000
              00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
              ...
    [  294.450000] Call Trace:
    [  294.450000] [<80014240>] show_stack+0x50/0x84
    [  294.460000] [<8006f6c8>] bad_page+0xe8/0x118
    [  294.460000] [<80071cb4>] get_page_from_freelist+0x41c/0x5c0
    [  294.470000] [<80071f60>] __alloc_pages_nodemask+0x108/0x68c
    [  294.470000] [<80088afc>] __pte_alloc+0x34/0x184
    [  294.480000] [<8008a620>] copy_page_range+0x108/0x508
    [  294.480000] [<80023034>] copy_process.part.77+0x9ac/0x111c
    [  294.490000] [<8002387c>] do_fork+0xc0/0x2c0
    [  294.490000] [<80006b5c>] handle_sys+0x11c/0x140
    [  294.500000] 
    [  294.500000] Disabling lock debugging due to kernel taint
    [  294.510000] CPU 0 Unable to handle kernel paging request at virtual address 00100104, epc0
    [  294.510000] Oops[#1]:
    [  294.510000] CPU: 0 PID: 2059 Comm: dhcpv6.script Tainted: G    B          3.18.29 #23
    [  294.510000] task: 82cf4000 ti: 82d56000 task.ti: 82d56000
    [  294.510000] $ 0   : 00000000 0041d21c 8108b4bc 00100100
    [  294.510000] $ 4   : 00000000 8108b4b8 00000134 00000024
    [  294.510000] $ 8   : 00000002 00000000 00000000 80320be8
    [  294.510000] $12   : 00000001 77b5c068 00000000 000033e5
    [  294.510000] $16   : 80312100 00000141 00000000 00000001
    [  294.510000] $20   : 81059da0 8108b4b0 000200d0 0045da30
    [  294.510000] $24   : 00000000 77abb8f0                  
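    In the Oops that follows this Bad page state report, the faulting address 00100104 is 4 bytes past 0x00100100, and `$3` holds exactly 00100100. In a 3.18 kernel built without CONFIG_ILLEGAL_POINTER_VALUE, 0x00100100 is LIST_POISON1 (include/linux/poison.h), the value `list_del()` writes into a removed entry's `next` pointer, and offset 4 is the `prev` field of a `struct list_head` on a 32-bit CPU. This pattern is usually read as code walking a list entry that had already been deleted. Together with the corrupted page counts, it points to memory corruption, but by itself it does not say whether the cause is a driver bug or faulty hardware. The helper below is a hypothetical check, assuming POISON_POINTER_DELTA is 0 on this build.

```python
# Assumption: kernel 3.18, 32-bit MIPS, POISON_POINTER_DELTA == 0, so
# list_del() leaves entry->next = LIST_POISON1 and entry->prev = LIST_POISON2.
LIST_POISON1 = 0x00100100
LIST_POISON2 = 0x00200200


def classify_fault(addr: int, window: int = 0x100) -> str:
    """Report whether a faulting address lies just past a list poison value."""
    for name, base in (("LIST_POISON1", LIST_POISON1), ("LIST_POISON2", LIST_POISON2)):
        off = addr - base
        if 0 <= off < window:
            return f"{name}+{off:#x}"  # e.g. +0x4 = prev field of a poisoned entry
    return "not near a list poison value"


if __name__ == "__main__":
    print(classify_fault(0x00100104))  # LIST_POISON1+0x4
```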
    

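    Suspecting a firmware problem, I flashed

    Ver:0.1.8-20180813 openwrt-ramips-mt7688-WIDORA3264-squashfs-sysupgrade.bin and tested it: with the default (unconfigured) setup, running aac five times in a row caused no problems.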

    I then configured sta and ran aac five more times: no crash, but in one run it never obtained the SSID and key.

    Flashed the wiki firmware Ver:0.1.8-20180430 by WIDORA.

    Ran aac five times each, with no sta configured and with sta configured: no crash.

    Changed the network configuration of the wiki firmware to match the root@Rise one, and tested:

    It crashed twice, and after that it refused to reproduce no matter what. I haven't found a pattern yet.


  • administrators

    My initial impression is that it is related to the dhcp in this section:
    config interface 'lan'
            option ifname 'eth0.1'
            option force_link '1'
            option macaddr '0c:ef:af:d1:dc:07'
            option type 'bridge'
            option proto 'dhcp'
    Setting proto to dhcp here really makes no sense.
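    In the full `lan` section from root@Rise, `proto 'dhcp'` sits next to `ipaddr` and `netmask`, which belong to the static protocol, so the section contradicts itself. A quick way to spot this kind of mismatch across saved configs is sketched below in Python. It is a minimal reader that assumes plain `config`/`option` lines (no `list` entries), and the STATIC_ONLY tuple is an illustrative assumption, not an exhaustive list; it is no replacement for the `uci` tool.

```python
import shlex

# Options that only make sense with proto 'static' (illustrative, not exhaustive).
STATIC_ONLY = ("ipaddr", "netmask", "gateway")


def parse_uci(text: str):
    """Minimal UCI reader: return a list of (type, name, {option: value})."""
    sections = []
    for line in text.splitlines():
        tokens = shlex.split(line, comments=True)  # handles '...' quoting
        if len(tokens) >= 2 and tokens[0] == "config":
            name = tokens[2] if len(tokens) > 2 else None
            sections.append((tokens[1], name, {}))
        elif len(tokens) >= 3 and tokens[0] == "option" and sections:
            sections[-1][2][tokens[1]] = tokens[2]
    return sections


def dhcp_conflicts(sections):
    """Return (interface, [static options]) for interfaces using proto 'dhcp'."""
    found = []
    for stype, name, opts in sections:
        if stype == "interface" and opts.get("proto") == "dhcp":
            extra = [k for k in STATIC_ONLY if k in opts]
            if extra:
                found.append((name, extra))
    return found


if __name__ == "__main__":
    with open("/etc/config/network") as f:
        print(dhcp_conflicts(parse_uci(f.read())))
```

    On the root@Rise config above this would flag `lan` with `ipaddr` and `netmask`, while `wan` (plain DHCP) passes.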



  • I only changed lan to dhcp.

    Even with a static IP, the exception still occurs.

    Besides, I don't think a configuration mistake like this should ever be able to crash the kernel!

    Also, I found earlier that with Wi-Fi turned off, no errors appeared for a long time!


  • administrators

    @jansin_shaw OK, I'll continue testing this afternoon.



  • Any progress?


  • administrators

    @jansin_shaw No clear pattern found yet.



  • Is it solved?


  • administrators

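    @jansin_shaw So far, I really haven't found a pattern. ☹
    My tentative conclusion is that the MediaTek proprietary driver tends to become unstable when dumping low-level packets up to user space (this mechanism is something we added to the code ourselves; MediaTek provides no support for it).
    Have you considered another approach: OpenWrt 18.06 with the open-source MT76 driver?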

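    1. I tested earlier that the open-source driver can open a monitor interface directly for over-the-air packet capture.
    2. MT76-master is already quite stable; see: https://widora.io/topic/533/openwrt18-06-mt76-master-driver
    3. This means changing the Airkiss packet-capture code from the old iwpriv interface to capture on a standard monitor network interface.
    4. We haven't yet got audio working with OpenWrt 18.06 on the 7688. If you don't need audio, everything else is about the same, and there is full LuCI web interface support.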
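    Item 3 means reading frames from a monitor interface instead of going through iwpriv. As a rough sketch of what that looks like on Linux, assume a monitor interface named `mon0` already exists and is up (with mac80211 drivers such as MT76 it can typically be created with `iw phy phy0 interface add mon0 type monitor`; the phy and interface names are assumptions). The Python code below opens an `AF_PACKET` raw socket, strips the radiotap header, and reports each frame's 802.11 type, subtype and length. Airkiss-style provisioning is generally described as carrying its data in frame lengths, but the actual Airkiss decoding is not shown, and an FCS appended by the driver is not removed.

```python
import socket
import struct

ETH_P_ALL = 0x0003  # <linux/if_ether.h>: deliver frames of every protocol


def parse_frame(pkt: bytes):
    """Return (type, subtype, 802.11 length) for a radiotap-prefixed frame."""
    if len(pkt) < 8:
        raise ValueError("too short for a radiotap header")
    # Radiotap header starts with u8 version, u8 pad, le16 total header length.
    version, _pad, rt_len = struct.unpack_from("<BBH", pkt, 0)
    if version != 0 or rt_len < 8 or rt_len + 2 > len(pkt):
        raise ValueError("bad radiotap header")
    # First field of the 802.11 header: 16-bit frame control, little-endian.
    (fc,) = struct.unpack_from("<H", pkt, rt_len)
    ftype = (fc >> 2) & 0x3    # 0 = management, 1 = control, 2 = data
    subtype = (fc >> 4) & 0xF
    return ftype, subtype, len(pkt) - rt_len


def capture(ifname: str = "mon0", count: int = 10) -> None:
    """Print (type, subtype, length) for frames on a monitor interface (Linux, root)."""
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
    try:
        sock.bind((ifname, 0))
        for _ in range(count):
            pkt = sock.recv(4096)
            try:
                print(parse_frame(pkt))
            except ValueError:
                pass  # skip anything that is not radiotap + 802.11
    finally:
        sock.close()


if __name__ == "__main__":
    capture()
```

    Keeping `parse_frame` separate from the socket code means the frame parsing can be tested offline on saved captures, and only `capture` depends on the driver exposing a working monitor interface.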


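  • @mangogeek
    We don't use audio at the moment.
    Please run OpenWrt 18.06 on the problem board and see whether the exception still occurs!

    Also:
    we later bought two more BIT5 boards of the same model, and the earlier system has run on them for a long time without the same exception!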


  • administrators

    @jansin_shaw
    Wait, so all along I've been testing the board you sent me?!?



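  • @mangogeek

    Apart from the problem board I mailed you for testing, I bought several more boards to experiment with here. The crash has only ever appeared on the problem board I sent you; none of the other boards has shown anything similar so far!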


  • administrators

    @jansin_shaw Tonight, after I get home, I'll sort out my thinking and run comparison tests!
    I may have fallen into the trap too.



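  • @mangogeek
    After switching to OpenWrt 18.06 and the new driver, we should still be able to use the driver calibration data stored in the original flash, right?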


  • administrators

    @jansin_shaw Yes, the open-source MT76 driver reads the factory partition automatically.



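  • @mangogeek
    My feeling is that some particular hardware operating state happens to hit a tiny bug in the driver,
    and the problem board just happens to sit in that triggering state while it runs.

    In the end, it is still a subtle software defect.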


  • administrators

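    Indeed, only that board has the problem!
    After the Spring Festival I'll have the assembly house replace the main SoC to see what happens, and ask them to help analyze what exactly is wrong with the chip.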



  • @mangogeek
    Looking forward to the results.

