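A core dump preserves the state of a process, including its memory, at the moment it crashes, so the failure can be analyzed later. Sometimes, however, a process crashes without producing a core, because several factors decide whether a core file is written. A common switch is ulimit -c, which limits the maximum size of the core file. A core that would be larger than a nonzero limit is truncated to that size, and a limit of 0 suppresses the file altogether.
If ulimit -c prints 0, core dump output is disabled.
[root@srdsdevapp69 ~]# ulimit -c
0
Setting ulimit -c unlimited removes the limit on the core file size.
[root@srdsdevapp69 ~]# ulimit -c unlimited
[root@srdsdevapp69 ~]# ulimit -c
unlimited
A ulimit value set this way is only effective in the current session; processes started from a newly opened terminal are not affected.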
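The session scope of the limit can be observed without opening a second terminal. The following is a minimal sketch, assuming a POSIX-style shell such as bash or dash; the function name limit_inside_subshell is made up for this illustration. It lowers the soft limit inside a subshell only and shows that the calling shell keeps its own value.

```shell
# Lower the soft core limit inside a subshell and print what that subshell sees.
# Lowering a soft limit never needs extra privileges.
limit_inside_subshell() {
    ( ulimit -S -c 0 && ulimit -S -c )
}

before=$(ulimit -S -c)            # limit of the current shell
inside=$(limit_inside_subshell)   # limit seen by the child after its own change
after=$(ulimit -S -c)             # current shell again

echo "before=$before inside=$inside after=$after"
```

The limit is a per-process attribute that children inherit, which is why a change made in one terminal never reaches processes started from another.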
ulimit -c is only one of many factors that affect core output; the others are listed in the man page.
$ man core
...
There are various circumstances in which a core dump file is not produced:
* The process does not have permission to write the core file. (By default the core file is called core, and is created in the current working directory. See below for details on naming.) Writing the core file will fail if the directory in which it is to be created is non-writable, or if a file with the same name exists and is not writable or is not a regular file (e.g., it is a directory or a symbolic link).
* A (writable, regular) file with the same name as would be used for the core dump already exists, but there is more than one hard link to that file.
* The file system where the core dump file would be created is full; or has run out of inodes; or is mounted read-only; or the user has reached their quota for the file system.
* The directory in which the core dump file is to be created does not exist.
* The RLIMIT_CORE (core file size) or RLIMIT_FSIZE (file size) resource limits for the process are set to zero; see getrlimit(2) and the documentation of the shell's ulimit command (limit in csh(1)).
* The binary being executed by the process does not have read permission enabled.
* The process is executing a set-user-ID (set-group-ID) program that is owned by a user (group) other than the real user (group) ID of the process. (However, see the description of the prctl(2) PR_SET_DUMPABLE operation, and the description of the /proc/sys/fs/suid_dumpable file in proc(5).)
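A few of the conditions in this list can be checked from the shell before a crash happens. The sketch below covers only a subset (core size limit, target directory, an existing file named core) and assumes the default file name core in the given directory; check_core_prereqs is a hypothetical helper name, not an existing tool.

```shell
# Check a few of the conditions from core(5) for a given target directory.
# Prints one line per problem found and returns non-zero if there is any.
check_core_prereqs() {
    dir=${1:-.}
    problems=0

    # A core size limit of zero disables core files for this shell and its children.
    if [ "$(ulimit -S -c)" = "0" ]; then
        echo "core size limit is 0"
        problems=1
    fi

    if [ ! -d "$dir" ]; then
        echo "directory does not exist: $dir"
        problems=1
    elif [ ! -w "$dir" ]; then
        echo "directory is not writable: $dir"
        problems=1
    elif [ -e "$dir/core" ] || [ -L "$dir/core" ]; then
        # An existing "core" must be a writable regular file, not a link or directory.
        if [ -L "$dir/core" ] || [ ! -f "$dir/core" ] || [ ! -w "$dir/core" ]; then
            echo "existing core is not a writable regular file: $dir/core"
            problems=1
        fi
    fi

    return "$problems"
}

check_core_prereqs "$PWD" || echo "a core file may not be written here"
```

The remaining items, such as a full file system, hard links or set-user-ID programs, are not covered and would need their own checks.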
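One case is actually missing from that list: a process can catch the signals that would normally produce a core and handle them itself. MySQL, for example, does exactly this.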
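The effect of catching such a signal can be illustrated with a shell trap. This is only an analogy in shell under the assumption that sh is a POSIX shell; it is not how mysqld implements its handler. SIGFPE is used because it is the signal raised by a divide error; run_with_handler is a name chosen for this sketch.

```shell
# Run a child shell that installs a SIGFPE handler and then signals itself.
# Without the trap, the default action of SIGFPE would terminate the child
# and, limits permitting, dump core.
run_with_handler() {
    sh -c 'trap "echo handled SIGFPE; exit 3" FPE; kill -FPE $$; :'
}

run_with_handler || echo "exit status: $?"
```

Because the handler ends the process with an ordinary exit, the kernel never performs the default action and no core file is written.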
On RHEL/CentOS, abrtd is enabled by default to record failure scenes (including generating core dumps) and to report failures.
In that case the abrtd process is running:
[root@srdsdevapp69 ~]# service abrtd status
abrtd (pid 8711) is running...
and core file generation is redirected to abrt-hook-ccpp:
[root@srdsdevapp69 ~]# cat /proc/sys/kernel/core_pattern
|/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e
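Whether the kernel writes the core to a file or hands it to a helper program depends on the first character of core_pattern: per core(5), a leading | means the rest of the line is a program that receives the dump on standard input. A small sketch of that distinction follows; classify_core_pattern is a hypothetical helper name.

```shell
# Classify a core_pattern value by its first character.
classify_core_pattern() {
    case $1 in
        '|'*) echo "pipe" ;;            # dump is piped to a helper program
        /*)   echo "absolute path" ;;   # file written at a fixed location
        *)    echo "relative path" ;;   # file written relative to the crashing process's cwd
    esac
}

# Classify the live setting when it is readable (Linux only).
if [ -r /proc/sys/kernel/core_pattern ]; then
    classify_core_pattern "$(cat /proc/sys/kernel/core_pattern)"
fi
```

With the abrt setting shown above the function prints pipe.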
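Create the following program, which triggers a core dump, and run it.
testcoredump.c:
int main()
{
    return 1/0;   /* integer division by zero; the kernel log below reports it as a divide error */
}
Compile and run it:
$ gcc testcoredump.c -o testcoredump
$ ./testcoredump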
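When no compiler is at hand, a similar abnormal termination can be provoked from the shell, and the exit status shows which signal ended the process. The sketch below assumes the 128 + N convention used by bash and dash for signal deaths; signal_from_status is a made-up helper. The child disables core files for itself so the experiment leaves nothing behind.

```shell
# Convert a shell exit status into the number of the terminating signal.
# bash and dash report "killed by signal N" as 128 + N; prints 0 otherwise.
signal_from_status() {
    if [ "$1" -gt 128 ]; then
        echo $(( $1 - 128 ))
    else
        echo 0
    fi
}

# The child shell turns core files off for itself, then raises SIGFPE on itself.
status=0
sh -c 'ulimit -S -c 0; kill -FPE $$' 2>/dev/null || status=$?
echo "terminated by signal $(signal_from_status "$status")"
```

Signal 8 is SIGFPE on Linux, the same signal number that appears in the mysqld error log later in this article.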
The system log shows that a core file was created temporarily but deleted again at the end.
$ tail -f /var/log/messages
...
Dec 8 09:54:44 srdsdevapp69 kernel: testcoredump[4028] trap divide error ip:400489 sp:7fff5a54b200 error:0 in testcoredump[400000+1000]
Dec 8 09:54:44 srdsdevapp69 abrtd: Directory 'ccpp-2016-12-08-09:54:44-4028' creation detected
Dec 8 09:54:44 srdsdevapp69 abrt[4029]: Saved core dump of pid 4028 (/root/testcoredump) to /var/spool/abrt/ccpp-2016-12-08-09:54:44-4028 (184320 bytes)
Dec 8 09:54:44 srdsdevapp69 abrtd: Executable '/root/testcoredump' doesn't belong to any package
Dec 8 09:54:44 srdsdevapp69 abrtd: 'post-create' on '/var/spool/abrt/ccpp-2016-12-08-09:54:44-4028' exited with 1
Dec 8 09:54:44 srdsdevapp69 abrtd: Corrupted or bad directory /var/spool/abrt/ccpp-2016-12-08-09:54:44-4028, deleting
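By default abrtd only keeps core files produced by programs that belong to an installed package. Changing the following parameter makes it record core files for all programs.
$ vi /etc/abrt/abrt-action-save-package-data.conf
...
ProcessUnpackaged = yes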
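To confirm the setting without opening an editor, the value can be read back with sed. This is a minimal sketch assuming the simple "Key = value" layout shown above; conf_value is a hypothetical helper, and when a key appears more than once it simply prints the last uncommented assignment, which may or may not match how abrt itself resolves duplicates.

```shell
# Print the value of KEY from a "KEY = value" style file.
# Lines starting with '#' do not match; the last uncommented assignment is printed.
conf_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

conf=/etc/abrt/abrt-action-save-package-data.conf
if [ -r "$conf" ]; then
    echo "ProcessUnpackaged = $(conf_value "$conf" ProcessUnpackaged)"
fi
```

The same helper can be pointed at other abrt configuration files that use this layout.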
Running the test program again now produces a core file.
Dec 8 10:04:30 srdsdevapp69 kernel: testcoredump[9189] trap divide error ip:400489 sp:7fff99973b30 error:0 in testcoredump[400000+1000]
Dec 8 10:04:30 srdsdevapp69 abrtd: Directory 'ccpp-2016-12-08-10:04:30-9189' creation detected
Dec 8 10:04:30 srdsdevapp69 abrt[9190]: Saved core dump of pid 9189 (/root/testcoredump) to /var/spool/abrt/ccpp-2016-12-08-10:04:30-9189 (184320 bytes)
Dec 8 10:04:31 srdsdevapp69 kernel: Bridge firewalling registered
Dec 8 10:04:44 srdsdevapp69 abrtd: Sending an email...
Dec 8 10:04:44 srdsdevapp69 abrtd: Email was sent to: root@localhost
Dec 8 10:04:44 srdsdevapp69 abrtd: New problem directory /var/spool/abrt/ccpp-2016-12-08-10:04:30-9189, processing
Dec 8 10:04:44 srdsdevapp69 abrtd: No actions are found for event 'notify'
abrtd recognizes repeated problems and deduplicates them, which keeps a flood of core files from filling up the disk.
Dec 8 10:18:35 srdsdevapp69 kernel: testcoredump[16598] trap divide error ip:400489 sp:7fff26cc9f50 error:0 in testcoredump[400000+1000]
Dec 8 10:18:35 srdsdevapp69 abrtd: Directory 'ccpp-2016-12-08-10:18:35-16598' creation detected
Dec 8 10:18:35 srdsdevapp69 abrt[16599]: Saved core dump of pid 16598 (/root/testcoredump) to /var/spool/abrt/ccpp-2016-12-08-10:18:35-16598 (184320 bytes)
Dec 8 10:18:45 srdsdevapp69 abrtd: Sending an email...
Dec 8 10:18:45 srdsdevapp69 abrtd: Email was sent to: root@localhost
Dec 8 10:18:45 srdsdevapp69 abrtd: Duplicate: UUID
Dec 8 10:18:45 srdsdevapp69 abrtd: DUP_OF_DIR: /var/spool/abrt/ccpp-2016-12-08-10:04:30-9189
Dec 8 10:18:45 srdsdevapp69 abrtd: Problem directory is a duplicate of /var/spool/abrt/ccpp-2016-12-08-10:04:30-9189
Dec 8 10:18:45 srdsdevapp69 abrtd: Deleting problem directory ccpp-2016-12-08-10:18:35-16598 (dup of ccpp-2016-12-08-10:04:30-9189)
Dec 8 10:18:45 srdsdevapp69 abrtd: No actions are found for event 'notify_dup'
abrtd also limits the size of crash reports (mainly the core files) through the MaxCrashReportsSize parameter. When the limit is exceeded, no core file is kept either; the corresponding log looks like this.
Dec 8 14:10:32 srdsdevapp69 abrt[10548]: Saved core dump of pid 10527 (/usr/local/Percona-Server-5.6.29-rel76.2-Linux.x86_64.ssl101/bin/mysqld) to /var/spool/abrt/ccpp-2016-12-08-14:10:00-10527 (10513362944 bytes)
Dec 8 14:10:32 srdsdevapp69 abrtd: Directory 'ccpp-2016-12-08-14:10:00-10527' creation detected
Dec 8 14:10:32 srdsdevapp69 abrtd: Size of '/var/spool/abrt' >= 1000 MB, deleting 'ccpp-2016-12-08-14:05:43-8080'
Dec 8 14:10:32 srdsdevapp69 abrt[10548]: /var/spool/abrt is 25854515653 bytes (more than 1279MiB), deleting 'ccpp-2016-12-08-14:05:43-8080'
Dec 8 14:10:32 srdsdevapp69 abrt[10548]: Lock file '/var/spool/abrt/ccpp-2016-12-08-14:05:43-8080/.lock' is locked by process 7893
Dec 8 14:10:32 srdsdevapp69 abrt[10548]: '/var/spool/abrt/ccpp-2016-12-08-14:05:43-8080' does not exist
Dec 8 14:10:41 srdsdevapp69 abrtd: Sending an email...
Dec 8 14:10:41 srdsdevapp69 abrtd: Email was sent to: root@localhost
Dec 8 14:10:41 srdsdevapp69 abrtd: New problem directory /var/spool/abrt/ccpp-2016-12-08-14:10:00-10527, processing
Dec 8 14:10:41 srdsdevapp69 abrtd: No actions are found for event 'notify'
abrtd is triggered by watching the /var/spool/abrt/ directory, so even a plain copy operation there triggers it.
[root@srdsdevapp69 abrt]# cp -rf ccpp-2016-12-08-10:04:30-9189 ccpp-2016-12-08-10:04:30-91891
This produces the following system log:
Dec 8 10:35:33 srdsdevapp69 abrtd: Directory 'ccpp-2016-12-08-10:04:30-91891' creation detected
Dec 8 10:35:33 srdsdevapp69 abrtd: Duplicate: UUID
Dec 8 10:35:33 srdsdevapp69 abrtd: DUP_OF_DIR: /var/spool/abrt/ccpp-2016-12-08-10:04:30-9189
Dec 8 10:35:33 srdsdevapp69 abrtd: Problem directory is a duplicate of /var/spool/abrt/ccpp-2016-12-08-10:04:30-9189
Dec 8 10:35:33 srdsdevapp69 abrtd: Deleting problem directory ccpp-2016-12-08-10:04:30-91891 (dup of ccpp-2016-12-08-10:04:30-9189)
Dec 8 10:35:33 srdsdevapp69 abrtd: No actions are found for event 'notify_dup'
If the core output location is changed so that the abrt-hook-ccpp helper is no longer used, abrtd is effectively disabled.
echo "/data/core-%e-%p-%t" > /proc/sys/kernel/core_pattern
When a core dump happens after that, /var/log/messages contains no abrtd-related entries:
Dec 8 10:30:24 srdsdevapp69 kernel: testcoredump[23050] trap divide error ip:400489 sp:7fff9f01dfb0 error:0 in testcoredump[400000+1000]
The core file is now written directly to the location given in /proc/sys/kernel/core_pattern:
/data/core-testcoredump-23050-1481164224
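The file name follows from the template: per core(5), %e is the executable name, %p the PID and %t the time of the dump in seconds since the epoch. The sketch below imitates that substitution for these three specifiers only; expand_core_pattern is a hypothetical helper, and it assumes the substituted values contain no characters that are special to sed, such as / or &.

```shell
# Expand %e, %p and %t in a core_pattern template the way the kernel would.
#   $1 = template, $2 = executable name, $3 = PID, $4 = epoch seconds
expand_core_pattern() {
    printf '%s\n' "$1" | sed -e "s/%e/$2/g" -e "s/%p/$3/g" -e "s/%t/$4/g"
}

expand_core_pattern "/data/core-%e-%p-%t" testcoredump 23050 1481164224
```

The %t value 1481164224 corresponds to 02:30:24 UTC on 8 December 2016, which is 10:30:24 at +0800 and matches the timestamp of the kernel log line above.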
Because /proc/sys/kernel/core_pattern no longer references the abrt-hook-ccpp helper, checking the abrt-ccpp service status correspondingly reports that the service is not running.
[root@srdsdevapp69 ~]# service abrt-ccpp status
[root@srdsdevapp69 ~]# echo $?
3
After /proc/sys/kernel/core_pattern is restored, the abrt-ccpp service is back to normal.
[root@srdsdevapp69 ~]# echo "|/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e" > /proc/sys/kernel/core_pattern
[root@srdsdevapp69 ~]# service abrt-ccpp status
[root@srdsdevapp69 ~]# echo $?
0
If abrtd is stopped
while /proc/sys/kernel/core_pattern is still "|/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e",
the core file is written to the current directory.
Dec 8 10:46:21 srdsdevapp69 kernel: testcoredump[31364] trap divide error ip:400489 sp:7fff15d6f450 error:0 in testcoredump[400000+1000]
Dec 8 10:46:21 srdsdevapp69 abrt[31365]: abrtd is not running. If it crashed, /proc/sys/kernel/core_pattern contains a stale value, consider resetting it to 'core'
Dec 8 10:46:21 srdsdevapp69 abrt[31365]: Saved core dump of pid 31364 to /root/core.31364 (184320 bytes)
The MySQL server process mysqld catches the signals that could crash it. By default it prints the call stack and then exits abnormally without generating a core file.
2016-12-08 11:14:51 14034 [Note] /usr/local/mysql/bin/mysqld: ready for connections.
Version: '5.6.29-76.2-debug-log' socket: '/mysqlrds/data/mysql.sock' port: 3306 Source distribution
03:18:43 UTC - mysqld got signal 8 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/
key_buffer_size=33554432
read_buffer_size=2097152
max_used_connections=2
max_threads=100001
thread_count=1
connection_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 307242932 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x2427ca20
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fd53066bca8 thread_stack 0x40000
/usr/local/mysql/bin/mysqld(my_print_stacktrace+0x35)[0xaf23c9]
/usr/local/mysql/bin/mysqld(handle_fatal_signal+0x42e)[0x74d42a]
/lib64/libpthread.so.0[0x3805a0f7e0]
/usr/local/mysql/bin/mysqld(_Z19mysql_rename_tablesP3THDP10TABLE_LISTb+0x6c)[0x82fa64]
/usr/local/mysql/bin/mysqld(_Z21mysql_execute_commandP3THD+0x2aab)[0x8079e9]
/usr/local/mysql/bin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x588)[0x810ce3]
/usr/local/mysql/bin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0xd8b)[0x80228a]
/usr/local/mysql/bin/mysqld(_Z10do_commandP3THD+0x3bd)[0x801087]
/usr/local/mysql/bin/mysqld(_Z26threadpool_process_requestP3THD+0x71)[0x8ec721]
/usr/local/mysql/bin/mysqld[0x8ef363]
/usr/local/mysql/bin/mysqld[0x8ef5a0]
/usr/local/mysql/bin/mysqld(pfs_spawn_thread+0x159)[0xe14049]
/lib64/libpthread.so.0[0x3805a07aa1]
/lib64/libc.so.6(clone+0x6d)[0x32286e893d]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fd508004d80): is an invalid pointer
Connection ID (thread ID): 1
Status: NOT_KILLED
You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.
To make it produce a core file, the --core-file option must be turned on.
mysqld --defaults-file=/home/mysql/etc/my.cnf --core-file &
The option can also be added to the my.cnf file:
core_file
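There is one odd thing about the size of core files: the disk space they actually occupy can be far smaller than the file size.
The core file below, for example, has a file size of about 10 GB but actually occupies only about 1 GB on disk (1940984 blocks × 512 B ≈ 994 MB).
[root@srdsdevapp69 ccpp-2016-12-08-14:10:00-10527]# stat coredump
  File: `coredump'
  Size: 10513362944 Blocks: 1940984 IO Block: 4096 regular file
Device: fd03h/64771d Inode: 14990 Links: 1
Access: (0640/-rw-r-----) Uid: ( 173/ abrt) Gid: ( 512/ mysql)
Access: 2016-12-08 14:10:41.886280668 +0800
Modify: 2016-12-08 14:10:27.704523443 +0800
Change: 2016-12-08 14:10:27.704523443 +0800
The reason is that the system skips some all-zero blocks when it writes the core file, so the file contains holes (this can be simulated with the seek option of dd). The behavior is the same whether /proc/sys/kernel/core_pattern points to the abrt-hook-ccpp program or directly to a file path. It is in fact a nice optimization: it saves disk space and also speeds up writing the core file.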
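The dd simulation mentioned above can be sketched as follows, assuming a file system that supports sparse files (most Linux file systems do) and a dd that accepts bs, count and seek; make_sparse_file is a name invented for this example. It writes a single byte at the end of a 10 MiB file and compares the reported size with the space actually allocated.

```shell
# Create a file of SIZE bytes whose only written byte is the last one.
#   $1 = path, $2 = size in bytes
make_sparse_file() {
    dd if=/dev/zero of="$1" bs=1 count=1 seek=$(( $2 - 1 )) 2>/dev/null
}

f=$(mktemp)
make_sparse_file "$f" 10485760             # 10 MiB apparent size

apparent=$(wc -c < "$f" | tr -d ' ')       # bytes reported as the file size
allocated_kb=$(du -k "$f" | cut -f1)       # KiB actually allocated on disk

echo "apparent: $apparent bytes, allocated: $allocated_kb KiB"
rm -f "$f"
```

On a file system with holes support the allocated figure stays far below the 10240 KiB that a fully written file of this size would need, which is the same gap that stat shows between Size and Blocks for the core file.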